reading contents of PDF files
I have a bunch of PDF files and would like to read their contents and store them in a database. Does anyone have any idea how this can be achieved? (Reading the PDF, I mean.)
- Leviathan
You'd probably have to convert the PDF to text first. I doubt it's easy to write code that reads and parses the PDF format directly. If you can convert the PDF to text (or some equivalent format), I'd store the text in the database along with a link to the PDF file, so you can search the text and then return the file(s) that match.
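For what it's worth, here is a rough sketch of that approach in Python. It assumes the `pdftotext` command-line tool (from Xpdf or Poppler) is installed and on your PATH, and it uses SQLite only to keep the example self-contained. The table name `documents`, its columns, and the helper names are made up for illustration, so adapt them to whatever database you actually use.

```python
import sqlite3
import subprocess


def pdf_to_text(pdf_path):
    """Extract text by shelling out to pdftotext (assumed to be installed)."""
    # "-" as the output file makes pdftotext write the text to stdout.
    result = subprocess.run(
        ["pdftotext", "-enc", "UTF-8", str(pdf_path), "-"],
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


def open_db(db_path=":memory:"):
    """Open the database and create the (hypothetical) documents table."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents ("
        "path TEXT PRIMARY KEY, body TEXT NOT NULL)"
    )
    return conn


def index_pdf(conn, pdf_path, extract=pdf_to_text):
    """Store the extracted text next to a link (here, the path) to the PDF."""
    conn.execute(
        "INSERT OR REPLACE INTO documents (path, body) VALUES (?, ?)",
        (str(pdf_path), extract(pdf_path)),
    )
    conn.commit()


def search(conn, term):
    """Return the paths of the PDFs whose extracted text contains term."""
    # Note: % and _ inside term act as LIKE wildcards in this simple version.
    rows = conn.execute(
        "SELECT path FROM documents WHERE body LIKE ? ORDER BY path",
        (f"%{term}%",),
    )
    return [row[0] for row in rows]
```

You would call `index_pdf(conn, "some.pdf")` once per file and then `search(conn, "keyword")` to get back the matching paths. The `extract` parameter is there so a different converter can be swapped in without touching the database code. Scanned PDFs that contain only images would need OCR first, since there is no text for `pdftotext` to pull out.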