reading contents of PDF files
Posted: Fri Sep 26, 2003 7:51 am
by vangelis
I have a bunch of PDF files and would like to read their contents and store them in a db. Does anyone have any idea how this can be achieved? (Reading the PDF, I mean.)

Posted: Fri Sep 26, 2003 8:58 am
by twigletmac
It may be a better (easier to maintain) idea to store just the location of the PDF file in the database and keep the actual file data as a PDF file.
Mac
Posted: Fri Sep 26, 2003 11:55 am
by vangelis
That sounds like a good idea, but I need to be able to search the text contained in the PDFs.
Maybe there could be a way around it: say, convert the PDF to txt and then store it. Do you think that could be possible?
Posted: Fri Sep 26, 2003 12:57 pm
by Leviathan
You'd probably have to convert the PDF to text first. I doubt it's at all easy to write code that reads in a PDF file's format and parses it. If you can convert the PDF to text (or some equivalent format), I'd store the text in the database as well as a link to the PDF file, so you can search and then return the file(s) that match.
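Here is a rough sketch of that idea in Python, in case it helps. It assumes the `pdftotext` command-line tool (shipped with Xpdf and Poppler) is installed and on the PATH, and it uses SQLite only to keep the example self-contained. The table name `documents` and the function names are made up for illustration, so adapt them to whatever database and language you are actually using.

```python
import sqlite3
import subprocess


def extract_text(pdf_path):
    """Convert a PDF to plain text by calling the external pdftotext tool.

    Assumption: pdftotext is installed and on the PATH.
    Passing "-" as the output file makes pdftotext write to stdout.
    """
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def create_schema(conn):
    # Hypothetical schema: the file location plus its extracted text.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents ("
        " path TEXT PRIMARY KEY,"
        " body TEXT NOT NULL)"
    )


def store_document(conn, pdf_path, text):
    # Re-indexing the same path overwrites the previously stored text.
    conn.execute(
        "INSERT OR REPLACE INTO documents (path, body) VALUES (?, ?)",
        (pdf_path, text),
    )
    conn.commit()


def search(conn, term):
    # Plain substring match; any % or _ in the term acts as a LIKE wildcard.
    rows = conn.execute(
        "SELECT path FROM documents WHERE body LIKE ? ORDER BY path",
        ("%" + term + "%",),
    )
    return [row[0] for row in rows]
```

To index a file you would call `store_document(conn, path, extract_text(path))`, and `search(conn, "some words")` then returns the paths of the matching PDFs. A `LIKE` scan is fine for a small collection; for a large one, a full-text index would probably be a better fit.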