
Search content in multiple file types

Posted: Wed Feb 13, 2008 5:31 pm
by sporkrunner
Hi there,
With PHP, I am trying to open a series of files (MS Word, PDF, Excel and TXT) and search through them recursively, returning matches for a chosen search phrase (similar to how a spider searches sites). I have tried the PHP function 'file_get_contents' but have had little luck reading anything but TXT files. Any ideas?
Thanks,
Nathan
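For the plain-text part, a minimal sketch of the recursive search might look like this. It's written in Python purely for illustration (the same idea carries over to PHP), and the function name search_text_files and its extensions parameter are hypothetical. It only opens files with a .txt extension, since those are the only ones where reading the raw contents gives you searchable text.

```python
import os


def search_text_files(root, phrase, extensions=(".txt",)):
    """Hypothetical helper (Python illustration only).

    Walk `root` recursively and return (path, line_number, line) for
    every line of a plain-text file that contains `phrase`,
    case-insensitively. Files whose extension is not in `extensions`
    (e.g. Word, PDF, Excel) are skipped, because their raw bytes are
    not searchable text.
    """
    matches = []
    needle = phrase.lower()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # str.endswith accepts a tuple of suffixes
            if not name.lower().endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            # errors="replace" keeps a stray non-UTF-8 byte from aborting the scan
            with open(path, encoding="utf-8", errors="replace") as fh:
                for lineno, line in enumerate(fh, start=1):
                    if needle in line.lower():
                        matches.append((path, lineno, line.rstrip("\n")))
    return matches
```

In PHP the same walk could presumably be built from RecursiveDirectoryIterator, RecursiveIteratorIterator and stripos().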

Re: Search content in multiple file types

Posted: Thu Feb 14, 2008 10:33 am
by Popcorn
You need to read up on file formats. Whatever you use to read a file has to know how to interpret that format.

Imagine reading an HTML page. Reading the actual file character by character will start with "<html..." (for simplicity's sake), but what you probably want to search is the result of it being parsed: the "welcome to my website..." bit.

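file_get_contents() doesn't parse anything; it just returns the raw contents of the file. It does not know to ignore "<html..." and wait for "welcome to my ..", for example. That's why it works for .txt files, where the raw contents are the text, but not for Word, PDF or Excel files, which usually store their text inside their own binary formats.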
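To make the raw-versus-parsed difference concrete, here's a small sketch using Python's standard-library html.parser (a stand-in chosen for illustration; PHP's strip_tags() does a rougher version of the same job). The TextExtractor class and visible_text() helper are hypothetical names. Searching the raw file for "html" finds a match in the markup, while searching the parsed text does not.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collect the text content of an HTML document,
    skipping the tags themselves and anything inside <script>/<style>."""

    def __init__(self):
        super().__init__()
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Only keep text that is not inside a script/style element
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self):
        # Join the text chunks and collapse runs of whitespace
        return " ".join(" ".join(self._chunks).split())


def visible_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


raw = "<html><body><p>welcome to my website</p></body></html>"
print("html" in raw)                # True: the raw markup contains the tag name
print("html" in visible_text(raw))  # False: the parsed text does not
```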

As for actual parsers to read the file formats you mention, I don't know of any offhand. Search around; they'll be out there.
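As one example of what such a parser does: newer Word files (.docx, Word 2007 and later) are ZIP archives containing XML, with the document body in word/document.xml. The sketch below, again in Python and with a hypothetical docx_text() helper, pulls the paragraph text out of that XML so it can be searched. It assumes the .docx layout only; older binary .doc files, PDFs and binary .xls files still need a proper parser library.

```python
import xml.etree.ElementTree as ET
import zipfile

# Namespace used by WordprocessingML (the XML inside .docx files)
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_text(path):
    """Hypothetical helper: return the body text of a .docx file,
    one line per paragraph.

    Assumes the Office Open XML (.docx) layout; a legacy binary .doc
    file is not a ZIP archive, so zipfile raises BadZipFile for it.
    """
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # A paragraph's text is split across <w:t> runs; concatenate them
        paragraphs.append("".join(t.text or "" for t in para.iter(W_NS + "t")))
    return "\n".join(paragraphs)
```

.xlsx files follow a similar ZIP-of-XML layout (shared cell strings live in xl/sharedStrings.xml), so the same approach could probably be adapted there.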