Hi there,
With PHP, I am trying to open a series of files (MS Word, PDF, Excel and txt) in a directory tree, search through them recursively, and return the matches for a chosen search phrase (similar to how a spider searches sites). I have tried the PHP function 'file_get_contents' but have had little luck getting usable text out of anything but txt files. Any ideas?
Thanks,
Nathan
Search content in multiple file types
-
sporkrunner
- Forum Newbie
- Posts: 1
- Joined: Wed Feb 13, 2008 5:25 pm
Re: Search content in multiple file types
you need to read up on file formats. whatever you use to read a file has to understand that file's format.
imagine reading an HTML page .... reading the actual file character by character will start with "<html..." (for simplicity's sake) but what you probably want to find in the file is the result of it being parsed ... the "welcome to my website..." bit.
file_get_contents() will read any file, but it only hands back the raw bytes. it does no parsing, so it does not know to ignore "<html..." and wait for "welcome to my .." for example. that is fine for plain text, but word, pdf and excel files are binary or compressed formats, so the raw bytes are mostly not readable text.
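to make the HTML example concrete, here is a minimal sketch. the `$raw` string is a made-up stand-in for what file_get_contents() would return for a small page, and strip_tags() is only a crude substitute for a real HTML parser, not a recommendation for production use.

```php
<?php
// Hypothetical page contents, standing in for the result of file_get_contents().
$raw = '<html><body><h1>welcome to my website</h1></body></html>';

// Searching the raw bytes also matches markup, which is usually not wanted.
$rawHit = stripos($raw, 'body') !== false;

// strip_tags() removes the tags and leaves only the visible text.
$text = trim(strip_tags($raw));

// The same search on the extracted text no longer matches the tag name,
// while a phrase from the visible text is still found.
$textHit = stripos($text, 'body') !== false;
$found   = stripos($text, 'welcome') !== false;
```

the same idea applies to the other formats: get the text out with something that understands the format first, then search that text.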
as for actual parsers to read the file formats you mention, i dunno. search for one per format (word, pdf, excel). it'll be there.
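for the recursive part, something along these lines might be a starting point. it is a sketch only, assuming PHP 7.1 or later. `extractText()` and `searchDirectory()` are hypothetical names made up for this example, and only txt and html are handled. word, pdf and excel files are skipped until a format-specific parser is plugged into the `default` branch.

```php
<?php
// Hypothetical helper: return searchable text for a file,
// or null when no extractor is available for its extension.
function extractText(string $path): ?string
{
    $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    switch ($ext) {
        case 'txt':
            $data = file_get_contents($path);
            return $data === false ? null : $data;
        case 'htm':
        case 'html':
            $data = file_get_contents($path);
            // Crude tag removal; a real HTML parser would be more robust.
            return $data === false ? null : strip_tags($data);
        default:
            // doc, pdf, xls and so on need a dedicated parser (not shown).
            return null;
    }
}

// Hypothetical helper: walk $dir recursively and return the paths of
// files whose extracted text contains $phrase (case-insensitive).
function searchDirectory(string $dir, string $phrase): array
{
    $matches = [];
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($files as $file) {
        if (!$file->isFile()) {
            continue;
        }
        $text = extractText($file->getPathname());
        if ($text !== null && stripos($text, $phrase) !== false) {
            $matches[] = $file->getPathname();
        }
    }
    sort($matches);
    return $matches;
}
```

calling `searchDirectory('/path/to/docs', 'some phrase')` (the path is a placeholder) would return a sorted array of matching file paths. keeping the extraction in one function means a parser for another format can be added later without touching the directory walk.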