Excel/DOC/Powerpoint/PDF Parser

Posted: Wed Jul 09, 2003 8:39 pm
by feelspark
Hi Friends

I am working on a search engine, and the application needs to extract
keywords from Excel, Word (DOC), PowerPoint, and PDF files that are
stored on a UNIX server.

Could anyone suggest a way to parse these file types with PHP on a UNIX server?

It would be a great help.

regards

Deepak

Posted: Wed Oct 01, 2003 10:13 am
by thepez
If you found an answer to this please post it, I am in the same situation.

Posted: Wed Oct 01, 2003 10:02 pm
by feelspark
For this, Java POI is the only solution we found...

To solve this problem, we use a PHP script as the front end,
with the Java POI classes and Oracle as our back end.

Java POI has classes for handling MS Office documents.

We extract the document contents through Java and access them through PHP.
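
As a rough illustration of the keyword step only: once the Java side has turned a document into plain text, picking keywords could look something like the sketch below. It uses only the Java standard library and does not call POI. The class name `KeywordExtractor`, the method `topKeywords`, and the tiny stop-word list are all hypothetical names made up for this example, not part of POI or of the setup described above.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper: ranks the words of already-extracted text by frequency.
public class KeywordExtractor {

    // Tiny illustrative stop-word list (an assumption; a real index would use a fuller one).
    private static final Set<String> STOP_WORDS = Set.of(
            "the", "a", "an", "and", "or", "of", "to", "in",
            "is", "are", "for", "on", "with", "that", "this", "it");

    // Returns up to `limit` words, most frequent first; ties are broken alphabetically.
    public static List<String> topKeywords(String text, int limit) {
        Map<String, Integer> counts = new HashMap<>();
        // Split on anything that is not a letter or a digit.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (token.length() < 3 || STOP_WORDS.contains(token)) {
                continue; // skip empty tokens, very short words, and stop words
            }
            counts.merge(token, 1, Integer::sum);
        }
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Integer>comparingByKey()))
                .limit(limit)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Text passed on the command line, or a small built-in sample.
        String text = args.length > 0
                ? String.join(" ", args)
                : "Quarterly sales report: sales rose in the north region, "
                  + "and the report covers sales only.";
        System.out.println(topKeywords(text, 3));
    }
}
```

A PHP front end could run a program like this with `exec()` or `shell_exec()` and read the printed keywords, or the Java side could write them to the database for PHP to query.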

I hope this helps you find a way. :)

regards

Deepak
aggarwal_deep@yahoo.com

Posted: Thu Oct 02, 2003 3:15 am
by Wayne
There are also Perl modules available that will extract text from the documents or convert them to HTML. Have a look at the swish-e.org website.

Posted: Fri Oct 24, 2003 9:28 am
by jsim
As for parsing Excel files using PHP on UNIX, you can try this:

http://www.softclub.org/excelexplorer/

It's not free, but you can try the demo online.