Excel/DOC/Powerpoint/PDF Parser

PHP programming forum.

feelspark
Forum Newbie
Posts: 2
Joined: Wed Jul 09, 2003 8:39 pm
Location: s.korea

Post by feelspark

Hi Friends

I am working on a search engine, and in this application I need to extract keywords from Excel, Word (DOC), PowerPoint and PDF files that are stored on a UNIX server.

Could anyone suggest a way to parse these files with PHP on the UNIX server?

It would be a great help.

Regards

Deepak
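Whatever parser ends up doing the extraction, the server first has to tell these formats apart, and file extensions are not always reliable. A minimal sketch, assuming only the well-known file headers: legacy Word, Excel and PowerPoint files are all OLE2 compound documents sharing one 8-byte signature, while PDF files normally start with `%PDF-`. It is written in Python for brevity (the same check is a few lines with PHP's `fopen()`/`fread()`), and the function names are hypothetical.

```python
# Header signatures ("magic bytes") for the formats asked about.
# Legacy Word (.doc), Excel (.xls) and PowerPoint (.ppt) files are all
# OLE2 compound documents and share this 8-byte header.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
# PDF files normally begin with "%PDF-" followed by the version number.
PDF_MAGIC = b"%PDF-"


def sniff_format(header: bytes) -> str:
    """Classify a file from its leading bytes: 'ole2', 'pdf' or 'unknown'."""
    if header.startswith(OLE2_MAGIC):
        return "ole2"
    if header.startswith(PDF_MAGIC):
        return "pdf"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read only the first 8 bytes of the file at `path` and classify them."""
    with open(path, "rb") as f:
        return sniff_format(f.read(len(OLE2_MAGIC)))
```

A dispatcher could then hand `ole2` files to an Office extractor and `pdf` files to a PDF text extractor. Telling .doc from .xls from .ppt needs a look inside the OLE2 directory, where the main stream is typically named `WordDocument`, `Workbook` or `PowerPoint Document` respectively.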
thepez
Forum Newbie
Posts: 2
Joined: Wed Oct 01, 2003 9:25 am

Post by thepez

If you found an answer to this, please post it; I am in the same situation.
feelspark
Forum Newbie
Posts: 2
Joined: Wed Jul 09, 2003 8:39 pm
Location: s.korea

Post by feelspark

For this, Java POI was the only solution we found.

To solve the problem, we used PHP scripts as the front end and the Java POI classes with an Oracle database as the back end.

POI has classes for handling Microsoft Office documents.

We extracted the document contents with Java and accessed them through PHP.

I hope this helps you find a way. :)

Regards

Deepak
aggarwal_deep@yahoo.com
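In the setup described above, the back end turns each document into plain text and the PHP front end searches the result. A minimal sketch of the keyword-indexing step, assuming the text has already been extracted (for example by the POI-based back end): it builds an inverted index from keyword to document IDs and answers queries that must match every keyword. The in-memory dictionary stands in for what would be Oracle tables in that setup, the stop-word list is purely illustrative, and all names are hypothetical.

```python
import re
from collections import defaultdict

# Illustrative stop-word list (hypothetical; a real engine would use a larger one).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on"}


def extract_keywords(text: str) -> set[str]:
    """Lower-case the text, split it into word tokens and drop stop words."""
    tokens = re.findall(r"\w+", text.lower())
    return {t for t in tokens if t not in STOP_WORDS and len(t) > 1}


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each keyword to the set of document IDs whose text contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in extract_keywords(text):
            index[word].add(doc_id)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the IDs of documents that contain every keyword in the query."""
    words = extract_keywords(query)
    if not words:
        return set()
    results = None
    for word in words:
        hits = index.get(word, set())
        # Copy on first use so callers never receive a set stored in the index.
        results = set(hits) if results is None else results & hits
    return results
```

For example, indexing `{"report.doc": "Quarterly sales report", "budget.xls": "Sales budget and forecast"}` and searching for `"sales forecast"` returns only `budget.xls`, because the query requires both keywords.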
Wayne
Forum Contributor
Posts: 339
Joined: Wed Jun 05, 2002 10:59 am

Post by Wayne

There are also Perl modules available that will extract the text from these documents or convert them to HTML. Have a look at the swish-e.org website.
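When a converter produces HTML rather than plain text, the markup still has to be stripped before keywords can be pulled out. A minimal sketch using Python's standard-library `html.parser`, assuming the converter emits ordinary HTML; it skips the contents of `<script>` and `<style>` and collapses whitespace. The class and function names are hypothetical and not part of swish-e or any Perl module.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text of an HTML document, skipping <script> and <style>."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so &amp; -> &
        self._chunks: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self) -> str:
        # Join text chunks with spaces, then collapse all whitespace runs.
        return " ".join(" ".join(self._chunks).split())


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML string as a single line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

The output of `html_to_text()` is plain text that can go straight into the same keyword extraction used for the other formats.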
jsim
Forum Newbie
Posts: 2
Joined: Fri Oct 24, 2003 9:24 am

Post by jsim

For parsing Excel files with PHP on UNIX, you can try this:

http://www.softclub.org/excelexplorer/

It's not free, but you can try a demo online.