PHP & MS Word

Technocrat
Forum Contributor
Posts: 127
Joined: Thu Oct 20, 2005 7:01 pm

Post by Technocrat

I have a site where users can upload Word docs so they can be viewed by other users. Last week my boss came to me with a request for the ability to search those Word docs for keywords. After much searching here and on Google, it appears that's not possible. Well, I was told that a competing website has that ability (though of course they use ASP), so my boss is breathing down my neck for a solution.

So is it possible to parse a Word doc on a Linux server? I found solutions for a Windows server using a COM object, but that's not going to work for us. I also found http://wvware.sourceforge.net/ but I am not sure if it does what I need. Has anyone used it before?

Is there another way of doing this that I am missing, besides the obvious one of having users copy and paste the text into a textarea?

Thanks
John Cartwright
Site Admin
Posts: 11470
Joined: Tue Dec 23, 2003 2:10 am
Location: Toronto

Post by John Cartwright

Indeed, wvWare appears to convert Microsoft Word files into a multitude of more readable formats. Give it a try and do let us know how it goes :)

But without a Windows box, implementing a Microsoft doc reader has always been a pain in the foot.
Technocrat
Forum Contributor
Posts: 127
Joined: Thu Oct 20, 2005 7:01 pm

Post by Technocrat

Just so I am clear, since installing libraries isn't my thing: all I should need to do is compile this lib, and the functions will then become available to me in PHP?
Technocrat
Forum Contributor
Posts: 127
Joined: Thu Oct 20, 2005 7:01 pm

Post by Technocrat

Figured I would post this for the next guy.

This library lets you convert Microsoft Word documents (versions 3 through 2003) to HTML, plain text, and PDF (see http://wvware.sourceforge.net/#wv for the list of converter programs). Plain text was the best option for me.
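For anyone wondering about the "available in PHP" question above: the wv 1.2.x converters are separate command-line programs (wvText, wvHtml, wvPDF and so on), not PHP functions, so compiling the library adds nothing to PHP itself and you call the tools through the shell. As a rough illustration, here is a hypothetical helper, wvConverterFor(), that picks the converter for an output format. The /usr/local/bin default is the install path used in this thread, and the tool names are the ones wv 1.2.x installs as far as I know; check what your own install actually put on disk.

```php
<?php
// Hypothetical helper (not part of wv): map an output format to the wv 1.2.x
// command-line converter that produces it. Tool names assumed from wv 1.2.x;
// the default directory is the one used in this thread.
function wvConverterFor(string $format, string $binDir = '/usr/local/bin'): ?string
{
    $tools = [
        'html' => 'wvHtml',
        'text' => 'wvText',
        'pdf'  => 'wvPDF',
    ];

    $format = strtolower($format);
    if (!isset($tools[$format])) {
        return null; // format not covered by this sketch
    }

    // Join directory and tool name without doubling the slash
    return rtrim($binDir, '/') . '/' . $tools[$format];
}
```

So wvConverterFor('text') gives '/usr/local/bin/wvText', which you would then run with shell_exec() as in the commands further down this post.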

First you need to compile the library. I had my host do this for me to save time, since I am not that well versed in Linux, though on my test box the library was available through apt-get, which made it easy to install.

Here is the tricky part: you're going to want to use the wv 1.2.x library and not wv2. The command-line converters are only in the 1.x branch.

I then used shell_exec() to create a text file:

Code:

@shell_exec('/usr/local/bin/wvText /path/name.doc /path/name.text');
Then use fopen() or whatever you like to read it.
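Since the whole point was keyword search, the full flow might look roughly like the sketch below: convert with wvText into a temporary file, read it back, then look for the keywords. docToText() and containsKeywords() are hypothetical names, the wvText path is the one from this post, and the matching is plain case-insensitive substring search rather than any real search index. escapeshellarg() is there because the file names come from user uploads.

```php
<?php
// Sketch: convert a .doc to text with wvText (path assumed from the post),
// read the result back, and delete the temporary file.
function docToText(string $docPath, string $wvText = '/usr/local/bin/wvText'): ?string
{
    $txtPath = tempnam(sys_get_temp_dir(), 'wv');
    if ($txtPath === false) {
        return null;
    }

    // Quote every argument so odd upload names cannot break the command
    $cmd = escapeshellarg($wvText) . ' ' . escapeshellarg($docPath) . ' '
         . escapeshellarg($txtPath) . ' 2>/dev/null';
    shell_exec($cmd);

    $text = @file_get_contents($txtPath);
    @unlink($txtPath);

    // tempnam() created an empty file, so empty output means the conversion failed
    return ($text === false || $text === '') ? null : $text;
}

// Return the keywords that appear in $text (case-insensitive substring match).
function containsKeywords(string $text, array $keywords): array
{
    $found = [];
    foreach ($keywords as $word) {
        if ($word !== '' && stripos($text, $word) !== false) {
            $found[] = $word;
        }
    }
    return $found;
}
```

If there are many documents, it would probably be cheaper to convert each one once at upload time and store the text (for example in a database column) than to run wvText on every search.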

On my live box I had an issue with the paths. To fix this I ran:

Code:

@shell_exec('export PATH="/usr/local/bin:$PATH"; /usr/local/bin/wvText /path/name.doc /path/name.text');
Note that each shell_exec() call starts a new shell, so the exported PATH does not carry over to later calls; include the PATH part in every command you run.
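Because each shell_exec() call runs in its own /bin/sh, a PATH exported in one call is gone by the next. A minimal sketch of two ways around that, assuming the wv tools sit in /usr/local/bin: build the PATH prefix into every command (the hypothetical wvTextCommand() helper below), or set PHP's own environment once with putenv() so that child processes started later by shell_exec() inherit it. Both prepend to the existing PATH instead of replacing it, since wvText may rely on other programs that live outside /usr/local/bin.

```php
<?php
// Option 1 (hypothetical helper): put the PATH prefix into every command string.
function wvTextCommand(string $doc, string $out, string $binDir = '/usr/local/bin'): string
{
    // Prepend $binDir to the shell's existing PATH; $PATH is expanded by sh, not PHP
    return 'export PATH=' . escapeshellarg($binDir) . ':"$PATH"; '
        . escapeshellarg(rtrim($binDir, '/') . '/wvText') . ' '
        . escapeshellarg($doc) . ' ' . escapeshellarg($out);
}

// Option 2: change PHP's own environment once; shells spawned afterwards
// by shell_exec() inherit this PATH.
putenv('PATH=/usr/local/bin:' . getenv('PATH'));
```

With option 1 you would call @shell_exec(wvTextCommand('/path/name.doc', '/path/name.text')); with option 2 the plain command from the post works in every later call of the same request.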