I have a site where users can upload Word docs so they can be viewed by other users. Last week my boss came to me with a request for the ability to search those Word docs for keywords. After much searching here and on Google, it appears that's not possible. Well, I got told that a competing website has that ability (though of course they use ASP). So my boss is breathing down my neck for a solution.
So is it possible to parse a Word doc on a Linux server? I found solutions if you're on a Windows server using a COM object, but that's not going to work for us. I also found http://wvware.sourceforge.net/ but I am not sure if it does what I need. Has anyone used it before?
Is there another way of doing this that I am missing, besides the obvious one of having them copy and paste it into a textarea?
Thanks
PHP & MS Word
- Technocrat
- Forum Contributor
- Posts: 127
- Joined: Thu Oct 20, 2005 7:01 pm
Figured I would post this for the next guy.
This library converts Microsoft Word documents (versions 3 through 2003) to HTML, text, and PDF (see http://wvware.sourceforge.net/#wv for the function list). Text was the best option for me.
First you need to compile the library. I had my host do this for me to save time, since I am not that well versed in Linux. On my test box, though, the library was available through apt-get, which made it easy to install.
Here is the tricky part: you're going to want to use the wv 1.2.x library and not the wv2 one. The functions are only in the 1.x branch.
I then used shell_exec to create a text file:
Code:
@shell_exec('/usr/local/bin/wvText /path/name.doc /path/name.text');
Then use fopen or whatever to read it.
On my live box I had an issue with the paths. To fix this I ran:
Code:
@shell_exec('export PATH="/usr/local/bin/"; /usr/local/bin/wvText /path/name.doc /path/name.text');
If you are running this more than once you don't need the path part after the first instance.
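For anyone who wants the whole flow in one place, here is a rough sketch of how the convert, read, and keyword-search steps could fit together. The function names (buildWvTextCommand, convertDocToText, textContainsKeyword) are made up for illustration and are not part of wv. The /usr/local/bin/wvText location is the one from my server and may differ on yours, and the extra /usr/bin:/bin entries in PATH are an assumption so the wvText script can still find standard utilities. escapeshellarg is added because uploaded file names can contain spaces or shell characters.

```php
<?php
// Sketch only: helper names are hypothetical, wvText path is an assumption.

/**
 * Build the shell command that converts a Word doc to plain text.
 * PATH is set inline for this one command, similar to the export fix above.
 */
function buildWvTextCommand(string $docPath, string $textPath, string $wvText = '/usr/local/bin/wvText'): string
{
    // Assumed PATH: wvText's own directory plus the standard system directories.
    $path = dirname($wvText) . ':/usr/bin:/bin';

    return 'PATH=' . escapeshellarg($path) . ' '
        . escapeshellarg($wvText) . ' '
        . escapeshellarg($docPath) . ' '
        . escapeshellarg($textPath);
}

/**
 * Run the conversion and return the extracted text,
 * or null if no readable text file was produced.
 */
function convertDocToText(string $docPath, string $textPath): ?string
{
    shell_exec(buildWvTextCommand($docPath, $textPath));

    if (!is_readable($textPath)) {
        return null; // conversion failed or wvText is not installed
    }

    $text = file_get_contents($textPath);
    return $text === false ? null : $text;
}

/**
 * Case-insensitive keyword check on the extracted text.
 * An empty keyword never matches.
 */
function textContainsKeyword(string $text, string $keyword): bool
{
    return $keyword !== '' && stripos($text, $keyword) !== false;
}
```

Usage would be something like calling convertDocToText('/path/name.doc', '/path/name.text') once at upload time, storing the returned text (in a file or a database column), and running textContainsKeyword against the stored text at search time, so the conversion does not have to run on every search.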