How to extract the contents of an MS Word file with PHP?
Posted: Tue Dec 13, 2005 1:40 am
by ramis55
Hello,
I want to extract the contents (text, tables and other info) from an MS Word file and show them as HTML. Is this possible? If so, how can I do it with PHP?
PS. PHP is installed on a Unix OS, so I can't use COM().
Posted: Tue Dec 13, 2005 3:08 am
by jayshields
Not possible. Is it absolutely necessary to use MS Word?
If not, Open Office (http://www.openoffice.org) is free and its files are XML (stored inside a ZIP archive), which could be parsed by PHP.
Could someone back me up on the above? I'm not 100% sure, although I'm pretty confident.
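To illustrate the "parse the XML with PHP" idea, here is a minimal sketch. It assumes the OpenDocument text format (.odt) used by OpenOffice.org 2.0 and later; older OOo 1.x .sxw files use different namespace URIs and would need adjusting. The function names `odtContentToHtml` and `readOdtContent` are hypothetical, and only headings and paragraphs are converted.

```php
<?php
// Minimal sketch: convert the headings and paragraphs in an OpenDocument
// text file's content.xml into simple HTML. Styles, images and table/list
// structure are not reproduced: text in table cells and list items comes
// out as ordinary <p> elements.

const ODF_TEXT_NS = 'urn:oasis:names:tc:opendocument:xmlns:text:1.0';

function odtContentToHtml(string $contentXml): string
{
    $doc = new DOMDocument();
    if (!$doc->loadXML($contentXml)) {
        throw new RuntimeException('content.xml is not well-formed XML');
    }
    $xpath = new DOMXPath($doc);
    $xpath->registerNamespace('text', ODF_TEXT_NS);

    $html = '';
    // A union query returns the matching nodes in document order.
    foreach ($xpath->query('//text:h | //text:p') as $node) {
        $text = htmlspecialchars($node->textContent, ENT_QUOTES, 'UTF-8');
        if ($node->localName === 'h') {
            // text:outline-level is the heading depth; clamp to <h1>..<h6>.
            $level = max(1, min(6, (int) $node->getAttributeNS(ODF_TEXT_NS, 'outline-level')));
            $html .= "<h$level>$text</h$level>\n";
        } else {
            $html .= "<p>$text</p>\n";
        }
    }
    return $html;
}

// An .odt file is a ZIP archive; the document body lives in content.xml.
// Requires PHP's zip extension (ZipArchive).
function readOdtContent(string $path): string
{
    $zip = new ZipArchive();
    if ($zip->open($path) !== true) {
        throw new RuntimeException("Cannot open $path");
    }
    $xml = $zip->getFromName('content.xml');
    $zip->close();
    if ($xml === false) {
        throw new RuntimeException("No content.xml in $path");
    }
    return $xml;
}
```

Usage would be something like `echo odtContentToHtml(readOdtContent('report.odt'));`, where `report.odt` is a placeholder. Keeping the XML-to-HTML step separate from the ZIP step makes it easy to test without a real file.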

Posted: Tue Dec 13, 2005 3:40 am
by Chris Corbyn
There are classes out there for working with MS Word files, but I'm pretty sure they all use COM(). Most of them aren't free either, if that's an issue.
Most newer word processors will let you save a file as .htm, though. I just checked, and my Open Office on *nix certainly does. Open Office will open MS Word files too....
Posted: Tue Dec 13, 2005 4:30 am
by Grim...
d11wtq wrote: Most of the new word processors will allow you to save a file as .htm though.
Like Word, for example.
But no, changing .doc files can't be done.
Posted: Tue Dec 13, 2005 5:23 am
by jayshields
Never knew OOo could open/edit MS Word files... I thought MS files were encrypted so that only Microsoft programs could handle them?
Posted: Tue Dec 13, 2005 5:27 am
by patrikG
jayshields wrote: Never knew OOo could open/edit MS Word files... I thought MS files were encrypted so that only Microsoft programs could handle them?
Nope. Open Office has used an import filter to open them since it was first released.
Proprietary formats in this area will soon be a thing of the past as the Open Document Format will be adopted by all major software houses.
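To make the "binary, not encrypted" point concrete: a legacy .doc (or .xls) file is an OLE2 compound document whose first eight bytes are `D0 CF 11 E0 A1 B1 1A E1`, while OpenDocument files are ZIP archives starting with `PK\x03\x04`. The sketch below, with the hypothetical helper names `sniffDocumentContainer` and `sniffFile`, uses those signatures to decide which conversion route a PHP script should take. A password-protected .doc is still an OLE2 file, so this only identifies the container, not whether the content is readable.

```php
<?php
// Minimal sketch: guess a document's container format from its first bytes.
// - Legacy Word .doc / Excel .xls files are OLE2 compound document binaries
//   (signature D0 CF 11 E0 A1 B1 1A E1).
// - OpenDocument files (.odt/.ods) are ZIP archives (signature "PK\x03\x04").
function sniffDocumentContainer(string $header): string
{
    if (strncmp($header, "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1", 8) === 0) {
        return 'ole2';
    }
    if (strncmp($header, "PK\x03\x04", 4) === 0) {
        return 'zip';
    }
    return 'unknown';
}

// Reads the first 8 bytes of a file and classifies them.
function sniffFile(string $path): string
{
    $fh = fopen($path, 'rb');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $header = fread($fh, 8);
    fclose($fh);
    return sniffDocumentContainer($header === false ? '' : $header);
}
```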
Posted: Tue Dec 13, 2005 7:36 pm
by Ambush Commander
You know... wouldn't it be possible to write a PHP extension for Unix systems that reads and writes Microsoft Word documents through OOo?
Posted: Tue Dec 13, 2005 7:39 pm
by d3ad1ysp0rk
Yes, but unless they have an option to run Open Office from the command line with a flag to save a converted file as XML, it will be VERY slow. It's a good idea, though; let me know when you're done making it.
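Later LibreOffice releases do have such an option: `soffice --headless --convert-to html --outdir DIR FILE` converts a document without opening a window. A minimal PHP sketch, assuming LibreOffice is installed with `soffice` on the PATH; the function names `buildConvertCommand`, `convertedPath` and `convertDocument` are hypothetical.

```php
<?php
// Minimal sketch, assuming LibreOffice is installed and `soffice` is on PATH.
// --headless runs without a GUI; --convert-to picks the export format.

// Builds the shell command; every argument is escaped for the shell.
function buildConvertCommand(string $inputPath, string $outDir, string $format = 'html'): string
{
    return sprintf(
        'soffice --headless --convert-to %s --outdir %s %s',
        escapeshellarg($format),
        escapeshellarg($outDir),
        escapeshellarg($inputPath)
    );
}

// Where the converted file should appear: same base name, new extension.
function convertedPath(string $inputPath, string $outDir, string $ext = 'html'): string
{
    return rtrim($outDir, '/') . '/' . pathinfo($inputPath, PATHINFO_FILENAME) . '.' . $ext;
}

// Runs the conversion and returns the path of the generated HTML file.
function convertDocument(string $inputPath, string $outDir): string
{
    $output = [];
    exec(buildConvertCommand($inputPath, $outDir) . ' 2>&1', $output, $status);
    $result = convertedPath($inputPath, $outDir);
    if ($status !== 0 || !is_file($result)) {
        throw new RuntimeException("Conversion failed:\n" . implode("\n", $output));
    }
    return $result;
}
```

Each call still starts a LibreOffice process, so the "slow" concern above is fair; `soffice` accepts several input files in one invocation, which spreads that startup cost over a batch.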

Posted: Fri Dec 16, 2005 6:31 am
by ramis55
Hello everyone, and thanks for the replies!
I found a tool that converts DOC to HTML. You can download it from here:
http://fresh.t-systems-sfr.com/linux/src/
http://fresh.t-systems-sfr.com/linux/sr ... 0.3.tar.gz
Now I want to convert *.xls (MS Excel) files to *.html. Does anyone know which tool I should use?
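One possible route, assuming LibreOffice is available: `soffice --headless --convert-to csv book.xls` (where `book.xls` is a placeholder) exports a sheet as CSV, typically only the first one, and PHP's `fgetcsv()` can then turn it into an HTML table. Only cell values survive; formatting, formulas and other sheets are lost. The helper name `csvToHtmlTable` is hypothetical.

```php
<?php
// Minimal sketch: render CSV text (e.g. from a spreadsheet export) as an
// HTML table. When $firstRowIsHeader is true, the first non-blank row is
// emitted with <th> cells. All cell values are HTML-escaped.
function csvToHtmlTable(string $csv, bool $firstRowIsHeader = true): string
{
    // An in-memory stream lets fgetcsv() handle quoted fields,
    // embedded commas and embedded newlines.
    $fh = fopen('php://temp', 'r+');
    fwrite($fh, $csv);
    rewind($fh);

    $html = "<table>\n";
    $isHeader = $firstRowIsHeader;
    // Empty escape string: embedded quotes are doubled ("") as in RFC 4180.
    while (($row = fgetcsv($fh, 0, ',', '"', '')) !== false) {
        if ($row === [null]) {
            continue; // fgetcsv() returns [null] for a blank line
        }
        $tag = $isHeader ? 'th' : 'td';
        $html .= '  <tr>';
        foreach ($row as $value) {
            $html .= "<$tag>" . htmlspecialchars((string) $value, ENT_QUOTES, 'UTF-8') . "</$tag>";
        }
        $html .= "</tr>\n";
        $isHeader = false;
    }
    fclose($fh);
    return $html . "</table>\n";
}
```

If LibreOffice's own layout is preferred, `--convert-to html` also works on spreadsheets, at the cost of less control over the generated markup.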
Posted: Fri Dec 16, 2005 7:40 am
by ramis55