How do I extract the source from an MS Word file with PHP?

ramis55
Forum Newbie
Posts: 3
Joined: Tue Dec 13, 2005 1:30 am
Location: Lithuania, Vilnius

Post by ramis55 »

Hello,
I want to extract the source (text, tables, and other information) from an MS Word file and display it as HTML. Is this possible? If so, how can I do it with PHP?

PS. PHP is installed on a Unix OS, so I can't use COM().
jayshields
DevNet Resident
Posts: 1912
Joined: Mon Aug 22, 2005 12:11 pm
Location: Leeds/Manchester, England

Post by jayshields »

Not possible. Is it absolutely necessary to use MS Word?

If not, Open Office (http://www.openoffice.org) is free and its files are XML-encoded, so they could be parsed by PHP.

Someone could do with backing me up on the above statements, I'm not 100% although I'm pretty sure :)
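
As a rough illustration of the XML-parsing idea above, here is a minimal sketch. It assumes the document has been saved in OpenDocument format (.odt), which is a zip archive holding a content.xml file, and that PHP's zip and dom extensions are available. The function names (readOdtContent, odfParagraphs, paragraphsToHtml) are made up for this example, and it only pulls out paragraph text, not tables or formatting.

```php
<?php
// Hypothetical helper: read content.xml out of an .odt archive.
// Requires the zip extension (ZipArchive).
function readOdtContent(string $path): string
{
    $zip = new ZipArchive();
    if ($zip->open($path) !== true) {
        throw new RuntimeException("Cannot open $path as a zip archive");
    }
    $xml = $zip->getFromName('content.xml');
    $zip->close();
    if ($xml === false) {
        throw new RuntimeException("No content.xml found in $path");
    }
    return $xml;
}

// Hypothetical helper: return the text of every <text:p> paragraph.
function odfParagraphs(string $contentXml): array
{
    $doc = new DOMDocument();
    if (!@$doc->loadXML($contentXml)) {
        throw new RuntimeException('content.xml is not well-formed XML');
    }
    $xpath = new DOMXPath($doc);
    // Namespace URI of the OpenDocument "text" vocabulary.
    $xpath->registerNamespace('text', 'urn:oasis:names:tc:opendocument:xmlns:text:1.0');

    $paragraphs = [];
    foreach ($xpath->query('//text:p') as $node) {
        $paragraphs[] = trim($node->textContent);
    }
    return $paragraphs;
}

// Hypothetical helper: wrap each paragraph in <p> tags, escaping HTML.
function paragraphsToHtml(array $paragraphs): string
{
    $lines = [];
    foreach ($paragraphs as $text) {
        $lines[] = '<p>' . htmlspecialchars($text) . '</p>';
    }
    return implode("\n", $lines);
}

// Usage (the path is an assumption):
// echo paragraphsToHtml(odfParagraphs(readOdtContent('/tmp/report.odt')));
```

Parsing is kept separate from the zip handling so the XML step can be tried on a plain string first.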
Chris Corbyn
Breakbeat Nuttzer
Posts: 13098
Joined: Wed Mar 24, 2004 7:57 am
Location: Melbourne, Australia

Post by Chris Corbyn »

There are classes out there to work with MS Word files, but I'm pretty sure they all use COM(). Most of them aren't free either, if that's an issue.

Most of the new word processors will allow you to save a file as .htm though. I just checked, and my Open Office on *nix certainly does. Open Office will open MS Word files too.
Grim...
DevNet Resident
Posts: 1445
Joined: Tue May 18, 2004 5:32 am
Location: London, UK

Post by Grim... »

d11wtq wrote: Most of the new word processors will allow you to save a file as .htm though.
Like Word, for example.

But no, changing .doc files can't be done.
jayshields
DevNet Resident
Posts: 1912
Joined: Mon Aug 22, 2005 12:11 pm
Location: Leeds/Manchester, England

Post by jayshields »

Never knew OOo could open/edit MS Word files... I thought MS files were encrypted so that only Microsoft programs could handle them?
patrikG
DevNet Master
Posts: 4235
Joined: Thu Aug 15, 2002 5:53 am
Location: Sussex, UK

Post by patrikG »

jayshields wrote: Never knew OOo could open/edit MS Word files... I thought MS files were encrypted so that only Microsoft programs could handle them?
Nope. Open Office has used an import filter to open them since it was first released.
Proprietary formats in this area will soon be a thing of the past as the Open Document Format will be adopted by all major software houses.
Ambush Commander
DevNet Master
Posts: 3698
Joined: Mon Oct 25, 2004 9:29 pm
Location: New Jersey, US

Post by Ambush Commander »

You know... wouldn't it be possible to write a PHP extension for Unix systems that allows read/write of Microsoft Word documents through OOo...
d3ad1ysp0rk
Forum Donator
Posts: 1661
Joined: Mon Oct 20, 2003 8:31 pm
Location: Maine, USA

Post by d3ad1ysp0rk »

Yes, but unless they have an option to execute Open Office from the command line with a flag to save a converted file to XML, it will be VERY slow. It's a good idea though, let me know when you're done making it ;)
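
For what it's worth, the command-line route could be sketched like this. It assumes a LibreOffice-style soffice binary on the PATH that accepts --headless --convert-to (LibreOffice, an OpenOffice descendant, documents these flags). Nothing in this thread confirms that the installed Open Office build has them, so treat the binary name as an assumption. The function names are made up for the example.

```php
<?php
// Hypothetical helper: build the shell command that asks the office suite
// to convert one file to HTML without opening a GUI.
function buildConvertCommand(string $inputPath, string $outputDir, string $binary = 'soffice'): string
{
    return sprintf(
        '%s --headless --convert-to html --outdir %s %s',
        escapeshellcmd($binary),      // assumed to be on the PATH
        escapeshellarg($outputDir),   // quote paths so spaces are safe
        escapeshellarg($inputPath)
    );
}

// Hypothetical helper: run the conversion and return the expected output path.
function convertToHtml(string $inputPath, string $outputDir): string
{
    exec(buildConvertCommand($inputPath, $outputDir) . ' 2>&1', $lines, $status);
    if ($status !== 0) {
        throw new RuntimeException("Conversion failed:\n" . implode("\n", $lines));
    }
    // The converter writes <input basename>.html into the output directory.
    return rtrim($outputDir, '/') . '/' . pathinfo($inputPath, PATHINFO_FILENAME) . '.html';
}

// Usage (paths are assumptions):
// $html = file_get_contents(convertToHtml('/tmp/report.doc', '/tmp/out'));
```

Starting the office suite for every request is likely to be slow, as noted above, so caching the converted HTML would probably be worthwhile.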
ramis55
Forum Newbie
Posts: 3
Joined: Tue Dec 13, 2005 1:30 am
Location: Lithuania, Vilnius

Post by ramis55 »

Hello everyone, and thanks for the replies!
I found a tool that converts DOC to HTML. You can download it from here:
http://fresh.t-systems-sfr.com/linux/src/
http://fresh.t-systems-sfr.com/linux/sr ... 0.3.tar.gz

Now I want to convert *.xls (MS Excel) to *.html. Does anyone know which tool I should use?