Can anyone help me out with this problem? I want to index a .doc document for searching, e.g. I enter a keyword and it returns the page(s) that word appears on. I plan on having a simple MySQL table with the following fields:
**************************
page_number INT auto_increment
page_text TEXT
**************************
I have managed to figure out how to convert a .doc file to plain text using the msWord2Text() function shown below, so I am able to grab the plain text ready for insertion into my MySQL table. However, the code returns the entire document as the single $result string. I need a separate string for each page, or some way to split the .doc into its separate pages.
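One possible direction, offered only as a rough sketch: binary Word documents mark hard page breaks (and section breaks) with a form-feed character (0x0C), so text that still contains those characters can be split on them. The helper name splitTextIntoPages() is hypothetical and not part of the original code. The sketch assumes the form feeds survive extraction, and it cannot see the automatic page breaks that Word computes at layout time, because those are not stored in the text at all.

```php
<?php
// Hypothetical helper (not from the original post): splits extracted text
// into pages at form-feed characters (0x0C).
// Assumption: hard page breaks survive extraction as form feeds. Page
// breaks that Word inserts automatically at layout time are not stored
// in the text and cannot be recovered this way.
function splitTextIntoPages(string $text): array {
    $pages = [];
    foreach (explode(chr(0x0C), $text) as $index => $chunk) {
        $chunk = trim($chunk);
        if ($chunk !== '') {
            // Key by 1-based page number so blank pages do not shift
            // the numbering of the pages that follow them.
            $pages[$index + 1] = $chunk;
        }
    }
    return $pages;
}

// Hand-made sample string standing in for the extracted document text.
$sample = "First page text\n" . chr(0x0C) . "Second page text\n";
foreach (splitTextIntoPages($sample) as $pageNumber => $pageText) {
    echo 'Page ' . $pageNumber . ': ' . $pageText . "\n";
}
```

Each entry of the returned array could then be stored as one row, with the array key written to page_number explicitly instead of relying on auto_increment, so that skipped blank pages do not throw the numbering off.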
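<?php
// Crude text extraction from a binary Word .doc file: keeps only the
// CR-separated chunks that contain no NUL bytes, then strips every
// character outside a small whitelist.
function msWord2Text($userDoc) {
    $iLineTeller = 0;     // number of text lines kept so far
    $sPreviousLine = "";  // last non-empty chunk seen
    $line = file_get_contents($userDoc);
    $lines = explode(chr(0x0D), $line);
    $outtext = "";
    foreach ($lines as $thisline) {
        $pos = strpos($thisline, chr(0x00));
        $stringlengte = strlen($thisline);
        if (($pos !== FALSE) || ($stringlengte == 0)) {
            // Chunk is empty or contains NUL bytes: skip it.
            //print("$thisline\n");
        } else {
            // first line bug: before the first kept line, also keep the
            // tail of the previous chunk, starting at its last NUL byte.
            if ($iLineTeller == 0) {
                $lastpos = strrpos($sPreviousLine, chr(0x00));
                $sTekst = substr($sPreviousLine, $lastpos, strlen($sPreviousLine) - $lastpos);
                $outtext .= $sTekst . "\n";
            }
            $outtext .= $thisline . "\n";
            $iLineTeller++;
        }
        if ($stringlengte != 0) {
            $sPreviousLine = $thisline;
        }
    }
    // Remove everything that is not in the whitelist of allowed characters.
    $outtext = preg_replace("/[^a-zA-Z0-9\s\,\.\-\n\r\é\è\ç\ë\à\'\:\t@\/\_\(\)]/","",$outtext);
    return $outtext;
}
$sourcefile = 'test.doc';
$result = msWord2Text($sourcefile);
echo $result;
?>
Much appreciated,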
Alk...