good tag reader gone bad

Posted: Sat Jan 06, 2007 1:37 pm
by mlecho
Hi all, I have been using this script (thanks to all of the help from these forums) to read the text between <b> tags in specific HTML files:

Code:

<xml>
<?php
$menuType=$_GET['selDir'];
$selYear=$_GET['selYear'];
$fileName="../images/$selYear/".$menuType."/".$menuType.".html";
$handle=fopen($fileName,"r");
$contents=fread($handle,filesize($fileName));
$wanted=preg_match_all('/<b>(.*)<\/b>/', $contents, $matches,PREG_SET_ORDER);
fclose($handle);
$i=0;
foreach($matches as $value){
	echo ("$value[0]");
}
?>
</xml>
All was working fine until recently. While tightening up the site directories, I placed all of the site's existing directories into one called 2007. In the past I could retrieve the script output like so:

Code:

http://www.website.com/php/xml.php?selDir=VENICE&selYear=2005
The new mapping is:

Code:

http://www.website.com/2007/php/xml.php?selDir=VENICE&selYear=2005
However, rather than clean XML containing only the text from the bold entries, the script is now returning lots of other tags after the first <b> tag:

Code:

<xml>
<b>Venetian Weekend</b>
    </td>
    <td align="center" width="33%">
     <a href="Venice-Pages/Image1.html"><img height="160" alt="Venetian Weekend" width="240" src="Venice-Thumbnails/1.jpg"></a>
     <br><b>Venetian Weekend</b>
    </td>
    <td align="center" width="33%">
     <a href="Venice-Pages/Image2.html"><img height="160" alt="Venetian Weekend" width="240" src="Venice-Thumbnails/2.jpg"></a>

     <br><b>Venetian Weekend</b>
    </td>
   </tr>

...etc.
What happened?

Posted: Sat Jan 06, 2007 1:58 pm
by feyd
You may want to use file_exists(), is_readable() and file_get_contents().
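
As a rough sketch of how those three functions could fit together (the helper name readHtmlFile is made up for illustration and is not part of the original script), something like this would avoid calling fopen() on a path that no longer resolves:

```php
<?php
// Hypothetical helper, not from the original script: returns the file's
// contents, or null when the file is missing or cannot be read.
function readHtmlFile($fileName)
{
    // Bail out early instead of letting the read fail with a warning.
    if (!file_exists($fileName) || !is_readable($fileName)) {
        return null;
    }

    // file_get_contents() replaces the fopen()/fread()/fclose() sequence.
    $contents = file_get_contents($fileName);

    // It returns false on failure; normalise that to null for the caller.
    return $contents === false ? null : $contents;
}
```

The script could then call readHtmlFile($fileName) and only run preg_match_all() when the result is not null.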

As for your actual problem, the only thing I can see is a greedy regular expression. ".*?" would not be greedy.

Posted: Sat Jan 06, 2007 4:42 pm
by Kieran Huggins
feyd means:

Code:

$wanted=preg_match_all('/<b>(.*?)<\/b>/', $contents, $matches,PREG_SET_ORDER);

Posted: Sat Jan 06, 2007 7:53 pm
by mlecho
feyd rocks... the ? was the answer. Sorry to be so new to PHP, but what exactly does greedy mean in this case? What is that ? mark telling PHP? I read a lot at regular-expressions.info, and found this about "?": "Makes the preceding item optional. Greedy, so the optional item is included in the match if possible."

Posted: Sat Jan 06, 2007 8:25 pm
by feyd
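In this case, greedy means ".*" will attempt to match the maximum amount of data possible, so it runs on to the last </b> it can reach. Ungreedy will match the shortest possible. A "?" placed directly after a quantifier such as "*" makes that quantifier ungreedy, which is a different use from the optional "?" you quoted.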
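
To make the difference concrete, here is a small sketch that runs both patterns from this thread against the same string. The sample string is invented for illustration and is kept on one line on purpose, since by default "." does not match a newline.

```php
<?php
// Sample input invented for illustration: two bold entries on one line.
$contents = '<b>Venice</b> <i>skip</i> <b>Rome</b>';

// Greedy: ".*" takes as much as it can, so a single match runs from the
// first <b> all the way to the last </b>.
preg_match_all('/<b>(.*)<\/b>/', $contents, $greedy);

// Ungreedy: ".*?" stops at the nearest </b>, giving one match per tag pair.
preg_match_all('/<b>(.*?)<\/b>/', $contents, $lazy);

// $greedy[1] holds one captured string that still contains tags;
// $lazy[1] holds the two bold texts, "Venice" and "Rome".
print_r($greedy[1]);
print_r($lazy[1]);
```

With the greedy pattern everything between the first and last bold tag ends up in one capture, which is the same kind of runaway output shown in the first post.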