Trouble matching specific table rows....

Posted: Thu Oct 04, 2007 12:56 pm
by Burrito
I'm trying to match rows from an HTML table that contain a value in the fourth cell. I want to strip out the rest of the rows from the table.

EX:

Code: Select all

<tr bgcolor='white'>

			<TD class="GridAttrField">070905499</TD>

			<TD class="GridAttrField">DC</TD>
			<TD class="GridAttrField">blah</TD>
			<TD class="GridAttrField">PLA</TD>
			<TD class="GridAttrField"></TD>
			<TD class="GridAttrField">blah</TD>
			<TD class="GridAttrField"></TD>

			<TD class="GridAttrField">blah</TD>
			<TD class="GridAttrField">UT</TD>
			<TD class="GridAttrField">blah</TD>
		</TR>
	
		<tr bgcolor='#dddddd'>
		
			<TD class="GridAttrField">070905499</TD>
			<TD class="GridAttrField">DC</TD>

			<TD class="GridAttrField">blah</TD>
			<TD class="GridAttrField">DEF</TD>
			<TD class="GridAttrField"></TD>
			<TD class="GridAttrField">blah</TD>
			<TD class="GridAttrField"></TD>
			<TD class="GridAttrField">blah</TD>
			<TD class="GridAttrField">UT</TD>

			<TD class="GridAttrField">84404</TD>
		</TR>
	
		<tr bgcolor='white'>

			<TD class="GridAttrField">070905499</TD>
			<TD class="GridAttrField">DC</TD>
			<TD class="GridAttrField">blah</TD>

			<TD class="GridAttrField">DEF</TD>
			<TD class="GridAttrField"></TD>
			<TD class="GridAttrField">blah</TD>
			<TD class="GridAttrField"></TD>
			<TD class="GridAttrField">blah</TD>
			<TD class="GridAttrField">UT</TD>
			<TD class="GridAttrField">84404</TD>

		</TR>
	
	<tr bgcolor='white'>

			<TD class="GridAttrField">070905500</TD>
			<TD class="GridAttrField">DC</TD>
			<TD class="GridAttrField"> blah</TD>
			<TD class="GridAttrField">PLA</TD>

			<TD class="GridAttrField"></TD>
			<TD class="GridAttrField">blah</TD>
			<TD class="GridAttrField"></TD>
			<TD class="GridAttrField">blah</TD>
			<TD class="GridAttrField">UT</TD>
			<TD class="GridAttrField">84401</TD>
		</TR>

	
		<tr bgcolor='#dddddd'>
		
			<TD class="GridAttrField">070905500</TD>
			<TD class="GridAttrField">DC</TD>
			<TD class="GridAttrField">blah</TD>
			<TD class="GridAttrField">DEF</TD>
			<TD class="GridAttrField"></TD>
			<TD class="GridAttrField">blah</TD>

			<TD class="GridAttrField"></TD>
			<TD class="GridAttrField">blah</TD>
			<TD class="GridAttrField">UT</TD>
			<TD class="GridAttrField">84403-7017</TD>
		</TR>
If you look at that, you'll see that the fourth column contains either DEF or PLA. I only want the rows that have DEF in the fourth column. I am planning to pull those rows out of the table and then rebuild the table from them. I tried this:

Code: Select all

preg_match_all("#(<tr.*?>DEF</td>.*?</tr>)#mis",$string,$matches);
	echo "<pre>";
	print_r($matches[1]);
	echo "</pre>";
But that grabs all the rows, not just the DEF ones.

Any better ideas?

tia,

Burr

Posted: Thu Oct 04, 2007 2:15 pm
by Kieran Huggins
this is UGLY, but it works:

Code: Select all

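$rows = preg_split('#</?tr.*?>#mis',$src);
$matches = array(); // initialize before appending to it

foreach($rows as $row){
	// needs three cells somewhere before a cell that is exactly DEF
	if(preg_match('#<td.*?>.*?</td>.*?<td.*?>.*?</td>.*?<td.*?>.*?</td>.*?<td.*?>DEF</td>#mis',$row)){
		$matches[] = $row;
	}
}

//print_r($matches);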
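A stricter take on the split-and-test loop above, as a minimal sketch only. It pulls the cells out of each row and compares the fourth one directly, so DEF in the fifth or a later cell does not count. The function name `rows_with_cell_value` and the shortened sample data are hypothetical, not from the thread, and it assumes the cells hold plain text.

```php
<?php
// Hypothetical helper (not from the thread): keep only the rows whose
// cell at $cellIndex (0-based, so 3 = fourth cell) equals $wanted.
function rows_with_cell_value($html, $wanted, $cellIndex = 3)
{
    $kept = array();
    // Capture each complete <tr>...</tr> so the row can be reused as-is.
    preg_match_all('#<tr\b[^>]*>.*?</tr>#is', $html, $rows);
    foreach ($rows[0] as $row) {
        // Collect the contents of every cell in this row, in order.
        preg_match_all('#<td\b[^>]*>(.*?)</td>#is', $row, $cells);
        if (isset($cells[1][$cellIndex]) && trim($cells[1][$cellIndex]) === $wanted) {
            $kept[] = $row;
        }
    }
    return $kept;
}

// Shortened sample data (assumed). The first row has DEF in the fifth cell only.
$src = "<tr bgcolor='white'><TD>1</TD><TD>DC</TD><TD>blah</TD><TD>PLA</TD><TD>DEF</TD></TR>\n"
     . "<tr bgcolor='#dddddd'><TD>2</TD><TD>DC</TD><TD>blah</TD><TD>DEF</TD><TD></TD></TR>\n";

$matches = rows_with_cell_value($src, 'DEF');
echo count($matches), "\n";
```

Because the kept rows still carry their `<tr>` tags, they can be joined straight back into a table.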

Posted: Thu Oct 04, 2007 2:58 pm
by GeertDD
This one is a bit nicer maybe. I don't like to use multiple instances of .*? though.

Code: Select all

~<tr\s.*?>(?:DEF|PLA)<.*?</tr>~is
However, note that the regex above also matches rows that contain DEF or PLA in any cell, not necessarily the fourth one. So whether it is useful depends on your data.
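For what it's worth, here is a rough sketch of why the lazy `.*?` in the original attempt overmatches, and one way to pin a single regex to the fourth cell. After `<tr`, a lazy `.*?` is free to run past `</tr>` into the next row until it finds `>DEF</td>`, so a PLA row gets swallowed along with the DEF row after it. The stricter pattern only allows three whole cells between the opening `<tr>` and the DEF cell. It assumes cells hold plain text with no nested tags, and `$string` is shortened sample data rather than the real table.

```php
<?php
// Shortened sample data (assumed): one PLA row followed by one DEF row.
$string = <<<HTML
<tr bgcolor='white'>
	<TD class="GridAttrField">070905499</TD>
	<TD class="GridAttrField">DC</TD>
	<TD class="GridAttrField">blah</TD>
	<TD class="GridAttrField">PLA</TD>
	<TD class="GridAttrField">84404</TD>
</TR>
<tr bgcolor='#dddddd'>
	<TD class="GridAttrField">070905500</TD>
	<TD class="GridAttrField">DC</TD>
	<TD class="GridAttrField">blah</TD>
	<TD class="GridAttrField">DEF</TD>
	<TD class="GridAttrField">84403-7017</TD>
</TR>
HTML;

// The pattern from the first post: the lazy .*? crosses the row boundary,
// so the single match starts at the PLA row and ends after the DEF row.
preg_match_all("#(<tr.*?>DEF</td>.*?</tr>)#mis", $string, $loose);

// Stricter sketch: exactly three plain-text cells, then a fourth cell
// that is exactly DEF, then the remainder of that same row.
$pattern = '~<tr\b[^>]*>\s*(?:<td\b[^>]*>[^<]*</td>\s*){3}<td\b[^>]*>DEF</td>.*?</tr>~is';
preg_match_all($pattern, $string, $found);

echo count($loose[1]), " loose match(es), ", count($found[0]), " strict match(es)\n";
```

Both patterns report one match on this sample, but the loose match contains both rows while the strict match contains only the DEF row.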

Posted: Fri Oct 05, 2007 3:26 am
by stereofrog
What are you going to do with the rows you find? In general, you don't need a regexp for this:

Code: Select all

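$doc = new DOMDocument();
$doc->loadHTML($html);
$xp = new DOMXPath($doc);
// td[4] limits the test to the fourth cell; td='DEF' would match DEF in any cell
$rows = $xp->query("//tr[td[4]='DEF']");
foreach($rows as $row) {
	// print the first cell of each matching row
	echo $row->getElementsByTagName('td')->item(0)->nodeValue, "\n";
}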
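Since the goal was to rebuild the table from the DEF rows, the DOM approach can also do that part. This is a minimal sketch: it removes every row whose fourth cell is not DEF and serializes what is left. The sample HTML is made up and wrapped in a `<table>`, which is an assumption because the thread only shows the rows. Passing a node to `saveHTML()` needs PHP 5.3.6 or later.

```php
<?php
// Made-up sample rows wrapped in a <table> (assumed structure).
$html = "<table>"
      . "<tr><td>070905499</td><td>DC</td><td>blah</td><td>PLA</td><td>84404</td></tr>"
      . "<tr><td>070905500</td><td>DC</td><td>blah</td><td>DEF</td><td>84403-7017</td></tr>"
      . "</table>";

$doc = new DOMDocument();
$doc->loadHTML($html);
$xp = new DOMXPath($doc);

// Select the rows whose fourth cell is not DEF and remove them in place.
// Copy the node list to an array first so removal does not disturb iteration.
$unwanted = iterator_to_array($xp->query("//tr[not(td[4]='DEF')]"));
foreach ($unwanted as $row) {
    $row->parentNode->removeChild($row);
}

// Serialize only the <table> element, which now holds just the DEF rows.
$table = $doc->getElementsByTagName('table')->item(0);
$rebuilt = $doc->saveHTML($table);
echo $rebuilt, "\n";
```

Removing the unwanted rows keeps the original attributes and markup of the remaining rows, which saves rebuilding them by hand.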