Trouble matching specific table rows....

Post by Burrito »

I'm trying to match rows from an HTML table that contain a specific value in the fourth cell. I want to strip out the rest of the rows from the table.

Example:

Code: Select all

<tr bgcolor='white'>

			<TD class="GridAttrField">070905499</TD>

			<TD class="GridAttrField">DC</TD>
			<TD class="GridAttrField">blah</TD>
			<TD class="GridAttrField">PLA</TD>
			<TD class="GridAttrField"></TD>
			<TD class="GridAttrField">blah</TD>
			<TD class="GridAttrField"></TD>

			<TD class="GridAttrField">blah</TD>
			<TD class="GridAttrField">UT</TD>
			<TD class="GridAttrField">blah</TD>
		</TR>
	
		<tr bgcolor='#dddddd'>
		
			<TD class="GridAttrField">070905499</TD>
			<TD class="GridAttrField">DC</TD>

			<TD class="GridAttrField">blah</TD>
			<TD class="GridAttrField">DEF</TD>
			<TD class="GridAttrField"></TD>
			<TD class="GridAttrField">blah</TD>
			<TD class="GridAttrField"></TD>
			<TD class="GridAttrField">blah</TD>
			<TD class="GridAttrField">UT</TD>

			<TD class="GridAttrField">84404</TD>
		</TR>
	
		<tr bgcolor='white'>

			<TD class="GridAttrField">070905499</TD>
			<TD class="GridAttrField">DC</TD>
			<TD class="GridAttrField">blah</TD>

			<TD class="GridAttrField">DEF</TD>
			<TD class="GridAttrField"></TD>
			<TD class="GridAttrField">blah</TD>
			<TD class="GridAttrField"></TD>
			<TD class="GridAttrField">blah</TD>
			<TD class="GridAttrField">UT</TD>
			<TD class="GridAttrField">84404</TD>

		</TR>
	
	<tr bgcolor='white'>

			<TD class="GridAttrField">070905500</TD>
			<TD class="GridAttrField">DC</TD>
			<TD class="GridAttrField"> blah</TD>
			<TD class="GridAttrField">PLA</TD>

			<TD class="GridAttrField"></TD>
			<TD class="GridAttrField">blah</TD>
			<TD class="GridAttrField"></TD>
			<TD class="GridAttrField">blah</TD>
			<TD class="GridAttrField">UT</TD>
			<TD class="GridAttrField">84401</TD>
		</TR>

	
		<tr bgcolor='#dddddd'>
		
			<TD class="GridAttrField">070905500</TD>
			<TD class="GridAttrField">DC</TD>
			<TD class="GridAttrField">blah</TD>
			<TD class="GridAttrField">DEF</TD>
			<TD class="GridAttrField"></TD>
			<TD class="GridAttrField">blah</TD>

			<TD class="GridAttrField"></TD>
			<TD class="GridAttrField">blah</TD>
			<TD class="GridAttrField">UT</TD>
			<TD class="GridAttrField">84403-7017</TD>
		</TR>
If you look at that, you'll see that the fourth column contains either DEF or PLA. I only want the rows that have DEF in the fourth column. I am planning to pull those rows out of the table and then rebuild the table from them. I tried this:

Code: Select all

preg_match_all("#(<tr.*?>DEF</td>.*?</tr>)#mis", $string, $matches);
echo "<pre>";
print_r($matches[1]);
echo "</pre>";
But that grabs all the rows.

Any better ideas?

tia,

Burr

Post by Kieran Huggins »

This is UGLY, but it works:

Code: Select all

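$matches = array();

// Split the source on opening and closing <tr> tags so each chunk holds one row's cells.
$rows = preg_split('#</?tr.*?>#mis',$src);

foreach($rows as $row){
	// Three cells, then a cell holding DEF. Because .*? can skip over whole cells,
	// this also accepts DEF in a later cell; the sample data only has it in the fourth.
	if(preg_match('#(<td.*?>.*?</td>.*?<td.*?>.*?</td>.*?<td.*?>.*?</td>.*?<td.*?>DEF</td>.*)#mis',$row)){
		$matches[] = $row;
	}
}

//print_r($matches);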
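If the cells only ever hold plain text, the same idea can probably be tightened into one pattern. The sketch below is only an illustration: the function name rowsWithCellValue and its parameters are made up for this example, and it assumes there are no nested tags inside the cells and no nested tables. It uses negated character classes instead of .*? so the cell count is exact, and a negative lookahead so a match can never run past its own closing row tag.

```php
<?php
// Hypothetical helper (not from the thread): returns every <tr>...</tr> block
// whose cell number $cellIndex (1-based) contains exactly $value.
// Assumption: cells contain plain text only, so [^<]* covers a whole cell body.
function rowsWithCellValue(string $html, int $cellIndex, string $value): array
{
    // One complete cell plus any whitespace that follows it.
    $cell = '<td\b[^>]*>[^<]*</td>\s*';

    $pattern = '#<tr\b[^>]*>\s*'
             . '(?:' . $cell . '){' . ($cellIndex - 1) . '}'      // skip the cells before the target
             . '<td\b[^>]*>\s*' . preg_quote($value, '#') . '\s*</td>'
             . '(?:(?!</tr>).)*</tr>#is';                         // rest of the row, never past its </tr>

    preg_match_all($pattern, $html, $m);
    return $m[0];
}
```

Calling rowsWithCellValue($string, 4, 'DEF') on the table from the first post should give back only the DEF rows, which can then be joined and wrapped in a new table tag.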

Post by GeertDD »

This one is maybe a bit nicer. I don't like using multiple instances of .*? though.

Code: Select all

~<tr\s.*?>(?:DEF|PLA)<.*?</tr>~is
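However, note that the regex above will also return rows that contain DEF or PLA in another cell, not necessarily the fourth one. A row that contains neither value is not skipped either: the match that starts at its opening tag simply runs on into a later row. So it depends on your context whether it would be useful or not.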
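A quick way to see both caveats is to run the pattern over a few short rows and count how many opening row tags each match contains. The three rows below are made-up test data, not taken from the original table.

```php
<?php
// Made-up rows: no matching value, DEF in the second cell, DEF in the first cell.
$rows = "<tr bgcolor='white'><td>1</td><td>XYZ</td></tr>\n"
      . "<tr bgcolor='white'><td>2</td><td>DEF</td></tr>\n"
      . "<tr bgcolor='white'><td>DEF</td><td>3</td></tr>";

// The pattern exactly as posted above.
preg_match_all('~<tr\s.*?>(?:DEF|PLA)<.*?</tr>~is', $rows, $found);

foreach ($found[0] as $i => $match) {
    // A match that contains more than one "<tr" has crossed a row boundary.
    echo 'match ', $i, ' spans ', substr_count($match, '<tr'), " row(s)\n";
}
```

The first match covers two rows, because the lazy .*? keeps expanding from the first row's opening tag until it finds DEF in the second row. The last row is matched even though DEF sits in its first cell.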

Post by stereofrog »

What are you going to do with the rows you find? In general, you don't need a regexp for this:

Code: Select all

$doc = new DOMDocument();
$doc->loadHTML($html);
$xp = new DOMXPath($doc);
// td[4] limits the test to the fourth cell; a plain td would match DEF in any cell.
$rows = $xp->query("//tr[td[4]='DEF']");
foreach($rows as $row)
	echo $row->firstChild->nodeValue, "\n";
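Since the stated goal was to rebuild the table from the matching rows, the DOM approach could be taken one step further along these lines. This is a rough sketch, not a tested drop-in: the function name rebuildTableWithCell and its parameters are invented for the example, it assumes the input is a run of rows without the surrounding table tag (as in the first post), and it compares the cell text in PHP rather than inside the XPath expression so the value needs no quoting.

```php
<?php
// Hypothetical helper (not from the thread): keeps only the rows whose cell
// number $cellIndex (1-based) has the text $value and returns them as a new table.
function rebuildTableWithCell(string $rowsHtml, int $cellIndex, string $value): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);            // keep warnings about sloppy HTML quiet
    $doc->loadHTML('<table>' . $rowsHtml . '</table>');
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $out = "<table>\n";
    foreach ($xpath->query('//tr') as $row) {
        $cells = $xpath->query('td', $row);      // direct td children of this row
        if ($cells->length >= $cellIndex
            && trim($cells->item($cellIndex - 1)->textContent) === $value) {
            $out .= $doc->saveHTML($row) . "\n"; // serialise the whole row, attributes included
        }
    }
    return $out . "</table>";
}
```

Note that the parser lowercases tag names, so the rebuilt rows come out with lowercase tr and td tags.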