Table regex.. ??
Posted: Thu Sep 03, 2009 9:13 am
by teapear
Hello guys,
I need your help matching the table in the HTML code below:
HTML code
Code:
<p>Welcome</p>
<table cellspacing="0" cellpadding="3" bordercolor="ty" border="1" id="mytable1" width="100%">
<tr nowrap="nowrap" bgcolor="#ECECEC">
<td width="25%">sdsdsdsdsdsd</td>
</tr>
</table>
<div>this is a test</div>
Output I need:
Code:
<table cellspacing="0" cellpadding="3" bordercolor="ty" border="1" id="mytable1" width="100%">
<tr nowrap="nowrap" bgcolor="#ECECEC">
<td width="25%">sdsdsdsdsdsd</td>
</tr>
</table>
Thanks...
Re: Table regex.. ??
Posted: Thu Sep 03, 2009 10:17 am
by Ollie Saunders
|<table.*</table>|is
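A minimal sketch of how this pattern might be applied with `preg_match()` to the HTML from the question. The `i` modifier makes the match case-insensitive and `s` lets `.` cross the line breaks inside the table; the variable names are illustrative only.

```php
<?php
// Sketch: apply the greedy pattern above to the HTML from the question.
$html = <<<'HTML'
<p>Welcome</p>
<table cellspacing="0" cellpadding="3" bordercolor="ty" border="1" id="mytable1" width="100%">
<tr nowrap="nowrap" bgcolor="#ECECEC">
<td width="25%">sdsdsdsdsdsd</td>
</tr>
</table>
<div>this is a test</div>
HTML;

// i = case-insensitive, s = dot also matches newlines (the table spans several lines)
if (preg_match('|<table.*</table>|is', $html, $m)) {
    $table = $m[0]; // everything from "<table" up to the LAST "</table>"
    echo $table, "\n";
}
```

With a single table in the input, the surrounding `<p>` and `<div>` are left out of the match.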
Re: Table regex.. ??
Posted: Thu Sep 03, 2009 9:04 pm
by ridgerunner
Better yet, if your data has multiple tables, use the lazy star quantifier like so...
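A minimal sketch of the "multiple tables in series" case, assuming the lazy version is simply the pattern above with `.*` changed to `.*?`. The sample HTML is made up for illustration, and `preg_match_all()` is used so both patterns' match counts can be compared:

```php
<?php
// Two TABLE elements in series (not nested), separated by other markup.
$html = "<table><tr><td>one</td></tr></table>\n<p>between</p>\n<table><tr><td>two</td></tr></table>";

// Greedy: .* runs to the LAST "</table>", so both tables (and the <p>) come back as one match.
$greedyCount = preg_match_all('|<table.*</table>|is', $html, $greedy);

// Lazy: .*? stops at the FIRST "</table>" after each "<table", giving one match per table.
// (Assumed form of the lazy pattern.)
$lazyCount = preg_match_all('|<table.*?</table>|is', $html, $lazy);

echo "greedy: $greedyCount match(es), lazy: $lazyCount match(es)\n";
print_r($lazy[0]);
```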
Re: Table regex.. ??
Posted: Thu Sep 03, 2009 10:48 pm
by Ollie Saunders
I don't know what lazy is, but the one you just used is reluctant. The one I used is greedy. If you agree with this, then you'll agree mine will deal with nested tables but yours won't.
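A small sketch of the nested case described here, again assuming the lazy pattern is `|<table.*?</table>|is`. The one-line HTML string is a made-up example with one table inside another:

```php
<?php
// One outer TABLE containing one nested TABLE.
$html = '<table><tr><td><table><tr><td>inner</td></tr></table></td></tr></table>';

preg_match('|<table.*</table>|is', $html, $greedy); // greedy
preg_match('|<table.*?</table>|is', $html, $lazy);  // lazy/reluctant (assumed form)

// Greedy reaches the outer closing tag, so the nested element comes back intact.
echo $greedy[0], "\n";
// Lazy stops at the first "</table>", which closes the INNER table: the result is unbalanced.
echo $lazy[0], "\n";
```

The lazy result contains two opening `<table>` tags but only one `</table>`, which is why neither simple pattern handles both the nested and the series case.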
Re: Table regex.. ??
Posted: Fri Sep 04, 2009 1:08 am
by ridgerunner
Ollie Saunders wrote: I don't know what lazy is, but the one you just used is reluctant. The one I used is greedy. If you agree with this, then you'll agree mine will deal with nested tables but yours won't.
Yes, you are indeed correct. Yours works correctly for nested tables and mine fails. However, I was thinking of the case where the file has multiple tables in series, not nested, in which case mine works correctly and yours fails. Here is one that uses recursion and works for both cases:
Code:
<?php // File: NestedTables.php
$data = file_get_contents('NestedTablesTestData.html');
if ($data === false) {
    die("Could not read NestedTablesTestData.html\n");
}
$pattern = '%
    <table\b[^>]*+>            # match opening TABLE tag
    (?:                        # non-capture group for alternation
      (?:                      # match chars inside a TABLE element
        (?!                    # at a position that is not followed by
          <table\b[^>]*+>      # either an opening TABLE tag
        |                      # or
          </table>             # a closing TABLE tag
        ).                     # match one char
      )++                      # until all chars within TABLE consumed
    |                          # or...
      (?R)                     # match a whole nested TABLE element
    )*+                        # as many as it takes until
    </table>                   # balanced closing TABLE tag is matched
%six';
// preg_match_all() returns every outermost TABLE, not just the first one
if (preg_match_all($pattern, $data, $matches) > 0) {
    print_r($matches[0]);
}
?>
Here is the "NestedTablesTestData.html" test file that works with the above script...
Code:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head><title>Test Nested Tables</title></head>
<body>
<table>
  <tr><th>A1</th><th>B1</th></tr>
  <tr><td>
    <table>
      <tr><th>A2</th><th>B2</th></tr>
      <tr><td>
        <table>
          <tr><th>A3</th><th>B3-xxx</th></tr>
          <tr><td>1</td><td>2</td></tr>
        </table>
      </td><td>
        <table>
          <tr><th>A3</th><th>B3</th></tr>
          <tr><td>1</td><td>2</td></tr>
        </table>
      </td></tr>
    </table>
  </td><td>
    <table>
      <tr><th>A2</th><th>B2</th></tr>
      <tr><td>
        <table>
          <tr><th>A3</th><th>B3</th></tr>
          <tr><td>1</td><td>2</td></tr>
        </table>
      </td><td>
        <table>
          <tr><th>A3</th><th>B3</th></tr>
          <tr><td>1</td><td>2</td></tr>
        </table>
      </td></tr>
    </table>
  </td></tr>
</table>
<p>Stuff between the two main tables</p>
<table>
  <tr><th>A1</th><th>B1</th></tr>
  <tr><td>
    <table>
      <tr><th>A2</th><th>B2</th></tr>
      <tr><td>
        <table>
          <tr><th>A3</th><th>B3-xxx</th></tr>
          <tr><td>1</td><td>2</td></tr>
        </table>
      </td><td>
        <table>
          <tr><th>A3</th><th>B3</th></tr>
          <tr><td>1</td><td>2</td></tr>
        </table>
      </td></tr>
    </table>
  </td><td>
    <table>
      <tr><th>A2</th><th>B2</th></tr>
      <tr><td>
        <table>
          <tr><th>A3</th><th>B3</th></tr>
          <tr><td>1</td><td>2</td></tr>
        </table>
      </td><td>
        <table>
          <tr><th>A3</th><th>B3</th></tr>
          <tr><td>1</td><td>2</td></tr>
        </table>
      </td></tr>
    </table>
  </td></tr>
</table>
</body>
</html>
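To try the recursive pattern without a separate test file, here is a self-contained sketch that runs the same regex (compacted, still in `x` mode) against a short hypothetical inline string holding one nested table followed by a separate table. `preg_match_all()` is used so that both outermost tables are reported:

```php
<?php
// Same recursive pattern as NestedTables.php, written compactly (x mode ignores the whitespace).
$pattern = '%
    <table\b[^>]*+>                                  # opening TABLE tag
    (?:
        (?: (?! <table\b[^>]*+> | </table> ) . )++   # run of chars that are not TABLE tags
      | (?R)                                         # or a whole nested TABLE element
    )*+
    </table>                                         # balanced closing TABLE tag
%six';

// Hypothetical test data: a nested table followed by a second, separate table.
$html = '<table id="a"><tr><td><table><tr><td>x</td></tr></table></td></tr></table>'
      . '<p>between</p>'
      . '<table id="b"><tr><td>y</td></tr></table>';

$count = preg_match_all($pattern, $html, $matches);
echo "$count outermost table(s)\n";
print_r($matches[0]);
```

The first match keeps the nested table intact and the second match is the separate table, so both the nested and the series case are handled.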
Re: Table regex.. ??
Posted: Fri Sep 04, 2009 9:02 am
by Ollie Saunders
Nice. I think lazy is just another word for reluctant.
Re: Table regex.. ??
Posted: Fri Sep 04, 2009 9:26 am
by ridgerunner
The regex in my previous post matches outermost TABLE elements, each of which may contain nested TABLEs. The following regex matches innermost TABLE elements, which may NOT contain nested TABLEs.
Code:
<?php // File: NestedTablesInnermost.php
$data = file_get_contents('NestedTablesTestData.html');
if ($data === false) {
    die("Could not read NestedTablesTestData.html\n");
}
// regex to match innermost TABLEs which may NOT contain nested TABLEs
$pattern_innermost = '%
    <table\b[^>]*+>          # match opening TABLE tag
    (?:                      # match chars inside a TABLE element
      (?!                    # at a position that is not followed by
        <table\b[^>]*+>      # either an opening TABLE tag
      |                      # or
        </table>             # a closing TABLE tag
      ).                     # match one char
    )*+                      # until all chars within TABLE consumed
    </table>                 # match closing TABLE tag
%six';
// preg_match_all() collects every innermost TABLE in the file
if (preg_match_all($pattern_innermost, $data, $matches) > 0) {
    echo "Inner pattern matched. Here are the results:\n";
    print_r($matches[0]);
}
?>
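A self-contained sketch of the innermost pattern, run with `preg_match_all()` on a short hypothetical string in which one outer table holds two inner tables side by side. At the outer `<table>` the match fails (a nested opening tag is reached before any `</table>`), so the engine moves on and only the two inner tables are returned:

```php
<?php
// Innermost-TABLE pattern from NestedTablesInnermost.php, compacted.
$innermost = '%
    <table\b[^>]*+>                              # opening TABLE tag
    (?: (?! <table\b[^>]*+> | </table> ) . )*+   # chars that are not TABLE tags
    </table>                                     # closing TABLE tag
%six';

// Hypothetical data: an outer table holding two inner tables side by side.
$html = '<table><tr><td><table><tr><td>1</td></tr></table></td>'
      . '<td><table><tr><td>2</td></tr></table></td></tr></table>';

$count = preg_match_all($innermost, $html, $matches);
echo "$count innermost table(s)\n";
print_r($matches[0]);
```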
Matching TABLEs that lie in between these two extremes (TABLEs that are nested inside another TABLE and also contain nested TABLEs) would not be a job for a regex.
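One possible alternative for those in-between tables, sketched here by swapping in an HTML parser (PHP's DOM extension, `DOMDocument` plus `DOMXPath`) in place of a regex. The sample markup, the `id` values, and the definition of "in between" (a TABLE that has a TABLE ancestor and a TABLE descendant) are assumptions for illustration:

```php
<?php
// Swapped-in technique: PHP's DOM extension instead of a regex.
$html = '<table id="outer"><tr><td>'
      . '<table id="middle"><tr><td>'
      . '<table id="inner"><tr><td>1</td></tr></table>'
      . '</td></tr></table>'
      . '</td></tr></table>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // real-world HTML is often not well formed; collect errors quietly
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// "In-between" tables: nested inside another TABLE and containing a TABLE themselves.
$middle = $xpath->query('//table[ancestor::table and .//table]');

$ids = [];
foreach ($middle as $table) {
    $ids[] = $table->getAttribute('id');
    echo $doc->saveHTML($table), "\n"; // markup of just this element
}
```

Because the parser builds the real element tree, the nesting depth is a simple XPath condition rather than something the pattern has to count.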

Re: Table regex.. ??
Posted: Fri Sep 04, 2009 9:33 am
by ridgerunner
Ollie Saunders wrote: Nice. I think lazy is just another word for reluctant.
I got the term "lazy" from Jeffrey Friedl's classic, "Mastering Regular Expressions, 3rd Edition" (highly recommended).
Re: Table regex.. ??
Posted: Fri Sep 04, 2009 9:56 am
by Ollie Saunders
I really hope the thread author finds these useful.
Re: Table regex.. ??
Posted: Sat Sep 05, 2009 2:17 pm
by prometheuzz
Ollie Saunders wrote: Nice. I think lazy is just another word for reluctant.
That is correct.