Ollie Saunders wrote: I don't know what lazy is but the one you just used is reluctant. The one I used is greedy. If you agree with this then you'll agree mine will deal with nested tables but yours won't.
Yes, you are indeed correct. Yours works correctly for nested tables and mine fails. However, I was thinking of the case where the file has multiple tables in series, not nested, in which case mine works correctly and yours fails. Here is one that uses recursion and works for both cases:
<?php // File: NestedTables.php
$data = file_get_contents('NestedTablesTestData.html');
$pattern = '%
    <table\b[^>]*+>             # match opening TABLE tag
    (?:                         # non-capture group for alternation
      (?:                       # match chars inside a TABLE element
        (?!                     # at a position that is not followed by
          <table\b[^>]*+>       # either an opening TABLE tag
        |                       # or
          </table>              # a closing TABLE tag
        ).                      # match one char
      )++                       # until all chars within TABLE consumed
    |                           # or...
      (?R)                      # match a whole nested TABLE element
    )*+                         # as many as it takes until
    </table>                    # balanced closing TABLE tag is matched
    %six';
if (preg_match($pattern, $data, $matches) > 0) {
    print_r($matches);
}
?>
Here is the "NestedTablesTestData.html" test file that works with the above script...
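The test file itself is not reproduced here. As a stand-in, here is a minimal sketch that runs all three approaches against two made-up inline strings, one nested and one serial. The `$greedy` and `$reluctant` patterns are my assumption of what the two simple regexes discussed above look like, and `first_match()` is a hypothetical helper; only `$recursive` is the pattern from the script above, with whitespace and comments stripped.

```php
<?php
// Hypothetical inline samples; these are NOT the contents of NestedTablesTestData.html.
$nested = '<table id="outer"><tr><td>'
        . '<table id="inner"><tr><td>x</td></tr></table>'
        . '</td></tr></table>';
$first  = '<table id="a"><tr><td>1</td></tr></table>';
$serial = $first . '<p>gap</p><table id="b"><tr><td>2</td></tr></table>';

// Assumed stand-ins for the two simple patterns discussed in the thread.
$greedy    = '%<table\b[^>]*+>.*</table>%si';
$reluctant = '%<table\b[^>]*+>.*?</table>%si';

// The recursive pattern from the script above, compacted (no x flag needed).
$recursive = '%<table\b[^>]*+>(?:(?:(?!<table\b[^>]*+>|</table>).)++|(?R))*+</table>%si';

// Hypothetical helper: return the text of the first match, or null if none.
function first_match(string $pattern, string $subject): ?string
{
    return preg_match($pattern, $subject, $m) === 1 ? $m[0] : null;
}

// Nested input: a correct match is the whole outer TABLE.
var_dump(first_match($greedy, $nested) === $nested);    // bool(true)
var_dump(first_match($reluctant, $nested) === $nested); // bool(false): stops at the inner </table>
var_dump(first_match($recursive, $nested) === $nested); // bool(true)

// Serial input: a correct first match is only the first TABLE.
var_dump(first_match($greedy, $serial) === $first);     // bool(false): runs on to the last </table>
var_dump(first_match($reluctant, $serial) === $first);  // bool(true)
var_dump(first_match($recursive, $serial) === $first);  // bool(true)

// preg_match_all() collects every outermost TABLE instead of only the first.
$count = preg_match_all($recursive, $serial, $all);
var_dump($count);                                       // int(2)
```

Note that `preg_match()` only ever returns the first match, so `preg_match_all()` is probably what you want when the file holds several tables in series.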
The regex in my previous post matches outermost TABLE elements, each of which may contain nested TABLEs. The following regex matches only innermost TABLE elements, which may NOT contain nested TABLEs.
<?php // File: NestedTablesInnermost.php
$data = file_get_contents('NestedTablesTestData.html');
// regex to match innermost TABLEs which may NOT contain nested TABLEs
$pattern_innermost = '%
    <table\b[^>]*+>             # match opening TABLE tag
    (?:                         # match chars inside a TABLE element
      (?!                       # at a position that is not followed by
        <table\b[^>]*+>         # either an opening TABLE tag
      |                         # or
        </table>                # a closing TABLE tag
      ).                        # match one char
    )*+                         # until all chars within TABLE consumed
    </table>                    # match closing TABLE tag
    %six';
if (preg_match($pattern_innermost, $data, $matches) > 0) {
    echo("Inner pattern matched. Here are the results:\r\n");
    print_r($matches);
}
?>
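To see the innermost behaviour without the test file, here is a small sketch using a made-up inline string (not the contents of NestedTablesTestData.html) in which one outer TABLE wraps two sibling TABLEs. It uses the same innermost pattern as above, compacted, with `preg_match_all()` so that every innermost TABLE is collected. The error check is my addition: `preg_match_all()` returns `false` on failure, for example when a backtrack limit is hit on large input.

```php
<?php
// Hypothetical inline sample: one outer TABLE containing two sibling TABLEs.
$in1  = '<table id="in1"><tr><td>1</td></tr></table>';
$in2  = '<table id="in2"><tr><td>2</td></tr></table>';
$html = '<table id="outer"><tr><td>' . $in1 . $in2 . '</td></tr></table>';

// Same innermost pattern as in the script above, without whitespace and comments.
$innermost = '%<table\b[^>]*+>(?:(?!<table\b[^>]*+>|</table>).)*+</table>%si';

$count = preg_match_all($innermost, $html, $m);
if ($count === false) {
    // preg_last_error() returns one of the PREG_*_ERROR constants.
    echo 'Regex failed, error code: ' . preg_last_error() . "\n";
    exit(1);
}

// The outer TABLE is skipped because it contains an opening TABLE tag;
// only the two leaf TABLEs are reported.
var_dump($count);            // int(2)
var_dump($m[0][0] === $in1); // bool(true)
var_dump($m[0][1] === $in2); // bool(true)
```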
Matching TABLEs at nesting levels in between these two extremes (neither outermost nor innermost) would not be a job for a regex.
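For that in-between case, a parser is probably the better tool. The following is only a rough sketch of one way to do it, assuming PHP's DOM extension is available; the sample markup, the `id` attributes and the `$depth` variable are made up for illustration. It selects TABLE elements by how many TABLE ancestors they have.

```php
<?php
// Hypothetical sample with three nesting levels.
$html = '<html><body>'
      . '<table id="outer"><tr><td>'
      .   '<table id="middle"><tr><td>'
      .     '<table id="inner"><tr><td>x</td></tr></table>'
      .   '</td></tr></table>'
      . '</td></tr></table>'
      . '</body></html>';

// Suppress libxml warnings about imperfect markup while parsing.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Depth 0 = outermost TABLEs, 1 = TABLEs nested exactly one level down, etc.
$depth = 1;
$xpath = new DOMXPath($doc);
$nodes = $xpath->query("//table[count(ancestor::table) = $depth]");

// Collect the id attribute of each TABLE found at the requested depth.
$ids = [];
foreach ($nodes as $node) {
    $ids[] = $node->getAttribute('id');
}
print_r($ids); // lists only "middle" for this sample
```

Changing `$depth` picks a different level, which is exactly the kind of counting a regex cannot do cleanly.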