Table regex.. ??


Post by teapear »

Hello guys,

I need your help matching the table in the HTML code below.

HTML code:

 
<p>Welcome</p>
<table cellspacing="0" cellpadding="3" bordercolor="ty" border="1" id="mytable1" width="100%">
    <tr nowrap="nowrap" bgcolor="#ECECEC">
        <td width="25%">sdsdsdsdsdsd</td>
 
    </tr>
</table>
<div>this is a test</div>
 


Output I need:

<table cellspacing="0" cellpadding="3" bordercolor="ty" border="1" id="mytable1" width="100%">
    <tr nowrap="nowrap" bgcolor="#ECECEC">
        <td width="25%">sdsdsdsdsdsd</td>
 
    </tr>
</table>
 
Thanks...

Post by Ollie Saunders »

|<table.*</table>|is
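
For anyone trying this out, here is a minimal sketch of how the pattern above might be applied with preg_match. The $html and $table variable names and the trimmed-down sample markup are assumptions for illustration only, modelled on the HTML in the first post.

```php
<?php
// Sample input modelled on the HTML in the first post (attributes trimmed).
$html = <<<'HTML'
<p>Welcome</p>
<table border="1" id="mytable1" width="100%">
    <tr bgcolor="#ECECEC">
        <td width="25%">sdsdsdsdsdsd</td>
    </tr>
</table>
<div>this is a test</div>
HTML;

// "|" is the pattern delimiter, "i" ignores case, "s" lets "." match newlines.
if (preg_match('|<table.*</table>|is', $html, $matches) === 1) {
    $table = $matches[0];   // the full match, from <table to </table>
    echo $table, "\n";
}
```

Without the "s" modifier the dot would not cross line breaks, so a table spread over several lines would not match.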

Post by ridgerunner »

Better yet, if your data has multiple tables, use the lazy star quantifier like so...

<table.*?</table>
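
To make the difference concrete, here is a small sketch comparing the greedy and lazy forms on two tables in series. The input string and variable names are made up for the example, and the "%" delimiter and "is" modifiers are assumptions added so the bare pattern above can be run.

```php
<?php
// Hypothetical input: two sibling tables, one after the other.
$html = '<table><tr><td>one</td></tr></table>'
      . '<p>between</p>'
      . '<table><tr><td>two</td></tr></table>';

// Greedy: ".*" runs to the LAST </table>, so both tables come back as one match.
$greedyCount = preg_match_all('%<table.*</table>%is', $html, $greedy);

// Lazy: ".*?" stops at the FIRST </table>, so each table is matched separately.
$lazyCount = preg_match_all('%<table.*?</table>%is', $html, $lazy);

echo "greedy: $greedyCount match(es), lazy: $lazyCount match(es)\n";
print_r($lazy[0]);
```

With this input the greedy pattern should report one match (including the paragraph in between) and the lazy pattern two.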

Post by Ollie Saunders »

I don't know what "lazy" is, but the quantifier you just used is reluctant. The one I used is greedy. If you agree with this, then you'll agree that mine will deal with nested tables but yours won't.
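
A quick sketch of the nested case, for illustration. The sample string and variable names are hypothetical, and the "%" delimiter is an assumption chosen to avoid clashing with anything in the pattern.

```php
<?php
// Hypothetical input: one table nested inside another.
$html = '<table><tr><td>'
      . '<table><tr><td>inner</td></tr></table>'
      . '</td></tr></table>';

preg_match('%<table.*</table>%is', $html, $greedy);   // greedy
preg_match('%<table.*?</table>%is', $html, $lazy);    // reluctant (lazy)

// Greedy reaches the outer closing tag, so the whole element is returned.
var_dump($greedy[0] === $html);

// Reluctant stops at the inner closing tag, leaving the outer table unclosed.
echo $lazy[0], "\n";
```

The reluctant match ends at the first closing tag it meets, which here belongs to the inner table.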

Post by ridgerunner »

Ollie Saunders wrote: I don't know what lazy is but the one you just used is reluctant. The one I used is greedy. If you agree with this then you'll agree mine will deal with nested tables but yours won't.
Yes, you are indeed correct. Yours works correctly for nested tables and mine fails. However, I was thinking of the case where the file has multiple tables in series, not nested, in which case mine works correctly and yours fails. Here is one that uses recursion and works for both cases:

<?php // File: NestedTables.php
$data = file_get_contents('NestedTablesTestData.html');
$pattern = '%
<table\b[^>]*+>         # match opening TABLE tag
(?:                     # non-capture group for alternation 
  (?:                   # match chars inside a TABLE element
    (?!                 # at a position that is not followed by
      <table\b[^>]*+>   # either an opening TABLE tag
    |                   # or
      </table>          # a closing TABLE tag
    ).                  # match one char
  )++                   # until all chars within TABLE consumed
|                       # or...
  (?R)                  # match a whole nested TABLE element
)*+                     # as many as it takes until
</table>                # balanced closing TABLE tag is matched
%six';
 
if (preg_match($pattern, $data, $matches) > 0) {
    print_r($matches);
}
?>

Here is the "NestedTablesTestData.html" test file that works with the above script...

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head><title>Test Nested Tables</title></head>
<body>
<table>
  <tr><th>A1</th><th>B1</th></tr>
  <tr><td>
    <table>
      <tr><th>A2</th><th>B2</th></tr>
      <tr><td>
        <table>
          <tr><th>A3</th><th>B3-xxx</th></tr>
          <tr><td>1</td><td>2</td></tr>
        </table>
      </td><td>
        <table>
          <tr><th>A3</th><th>B3</th></tr>
          <tr><td>1</td><td>2</td></tr>
        </table>
      </td></tr>
    </table>
  </td><td>
    <table>
      <tr><th>A2</th><th>B2</th></tr>
      <tr><td>
        <table>
          <tr><th>A3</th><th>B3</th></tr>
          <tr><td>1</td><td>2</td></tr>
        </table>
      </td><td>
        <table>
          <tr><th>A3</th><th>B3</th></tr>
          <tr><td>1</td><td>2</td></tr>
        </table>
      </td></tr>
    </table>
  </td></tr>
</table>
<p>Stuff between the two main tables</p>
<table>
  <tr><th>A1</th><th>B1</th></tr>
  <tr><td>
    <table>
      <tr><th>A2</th><th>B2</th></tr>
      <tr><td>
        <table>
          <tr><th>A3</th><th>B3-xxx</th></tr>
          <tr><td>1</td><td>2</td></tr>
        </table>
      </td><td>
        <table>
          <tr><th>A3</th><th>B3</th></tr>
          <tr><td>1</td><td>2</td></tr>
        </table>
      </td></tr>
    </table>
  </td><td>
    <table>
      <tr><th>A2</th><th>B2</th></tr>
      <tr><td>
        <table>
          <tr><th>A3</th><th>B3</th></tr>
          <tr><td>1</td><td>2</td></tr>
        </table>
      </td><td>
        <table>
          <tr><th>A3</th><th>B3</th></tr>
          <tr><td>1</td><td>2</td></tr>
        </table>
      </td></tr>
    </table>
  </td></tr>
</table>
</body>
</html>
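
Note that the script above calls preg_match, which stops after the first match, so only the first of the two main tables is printed. As a sketch of how one might collect every outermost table instead, the example below swaps in preg_match_all. The short inline $data string is a hypothetical stand-in for the test file, and the pattern is the same recursive regex with the comments stripped.

```php
<?php
// Hypothetical stand-in for NestedTablesTestData.html: two top-level tables,
// the first of which contains a nested table.
$data = '<table><tr><td><table><tr><td>inner</td></tr></table></td></tr></table>'
      . '<p>Stuff between the two main tables</p>'
      . '<table><tr><td>second</td></tr></table>';

// Same recursive pattern as in the post above, with the comments removed.
$pattern = '%
<table\b[^>]*+>
(?:
  (?: (?! <table\b[^>]*+> | </table> ) . )++
|
  (?R)
)*+
</table>
%six';

// preg_match_all returns the number of full matches, or false on error.
$count = preg_match_all($pattern, $data, $matches);
echo "outermost tables found: $count\n";
print_r($matches[0]);
```

Each entry of $matches[0] should be one complete outermost TABLE element, nested tables included.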
 

Post by Ollie Saunders »

Nice. I think lazy is just another word for reluctant.

Post by ridgerunner »

The regex in my previous post matches outermost TABLE elements, each of which may contain nested TABLEs. The following regex matches innermost TABLE elements, which may NOT contain nested TABLEs.

<?php // File: NestedTablesInnermost.php
$data = file_get_contents('NestedTablesTestData.html');
// regex to match innermost TABLEs which may NOT contain nested TABLEs
$pattern_innermost = '%
<table\b[^>]*+>         # match opening TABLE tag
(?:                     # match chars inside a TABLE element
  (?!                   # at a position that is not followed by
    <table\b[^>]*+>     # either an opening TABLE tag
  |                     # or
    </table>            # a closing TABLE tag
  ).                    # match one char
)*+                     # until all chars within TABLE consumed
</table>                # match closing TABLE tag
%six';
 
if (preg_match($pattern_innermost, $data, $matches) > 0) {
    echo "Inner pattern matched. Here are the results:\r\n";
    print_r($matches);
}
?>

Matching TABLEs that lie in between these two extremes would not be a job for a regex. 8)
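
As with the outermost version, preg_match returns only the first hit. Here is a sketch of listing every innermost table with preg_match_all. The sample $data string is invented for the example, and the pattern is the innermost regex from this post with its comments removed.

```php
<?php
// Hypothetical sample: an outer table wrapping two innermost tables.
$data = '<table><tr><td>'
      . '<table><tr><td>A</td></tr></table>'
      . '</td><td>'
      . '<table><tr><td>B</td></tr></table>'
      . '</td></tr></table>';

// Innermost-only pattern from the post above, comments removed.
$pattern_innermost = '%
<table\b[^>]*+>
(?: (?! <table\b[^>]*+> | </table> ) . )*+
</table>
%six';

// The outer table is skipped because its body contains another opening tag.
$count = preg_match_all($pattern_innermost, $data, $matches);
echo "innermost tables found: $count\n";
print_r($matches[0]);
```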

Post by ridgerunner »

Ollie Saunders wrote: Nice. I think lazy is just another word for reluctant.
I got the term "lazy" from Jeffrey Friedl's classic, "Mastering Regular Expressions", 3rd Edition (highly recommended).

Post by Ollie Saunders »

I really hope the thread author finds these useful.

Post by prometheuzz »

Ollie Saunders wrote: Nice. I think lazy is just another word for reluctant.
That is correct.

Post by prometheuzz »

ridgerunner wrote: ... Jeffrey Friedl's classic: "Mastering Regular Expressions - 3rd Edition". (highly recommended).
Seconded!