Parsing some html

Any questions involving matching text strings to patterns - the pattern is called a "regular expression."

andym01480
Forum Contributor
Posts: 390
Joined: Wed Apr 19, 2006 5:01 pm

Post by andym01480 »

I'm trying to pull data out of a string that has this sort of thing in it...

Code: Select all

<td align="left" valign="top" bgcolor="#EBF2FA" class="tableDate">Wed - Jun 07 -- Thu - Jun 08<br><span class="tableTime">20:00 - 22:00</span></td>
    <td align="left" valign="top" bgcolor="#FFFDF2" class="tableCategory s21">&nbsp;</td>
    <td align="left" valign="top" bgcolor="#FFFDF2" class="tableTitle" >Rhubarb Blah Blah <br>
    <span class="tableDescr">in the Guide Hut </span></td>
  </tr>
The date and time I have sussed out...

Code: Select all

preg_match_all('/(<td align=\"left\" valign=\"top\" bgcolor=\"#EBF2FA\" class=\"tableDate\">)(([A-Z]{3}\s-\s[A-Z]{3}\s\d\d)|([A-Z]{3}\s-\s[A-Z]{3}\s\d\d\s--\s[A-Z]{3}\s-\s[A-Z]{3}\s\d\d))(<br><span class=\"tableTime\">)(\d\d:\d\d\s-\s\d\d:\d\d)/is', $output, $matches, PREG_SET_ORDER);
How do I pull out the tableTitle and tableDescr data, which could be anything?

I've tried

Code: Select all

preg_match_all('/(<td align=\"left\" valign=\"top\" bgcolor=\"#EBF2FA\" class=\"tableDate\">)(([A-Z]{3}\s-\s[A-Z]{3}\s\d\d)|([A-Z]{3}\s-\s[A-Z]{3}\s\d\d\s--\s[A-Z]{3}\s-\s[A-Z]{3}\s\d\d))(<br><span class=\"tableTime\">)(\d\d:\d\d\s-\s\d\d:\d\d)(<\/td><td align=\"left\" valign=\"top\" bgcolor=\"#FFFDF2\" class=\"tableCategory s21\">&nbsp;<\/td><td align=\"left\" valign=\"top\" bgcolor=\"#FFFDF2\" class=\"tableTitle\">)(\w.)(<br>)/is', $output, $matches, PREG_SET_ORDER);
But it doesn't work. Can anyone spot where I have gone wrong?
twigletmac
Her Royal Site Adminness
Posts: 5371
Joined: Tue Apr 23, 2002 2:21 am
Location: Essex, UK

Post by twigletmac »

Perhaps something like this?

Code: Select all

$match_string =<<<END
<td align="left" valign="top" bgcolor="#EBF2FA" class="tableDate">(.+?)<br><span class="tableTime">(.+?)</span></td>
    <td align="left" valign="top" bgcolor="#FFFDF2" class="tableCategory s21">&nbsp;</td>
    <td align="left" valign="top" bgcolor="#FFFDF2" class="tableTitle" >(.+?)<br>
    <span class="tableDescr">(.+?)</span></td>
  </tr>
END;

preg_match_all('^'.$match_string.'^is', $output, $matches, PREG_SET_ORDER);
The ? after each .+ stops the match from being greedy, and using heredoc makes the pattern a bit easier to set out.

Mac
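
To illustrate the point about greediness, here is a minimal sketch. The `$html` string is a made-up sample rather than the real page, and `~` is used as the delimiter only because it does not occur in the markup.

```php
<?php
// Hypothetical sample: two spans side by side, as in the table row.
$html = '<span class="tableTime">20:00 - 22:00</span>'
      . '<span class="tableDescr">in the Guide Hut </span>';

// Greedy: (.+) grabs as much as it can, so it runs to the LAST </span>.
preg_match('~<span class="tableTime">(.+)</span>~is', $html, $greedy);

// Lazy: (.+?) grabs as little as it can, so it stops at the FIRST </span>.
preg_match('~<span class="tableTime">(.+?)</span>~is', $html, $lazy);

echo $greedy[1], "\n"; // 20:00 - 22:00</span><span class="tableDescr">in the Guide Hut 
echo $lazy[1], "\n";   // 20:00 - 22:00
```

Without the ?, the first capture would swallow the following span as well, which is why each group in the pattern above uses .+? instead of .+.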
andym01480
Forum Contributor
Posts: 390
Joined: Wed Apr 19, 2006 5:01 pm

Post by andym01480 »

Thanks - it sure looks neater. But print_r($matches) returns Array ( ), so it hasn't worked!

Code: Select all

$output=<<<TEST
    <td align="left" valign="top" bgcolor="#EBF2FA" class="tableDate">Wed - Jun 07 -- Thu - Jun 08<br><span class="tableTime">20:00 - 22:00</span></td>
    <td align="left" valign="top" bgcolor="#FFFDF2" class="tableCategory s21">&nbsp;</td>
    <td align="left" valign="top" bgcolor="#FFFDF2" class="tableTitle" >Steps to Freedom Evening <br>
    <span class="tableDescr">in the Guide Hut </span></td>

TEST;

//process $output blah blah

$match_string=<<<END
<td align="left" valign="top" bgcolor="#EBF2FA" class="tableDate">(.+?)<br><span class="tableTime">(.+?)</span></td> 
    <td align="left" valign="top" bgcolor="#FFFDF2" class="tableCategory s21">&nbsp;</td> 
    <td align="left" valign="top" bgcolor="#FFFDF2" class="tableTitle" >(.+?)<br> 
    <span class="tableDescr">(.+?)</span></td>
END;

preg_match_all('^'.$match_string.'^is', $output, $matches, PREG_SET_ORDER);
  print_r ($matches);
Have I missed something?
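
One possible cause, assuming the trailing spaces after `</td>` and `<br>` in the pattern above are really in the script and not a paste artefact: whitespace inside the pattern is matched literally, and $output has no space before those line breaks. Different line endings (\r\n versus \n) would break the match in the same way. A rough sketch that replaces the literal whitespace with \s* and loosens the attribute matching is below. The way the pattern is split into pieces and the `~` delimiter are my own choices, and it assumes the cells always appear in this order.

```php
<?php
// Same sample markup as in the post above.
$output = <<<TEST
    <td align="left" valign="top" bgcolor="#EBF2FA" class="tableDate">Wed - Jun 07 -- Thu - Jun 08<br><span class="tableTime">20:00 - 22:00</span></td>
    <td align="left" valign="top" bgcolor="#FFFDF2" class="tableCategory s21">&nbsp;</td>
    <td align="left" valign="top" bgcolor="#FFFDF2" class="tableTitle" >Steps to Freedom Evening <br>
    <span class="tableDescr">in the Guide Hut </span></td>

TEST;

// \s* between the pieces absorbs any spaces, tabs, or line breaks,
// so the pattern no longer depends on the exact layout of the source.
$pattern = '~class="tableDate">(.+?)<br>\s*'
         . '<span class="tableTime">(.+?)</span>\s*</td>\s*'
         . '<td[^>]*class="tableCategory[^"]*"[^>]*>.*?</td>\s*'
         . '<td[^>]*class="tableTitle"[^>]*>(.+?)<br>\s*'
         . '<span class="tableDescr">(.+?)</span>~is';

preg_match_all($pattern, $output, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    // trim() drops the stray spaces around each captured value.
    echo trim($m[1]), ' | ', trim($m[2]), ' | ', trim($m[3]), ' | ', trim($m[4]), "\n";
}
```

With the sample above this should give one match, with the date, time, title, and description in $m[1] to $m[4].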