Parsing some HTML

Posted: Tue Jun 06, 2006 5:15 pm
by andym01480
I'm trying to pull data out of a string of HTML that contains this sort of thing:

Code:

<td align="left" valign="top" bgcolor="#EBF2FA" class="tableDate">Wed - Jun 07 -- Thu - Jun 08<br><span class="tableTime">20:00 - 22:00</span></td>
    <td align="left" valign="top" bgcolor="#FFFDF2" class="tableCategory s21">&nbsp;</td>
    <td align="left" valign="top" bgcolor="#FFFDF2" class="tableTitle" >Rhubarb Blah Blah <br>
    <span class="tableDescr">in the Guide Hut </span></td>
  </tr>

I've sussed out the date and time with this:

Code:

preg_match_all('/(<td align=\"left\" valign=\"top\" bgcolor=\"#EBF2FA\" class=\"tableDate\">)(([A-Z]{3}\s-\s[A-Z]{3}\s\d\d)|([A-Z]{3}\s-\s[A-Z]{3}\s\d\d\s--\s[A-Z]{3}\s-\s[A-Z]{3}\s\d\d))(<br><span class=\"tableTime\">)(\d\d:\d\d\s-\s\d\d:\d\d)/is', $output, $matches, PREG_SET_ORDER);
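For reference, here is how the captured pieces come back from the date/time pattern above. This is a minimal sketch: the sample cell is copied from the post, and the pattern is used unchanged. With PREG_SET_ORDER, each $matches[$i] is one complete match, and $matches[$i][$n] is capture group $n of that match. Counting opening brackets, group 2 is the whole date (single day or range) and group 6 is the time.

```php
<?php
// Sample cell from the post (only the part the date/time pattern looks at).
$output = '<td align="left" valign="top" bgcolor="#EBF2FA" class="tableDate">'
        . 'Wed - Jun 07 -- Thu - Jun 08<br><span class="tableTime">20:00 - 22:00</span></td>';

// The date/time pattern from the post, unchanged.
$pattern = '/(<td align=\"left\" valign=\"top\" bgcolor=\"#EBF2FA\" class=\"tableDate\">)(([A-Z]{3}\s-\s[A-Z]{3}\s\d\d)|([A-Z]{3}\s-\s[A-Z]{3}\s\d\d\s--\s[A-Z]{3}\s-\s[A-Z]{3}\s\d\d))(<br><span class=\"tableTime\">)(\d\d:\d\d\s-\s\d\d:\d\d)/is';

preg_match_all($pattern, $output, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    $date = $m[2]; // outer date group: either the single-day or the range form
    $time = $m[6]; // the hh:mm - hh:mm group
    echo "$date | $time\n";
}
```

For the sample, this prints one line: Wed - Jun 07 -- Thu - Jun 08 | 20:00 - 22:00.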

How do I pull out the tableTitle and tableDescr data, which could be anything?

I've tried

Code:

preg_match_all('/(<td align=\"left\" valign=\"top\" bgcolor=\"#EBF2FA\" class=\"tableDate\">)(([A-Z]{3}\s-\s[A-Z]{3}\s\d\d)|([A-Z]{3}\s-\s[A-Z]{3}\s\d\d\s--\s[A-Z]{3}\s-\s[A-Z]{3}\s\d\d))(<br><span class=\"tableTime\">)(\d\d:\d\d\s-\s\d\d:\d\d)(<\/td><td align=\"left\" valign=\"top\" bgcolor=\"#FFFDF2\" class=\"tableCategory s21\">&nbsp;<\/td><td align=\"left\" valign=\"top\" bgcolor=\"#FFFDF2\" class=\"tableTitle\">)(\w.)(<br>)/is', $output, $matches, PREG_SET_ORDER);

But it doesn't work. Can anyone spot where I have gone wrong?
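Comparing the second pattern with the sample HTML, three things look likely to stop it matching:

- The pattern has nothing between </td> and the next <td, but the HTML has a line break and indentation there.
- The title cell is written class="tableTitle" > with a space before the >, but the pattern has no space.
- (\w.) can only ever capture exactly two characters.

Below is a minimal sketch of a more forgiving pattern, assuming the markup keeps the same cell order. It uses ~ as the delimiter (a choice, so that </td> needs no escaping). It uses \s* wherever whitespace may appear between tags and [^>]* to skip attributes. Lazy (.*?) groups capture the free text. The date part is loosened to (.*?) here; the stricter date sub-pattern from above could be dropped back in.

```php
<?php
// The sample from the post; note the line breaks and indentation between cells.
$output = <<<HTML
<td align="left" valign="top" bgcolor="#EBF2FA" class="tableDate">Wed - Jun 07 -- Thu - Jun 08<br><span class="tableTime">20:00 - 22:00</span></td>
    <td align="left" valign="top" bgcolor="#FFFDF2" class="tableCategory s21">&nbsp;</td>
    <td align="left" valign="top" bgcolor="#FFFDF2" class="tableTitle" >Rhubarb Blah Blah <br>
    <span class="tableDescr">in the Guide Hut </span></td>
  </tr>
HTML;

// \s*   : any whitespace (including newlines) between tags
// [^>]* : skip attributes we don't care about, including the stray space before >
// (.*?) : lazily capture "anything" up to the next fixed piece of markup
$pattern = '~class="tableDate">(.*?)<br>\s*<span class="tableTime">(.*?)</span>\s*</td>'
         . '\s*<td[^>]*>.*?</td>'                        // the tableCategory cell, ignored
         . '\s*<td[^>]*class="tableTitle"[^>]*>(.*?)<br>'
         . '\s*<span class="tableDescr">(.*?)</span>~is';

preg_match_all($pattern, $output, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    // trim() drops the trailing spaces before <br> and </span>.
    printf("%s | %s | %s | %s\n", trim($m[1]), trim($m[2]), trim($m[3]), trim($m[4]));
}
```

Loosening the attributes to [^>]* means a change of bgcolor or align will not break the match. The trade-off is that the pattern relies on the cells always appearing in this order.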

Posted: Wed Jun 07, 2006 3:34 am
by twigletmac
Perhaps something like this?

Code:

$match_string =<<<END

<td align="left" valign="top" bgcolor="#EBF2FA" class="tableDate">(.+?)<br><span class="tableTime">(.+?)</span></td>
    <td align="left" valign="top" bgcolor="#FFFDF2" class="tableCategory s21">&nbsp;</td>
    <td align="left" valign="top" bgcolor="#FFFDF2" class="tableTitle" >(.+?)<br>
    <span class="tableDescr">(.+?)</span></td>
  </tr>
END;

preg_match_all('^'.$match_string.'^is', $output, $matches, PREG_SET_ORDER);
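
The ? after each .+ stops the match from being greedy, so each group ends at the first point where the rest of the pattern can match rather than the last. Using a heredoc makes the pattern a bit easier to set out.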

Mac
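To see what the ? changes, here is a small sketch on a made-up fragment. The string 'One<br>Two<br>' is invented purely for illustration. The greedy .+ first runs to the end of the string and then backs off only as far as the last <br>. The lazy .+? stops at the first <br>.

```php
<?php
// A made-up fragment containing two <br> tags.
$html = 'One<br>Two<br>';

// Greedy: (.+) grabs as much as possible, then backtracks to the LAST <br>.
preg_match('~(.+)<br>~', $html, $greedy);

// Lazy: (.+?) grabs as little as possible, so it stops at the FIRST <br>.
preg_match('~(.+?)<br>~', $html, $lazy);

echo $greedy[1], "\n"; // One<br>Two
echo $lazy[1], "\n";   // One
```

On a whole page with many rows, the greedy form would swallow everything up to the last matching tag. That is why the lazy form is needed.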

Posted: Wed Jun 07, 2006 8:45 am
by andym01480
Thanks, it sure looks neater. But print_r($matches) returns Array ( ), so it hasn't worked! This is what I'm running:

Code:

$output=<<<TEST
    <td align="left" valign="top" bgcolor="#EBF2FA" class="tableDate">Wed - Jun 07 -- Thu - Jun 08<br><span class="tableTime">20:00 - 22:00</span></td>
    <td align="left" valign="top" bgcolor="#FFFDF2" class="tableCategory s21">&nbsp;</td>
    <td align="left" valign="top" bgcolor="#FFFDF2" class="tableTitle" >Steps to Freedom Evening <br>
    <span class="tableDescr">in the Guide Hut </span></td>

TEST;

//process $output blah blah

$match_string=<<<END
<td align="left" valign="top" bgcolor="#EBF2FA" class="tableDate">(.+?)<br><span class="tableTime">(.+?)</span></td> 
    <td align="left" valign="top" bgcolor="#FFFDF2" class="tableCategory s21">&nbsp;</td> 
    <td align="left" valign="top" bgcolor="#FFFDF2" class="tableTitle" >(.+?)<br> 
    <span class="tableDescr">(.+?)</span></td>
END;

preg_match_all('^'.$match_string.'^is', $output, $matches, PREG_SET_ORDER);
  print_r ($matches);

Have I missed something?
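One likely culprit: a heredoc keeps every space and newline literally. As posted, the $match_string lines end in a trailing space after </td>, </td> and <br>, but the test $output lines do not. The pattern therefore has to be byte-for-byte identical to the HTML, whitespace included, and here it is not. Real pages may also use \r\n line endings.

Below is a minimal sketch that keeps the readable heredoc layout but makes whitespace flexible. template_to_regex() is a hypothetical helper, not a PHP built-in. It escapes the literal HTML with preg_quote(), turns every run of whitespace into \s*, and leaves the (.+?) placeholders alone. It assumes PHP 7 or later (for the string type declarations) and uses ~ as the delimiter.

```php
<?php
// Hypothetical helper (not a PHP built-in): turns an HTML "template" containing
// (.+?) placeholders into a regex that ignores differences in whitespace.
function template_to_regex(string $template): string
{
    $parts = explode('(.+?)', trim($template));
    foreach ($parts as $i => $part) {
        // Escape regex metacharacters in the literal HTML ('~' is the delimiter).
        $quoted = preg_quote($part, '~');
        // Any run of spaces, tabs or newlines becomes "any amount of whitespace".
        $parts[$i] = implode('\s*', preg_split('~\s+~', $quoted));
    }
    return '~' . implode('(.+?)', $parts) . '~is';
}

$output = <<<TEST
    <td align="left" valign="top" bgcolor="#EBF2FA" class="tableDate">Wed - Jun 07 -- Thu - Jun 08<br><span class="tableTime">20:00 - 22:00</span></td>
    <td align="left" valign="top" bgcolor="#FFFDF2" class="tableCategory s21">&nbsp;</td>
    <td align="left" valign="top" bgcolor="#FFFDF2" class="tableTitle" >Steps to Freedom Evening <br>
    <span class="tableDescr">in the Guide Hut </span></td>

TEST;

// The template no longer has to copy the page's indentation or trailing spaces.
$template = <<<END
<td align="left" valign="top" bgcolor="#EBF2FA" class="tableDate">(.+?)<br><span class="tableTime">(.+?)</span></td>
<td align="left" valign="top" bgcolor="#FFFDF2" class="tableCategory s21">&nbsp;</td>
<td align="left" valign="top" bgcolor="#FFFDF2" class="tableTitle" >(.+?)<br>
<span class="tableDescr">(.+?)</span></td>
END;

preg_match_all(template_to_regex($template), $output, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    // Groups: 1 = date, 2 = time, 3 = title, 4 = description.
    echo trim($m[1]), ' | ', trim($m[2]), ' | ', trim($m[3]), ' | ', trim($m[4]), "\n";
}
```

The template here is deliberately flush-left while $output is indented. It still matches because the whitespace between tags is no longer compared literally.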