Help with regex code for table
Posted: Wed Mar 07, 2007 6:58 am
by kerna
Hi all,
I'm hoping somebody can help me here. I'm fairly new to regex and am having some problems getting started. Basically, I want to extract the data from each row of the table at the link below, i.e. pull it into an array that I can then upload to a MySQL database. I believe regex is the way to do this.
http://www.sportinglife.com/football/pr ... table.html
I'm not sure where to start. Can someone give me an example of code that gets each row of the table, starting from the top?
Thanks very much
Posted: Wed Mar 07, 2007 10:44 am
by Kieran Huggins
so each row looks like this:
Code:
<tr>
<td align="left">
<a href="/football/premiership/manu/news/">Man Utd</a>
</td>
<td class="body_bg_2_co" align="center">29</td>
<td class="body_bg_2_co" align="center">12</td>
<td class="body_bg_2_co" align="center">1</td>
<td class="body_bg_2_co" align="center">1</td>
<td class="body_bg_2_co" align="center">35</td>
<td class="body_bg_2_co" align="center">8</td>
<td class="body_bg_2_co" align="center">11</td>
<td class="body_bg_2_co" align="center">2</td>
<td class="body_bg_2_co" align="center">2</td>
<td class="body_bg_2_co" align="center">31</td>
<td class="body_bg_2_co" align="center">11</td>
<td class="body_bg_2_co" align="center">72</td>
<td class="body_bg_2_co" align="center">47</td>
</tr>
You have a fixed number of columns, each wrapped in td tags.
Your regex match will be something along the lines of:
Code:
<tr>\n<td align="left">\n<a[^>]+>(\w+)</a>\n</td>\n<td[^>]+>(\d+)</td>\n ...repeat...\n</tr>
read:
http://ca3.php.net/manual/en/function.p ... ch-all.php
and:
http://ca3.php.net/manual/en/reference. ... syntax.php
and the whole section in general actually
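The "...repeat..." part of the pattern above can be built rather than typed out by hand. Here is a minimal sketch, assuming 13 numeric columns as in the sample row. The variable names, the # delimiter, the [^<]+ capture for the team name (so that a name with a space, such as "Man Utd", still matches) and the \s* whitespace allowance are my own assumptions, not part of the pattern above:

```php
<?php
// Rebuild the sample row from the post above: one name cell, 13 numeric cells.
$numbers = [29, 12, 1, 1, 35, 8, 11, 2, 2, 31, 11, 72, 47];
$cells = '';
foreach ($numbers as $n) {
    $cells .= "<td class=\"body_bg_2_co\" align=\"center\">$n</td>\n";
}
$row = "<tr>\n<td align=\"left\">\n"
     . "<a href=\"/football/premiership/manu/news/\">Man Utd</a>\n</td>\n"
     . $cells . "</tr>";

// One numeric cell, followed by any amount of whitespace.
$cell = '<td[^>]+>(\d+)</td>\s*';

// Name cell first, then the numeric cell repeated once per column.
$pattern = '#<tr>\s*<td align="left">\s*<a[^>]+>([^<]+)</a>\s*</td>\s*'
         . str_repeat($cell, count($numbers))
         . '</tr>#';

// PREG_SET_ORDER groups the captures row by row: $matches[0] is the first row.
preg_match_all($pattern, $row, $matches, PREG_SET_ORDER);
print_r($matches[0]);
```

With PREG_SET_ORDER, index 0 of each row holds the whole matched text, index 1 the team name, and indexes 2 to 14 the numbers.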

Posted: Fri Mar 09, 2007 8:17 am
by kerna
feyd | Please use [code] and [syntax="..."] tags where appropriate when posting code. Your post has been edited to reflect how we'd like it posted. Please read Posting Code in the Forums (http://forums.devnetwork.net/viewtopic.php?t=21171) to learn how to do it too.
Much appreciated, thanks a lot for the help, Kieran.
I had a go at this and tried writing code to extract the info into the format I want, e.g. (Man Utd, 33, 28, 3, 2, etc.), but it doesn't work.
Code:
<?
$file = file_get_contents("http://www.sportinglife.com/football/premiership/table/table.html");
preg_match_all("<tr>\n<td align="left">\n<a[^>]+>(\w+)</a>\n</td>\n<td[^>]+>(\d+)</td>\n<tr/>",$file,$match);
print_r($match);
?>
I guess what I'm trying to do there is set $file to the contents of the URL, then use preg_match_all to find that pattern in $file and put the results into the variable $match, then print $match.
Am I going about this the wrong way? Am I along the right tracks? Sorry, I'm kind of confused.
Posted: Fri Mar 09, 2007 8:19 am
by feyd
Your pattern has unescaped quotes in it that break the string. Notice how the highlighted version of your post illustrates where the break is.
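To illustrate the point about quotes: a double-quoted PHP string ends at the first unescaped ". A small sketch of two ways to write the same pattern fragment (the variable names are only for illustration):

```php
<?php
// Option 1: keep the double quotes and escape every inner double quote.
// Backslash sequences PHP does not recognise, such as \s, are left unchanged.
$escaped = "<td align=\"left\">\s*";

// Option 2: use single quotes, so inner double quotes need no escaping.
$single = '<td align="left">\s*';

var_dump($escaped === $single); // bool(true)
```

Single quotes are usually the less error-prone choice for regex patterns, since PHP then leaves almost every backslash alone.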
Posted: Fri Mar 09, 2007 8:35 am
by Kieran Huggins
ah the joys of syntax highlighting...
you also need to wrap your regex in "delimiters": /regex/ OR #regex# OR |regex|....
Check out the manual page for examples - it should explain this more clearly.
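As a rough illustration of the delimiter idea, the same expression can be wrapped in any of the three characters mentioned above. This is only a sketch on a single cell from the sample row; the variable names are made up:

```php
<?php
$html = '<td class="body_bg_2_co" align="center">29</td>';

// The same expression wrapped in three different delimiter characters.
// None of them appears inside the expression itself, so nothing needs escaping.
$patterns = [
    '/<td[^>]+>(\d+)/',
    '#<td[^>]+>(\d+)#',
    '|<td[^>]+>(\d+)|',
];

$results = [];
foreach ($patterns as $pattern) {
    preg_match($pattern, $html, $m);
    $results[] = $m[1]; // first capture group: the digits inside the cell
}
print_r($results);
```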
Posted: Fri Mar 09, 2007 9:04 am
by kerna
Alright, thanks guys. I have readjusted the script a little to the following, but am still getting errors:
Code:
<?
$file = file_get_contents("http://www.sportinglife.com/football/premiership/table/table.html");
preg_match_all("/<tr>\n<td align=\"left\">\n<a[^>]+>(\w+)</a>\n</td>\n<td[^>]+>(\d+)</td>\n<tr/>/",$file,$match);
print_r($match);
?>
First of all, am I using the correct functions and writing this the correct way for what I want to do? I just feel as though another error will pop up once I get this one solved.
And secondly, I get this error message when I try to run the script:
Warning: preg_match_all() [function.preg-match-all]: Unknown modifier 'a' in /home/***/test.php on line 14
Array ( )
Thanks for bearing with me
Posted: Fri Mar 09, 2007 9:13 am
by feyd
Due to the choice of using "/" as a delimiter, all other occurrences of the character must be escaped.
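This also explains the "Unknown modifier 'a'" warning: the pattern is taken to end at the / in </a>, so the a that follows is read as a modifier. A short sketch of the two usual ways around it, using one link from the sample row (variable names are mine):

```php
<?php
$html = '<a href="/football/premiership/manu/news/">Man Utd</a>';

// With "/" as the delimiter, the "/" in "</a>" has to be written as "\/".
$withSlash = '/<a[^>]+>([^<]+)<\/a>/';

// With "#" as the delimiter, "</a>" can stay as it is.
$withHash = '#<a[^>]+>([^<]+)</a>#';

preg_match($withSlash, $html, $a);
preg_match($withHash, $html, $b);
var_dump($a[1] === $b[1]); // bool(true)

// preg_quote() escapes a literal fragment; its second argument names the
// delimiter so that it gets escaped as well.
$quoted = preg_quote('</a>', '/');
echo $quoted, "\n";
```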
Posted: Fri Mar 09, 2007 10:08 am
by dude81
kerna wrote:
Code:
<?
$file = file_get_contents("http://www.sportinglife.com/football/premiership/table/table.html");
preg_match_all("/<tr>\n<td align=\"left\">\n<a[^>]+>(\w+)</a>\n</td>\n<td[^>]+>(\d+)</td>\n<tr/>/",$file,$match);
print_r($match);
?>
I'm not sure whether the whole regexp will work or not, but instead of \n I would suggest \n? or, better, \s?.
Try the following pattern; it works perfectly for the data Kieran Huggins posted.
Code:
$pattern=/<tr>{1}?\s?<td\s?align="left">\s?<a[^>]+>(?:\w+\s?\w)+<\/a>\s?<\/td>(?:<td[^>]+>\d{1,}<\/td>)+/i
Posted: Fri Mar 09, 2007 10:09 am
by Kieran Huggins
if you use # or | instead it's easy to see where you need to escape those characters
Posted: Fri Mar 09, 2007 10:13 am
by dude81
Kindly ignore the above expression. The following should be a perfect match:
Code:
$pattern='/(?:<tr>{1}?\s?<td\s?align=\"left\">\s?<a[^>]+>(?:\w+\s?\w)+<\/a>\s?<\/td>(?:<td[^>]+>\d{1,}<\/td>)+<\/tr>)+/';
and let us know whether it served the purpose
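One thing to keep in mind with \s? is that it allows at most one whitespace character between tags, while a live page may well use \r\n line endings or indentation. A small sketch of the difference, assuming hypothetical markup (I have not checked what the real page's source looks like):

```php
<?php
// Hypothetical markup: same tags as the sample row, but with Windows line
// endings and indentation between them.
$html = "<tr>\r\n    <td align=\"left\">\r\n        "
      . "<a href=\"/x/\">Man Utd</a>\r\n    </td>\r\n</tr>";

// \s? accepts zero or one whitespace character; \s* accepts any number.
$oneOptional = '#<tr>\s?<td align="left">\s?<a[^>]+>([^<]+)</a>#';
$anyAmount   = '#<tr>\s*<td align="left">\s*<a[^>]+>([^<]+)</a>#';

$first  = preg_match($oneOptional, $html);
$second = preg_match($anyAmount, $html, $m);

var_dump($first);  // int(0)
var_dump($second); // int(1)
echo $m[1], "\n";  // Man Utd
```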
Posted: Mon Mar 12, 2007 1:40 am
by kerna
Thanks guys. The code dude81 posted seems to have helped. However, I still think I'm doing something wrong.
I have the following PHP file saved:
Code:
<?
$file = file_get_contents("http://www.sportinglife.com/football/premiership/table/table.html");
preg_match_all('/(?:<tr>{1}?\s?<td\s?align=\"left\">\s?<a[^>]+>(?:\w+\s?\w)+<\/a>\s?<\/td>(?:<td[^>]+>\d{1,}<\/td>)+<\/tr>)+/',$file,$match);
print_r($match);
?>
When run on the server it produces the following:
"Array ( [0] => Array ( ) ) "
Something just doesn't seem right. Is there a reason it's returning this rather than the text I want extracted by the regex? Is print_r the right function to use here, and is file_get_contents the right way to set $file?
Thanks for the help guys

Posted: Mon Mar 12, 2007 8:37 am
by dude81
I used preg_match, but before that, check whether your file actually gets the output or not. Also try viewing the source when it prints: generally, anything inside <> (tags) is not visible in the browser.
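Both checks can be done from the script itself. A minimal sketch, using a short stand-in string instead of the real download so that it runs on its own; the stand-in markup and the variable names are assumptions:

```php
<?php
// Stand-in for the downloaded page. With the real URL this would be
// $file = file_get_contents($url); which returns false on failure.
$file = '<tr><td align="left"><a href="/x/">Man Utd</a></td></tr>';

if ($file === false) {
    exit("Download failed\n");
}

echo strlen($file), " bytes fetched\n";

// A browser renders <tr>, <td> and so on instead of showing them, so escape
// the markup (or use View Source) to see what the regex has to match.
$visible = htmlspecialchars($file);
echo '<pre>', $visible, '</pre>', "\n";
```

The same htmlspecialchars() trick is useful on the print_r output of the matches, since matched tags would otherwise be rendered rather than shown.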
Posted: Tue Mar 13, 2007 4:23 am
by kerna
dude81 wrote:I used preg_match, but before that, check whether your file actually gets the output or not. Also try viewing the source when it prints: generally, anything inside <> (tags) is not visible in the browser.
Hi Dude,
I tried printing $file, which successfully printed the contents of the webpage in question. However, when I print_r the variable $match it still comes up as:
"Array ( [0] => Array ( ) ) "
What I'm after is something like (Man Utd, 33, 28, 3, 2, etc.) so I can insert it into a MySQL database.
Posted: Tue Mar 13, 2007 4:33 am
by dude81
Did you try with preg_match? What does it return?
Posted: Tue Mar 13, 2007 4:44 am
by kerna
dude81 wrote:Did you try with preg_match? What does it return?
Yeah mate, tried it with preg_match and it brought up the following:
Array ( )
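An empty array means the pattern never matched. One alternative that is less sensitive to whitespace is to split the job in two: first grab each row, then grab the cells inside it. The following is only a sketch under stated assumptions: the stand-in markup is shortened to three numeric cells per row, the second row is made up, and the variable names are mine.

```php
<?php
// Stand-in for $file: two rows shaped like the sample row, shortened to three
// numeric cells each. The second row and its values are made up.
$file = <<<'HTML'
<table>
<tr>
  <td align="left">
    <a href="/football/premiership/manu/news/">Man Utd</a>
  </td>
  <td class="body_bg_2_co" align="center">29</td>
  <td class="body_bg_2_co" align="center">12</td>
  <td class="body_bg_2_co" align="center">1</td>
</tr>
<tr>
  <td align="left">
    <a href="/example/">Example FC</a>
  </td>
  <td class="body_bg_2_co" align="center">29</td>
  <td class="body_bg_2_co" align="center">10</td>
  <td class="body_bg_2_co" align="center">3</td>
</tr>
</table>
HTML;

$table = [];

// Step 1: grab each <tr>...</tr>. The "s" modifier lets "." span newlines and
// the lazy ".*?" stops at the first closing </tr>.
preg_match_all('#<tr[^>]*>(.*?)</tr>#s', $file, $rows);

foreach ($rows[1] as $rowHtml) {
    // Step 2: grab each cell, then drop inner tags and surrounding whitespace.
    preg_match_all('#<td[^>]*>(.*?)</td>#s', $rowHtml, $cells);
    $table[] = array_map(function ($cell) {
        return trim(strip_tags($cell));
    }, $cells[1]);
}

print_r($table);
```

Each entry of $table is then one row in the (Man Utd, 29, 12, ...) shape, ready to be passed to whatever database insert is used.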