Help with regex code for table
Moderator: General Moderators
Hi all,
I'm hoping somebody can help me here. I'm kind of new to regex and am having some problems getting started. Basically, what I want to do is extract the data from each row of the table at the link below, i.e. I want to extract the data into an array which I can then upload to a MySQL database. Regex is the way to do this.
http://www.sportinglife.com/football/pr ... table.html
I'm not sure where to start. Can someone give me an example of code so I can get each row of the table, starting from the top?
Thanks very much
- Kieran Huggins
- DevNet Master
- Posts: 3635
- Joined: Wed Dec 06, 2006 4:14 pm
- Location: Toronto, Canada
- Contact:
So each row looks like this (you have a fixed number of columns, each wrapped in td tags):
Code: Select all
<tr>
<td align="left">
<a href="/football/premiership/manu/news/">Man Utd</a>
</td>
<td class="body_bg_2_co" align="center">29</td>
<td class="body_bg_2_co" align="center">12</td>
<td class="body_bg_2_co" align="center">1</td>
<td class="body_bg_2_co" align="center">1</td>
<td class="body_bg_2_co" align="center">35</td>
<td class="body_bg_2_co" align="center">8</td>
<td class="body_bg_2_co" align="center">11</td>
<td class="body_bg_2_co" align="center">2</td>
<td class="body_bg_2_co" align="center">2</td>
<td class="body_bg_2_co" align="center">31</td>
<td class="body_bg_2_co" align="center">11</td>
<td class="body_bg_2_co" align="center">72</td>
<td class="body_bg_2_co" align="center">47</td>
</tr>
Your regex match will be something along the lines of:
Code: Select all
<tr>\n<td align="left">\n<a[^>]+>(\w+)</a>\n</td>\n<td[^>]+>(\d+)</td>\n ...repeat...\n</tr>
Read:
http://ca3.php.net/manual/en/function.p ... ch-all.php
and:
http://ca3.php.net/manual/en/reference. ... syntax.php
and the whole section in general actually
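To make the "fixed number of columns" idea concrete, here is a minimal sketch, not a definitive implementation. It assumes the markup is exactly the sample row posted above (13 numeric cells), uses `#` as the delimiter so the slashes in the closing tags need no escaping, allows any whitespace between tags with `\s*`, and captures the team name with `[^<]+` so that the space in "Man Utd" is accepted. The variable names are made up for illustration.

```php
<?php
// Sample row copied from the post above; the live page may differ.
$html = <<<'HTML'
<tr>
<td align="left">
<a href="/football/premiership/manu/news/">Man Utd</a>
</td>
<td class="body_bg_2_co" align="center">29</td>
<td class="body_bg_2_co" align="center">12</td>
<td class="body_bg_2_co" align="center">1</td>
<td class="body_bg_2_co" align="center">1</td>
<td class="body_bg_2_co" align="center">35</td>
<td class="body_bg_2_co" align="center">8</td>
<td class="body_bg_2_co" align="center">11</td>
<td class="body_bg_2_co" align="center">2</td>
<td class="body_bg_2_co" align="center">2</td>
<td class="body_bg_2_co" align="center">31</td>
<td class="body_bg_2_co" align="center">11</td>
<td class="body_bg_2_co" align="center">72</td>
<td class="body_bg_2_co" align="center">47</td>
</tr>
HTML;

// One capture group per numeric cell, repeated 13 times (the count in the sample).
$cell = '\s*<td[^>]*>(\d+)</td>';
$pattern = '#<tr>\s*<td align="left">\s*<a[^>]+>([^<]+)</a>\s*</td>'
         . str_repeat($cell, 13)
         . '\s*</tr>#';

// PREG_SET_ORDER groups the captures row by row.
preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

$rows = [];
foreach ($matches as $m) {
    array_shift($m);   // drop the full-match text, keep only the captures
    $rows[] = $m;
}
print_r($rows);
```

Each entry of `$rows` is then one table row, team name first, which is a convenient shape for a later database insert.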
Much appreciated, thanks a lot for the help, Kieran.
I had a go at this and tried creating code to extract this info into the format I want, e.g. (Man Utd, 33, 28, 3, 2, etc.), but it doesn't work.
Code: Select all
<?
$file = file_get_contents("http://www.sportinglife.com/football/premiership/table/table.html");
preg_match_all("<tr>\n<td align="left">\n<a[^>]+>(\w+)</a>\n</td>\n<td[^>]+>(\d+)</td>\n<tr/>",$file,$match);
print_r($match);
?>
I guess what I'm trying to do there is declare $file as the contents of the URL, then use preg_match_all to find that pattern in $file and put the result into the variable $match, then print $match.
Am I going the wrong way about this? Am I along the right tracks? Sorry, I'm kind of confused.
feyd | Please use [code], [php] and [syntax="..."] tags where appropriate when posting code. Your post has been edited to reflect how we'd like it posted. Please read: [url=http://forums.devnetwork.net/viewtopic.php?t=21171]Posting Code in the Forums[/url] to learn how to do it too.
Alright, thanks guys. I have re-adjusted the script a little to the following, but am still getting errors:
Code: Select all
<?
$file = file_get_contents("http://www.sportinglife.com/football/premiership/table/table.html");
preg_match_all("/<tr>\n<td align=\"left\">\n<a[^>]+>(\w+)</a>\n</td>\n<td[^>]+>(\d+)</td>\n<tr/>/",$file,$match);
print_r($match);
?>
First of all, am I using the correct functions and writing them the correct way for what I want to do? I just feel as though another error will pop up once I get this one solved.
And secondly, I get this error message when I try to run the script:
Warning: preg_match_all() [function.preg-match-all]: Unknown modifier 'a' in /home/***/test.php on line 14
Array ( )
Thanks for bearing with me
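On the "Unknown modifier 'a'" warning: when `/` is the delimiter, the first unescaped `/` inside the pattern (here the one in `</a>`) is taken as the end of the pattern, and the `a` that follows is read as a modifier. The sketch below shows the two usual ways around that on a tiny made-up input; the variable names are only for illustration.

```php
<?php
$html = "<a href=\"/x/\">Man Utd</a>";

// Broken: the "/" in "</a>" closes the pattern early, so "a" is parsed as a modifier.
// The warning is suppressed with @ here; the call simply returns false.
$broken = @preg_match('/<a[^>]+>([^<]+)</a>/', $html);

// Option 1: keep "/" as the delimiter and escape every literal slash.
$escaped = '/<a[^>]+>([^<]+)<\/a>/';

// Option 2: pick a delimiter that does not occur in the pattern.
$hash = '#<a[^>]+>([^<]+)</a>#';

preg_match($escaped, $html, $m1);
preg_match($hash, $html, $m2);

var_dump($broken);
echo $m1[1], "\n";
echo $m2[1], "\n";
```

For HTML patterns, a delimiter such as `#` or `~` tends to be easier to read than escaping every closing tag.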
kerna wrote:
Code: Select all
<?
$file = file_get_contents("http://www.sportinglife.com/football/premiership/table/table.html");
preg_match_all("/<tr>\n<td align=\"left\">\n<a[^>]+>(\w+)</a>\n</td>\n<td[^>]+>(\d+)</td>\n<tr/>/",$file,$match);
print_r($match);
?>
I'm not sure whether the whole regexp will work or not, but instead of \n I would suggest \n? or, better, \s?.
Try the following pattern; it works perfectly for the data Kieran Huggins posted.
Code: Select all
$pattern='/<tr>{1}?\s?<td\s?align="left">\s?<a[^>]+>(?:\w+\s?\w)+<\/a>\s?<\/td>(?:<td[^>]+>\d{1,}<\/td>)+/i';
Kindly ignore the above expression. The following should be a perfect match:
Code: Select all
$pattern='/(?:<tr>{1}?\s?<td\s?align=\"left\">\s?<a[^>]+>(?:\w+\s?\w)+<\/a>\s?<\/td>(?:<td[^>]+>\d{1,}<\/td>)+<\/tr>)+/';
Let us know whether it served the purpose.
Thanks guys. The code dude81 posted seems to have helped. However, I still think I'm doing something wrong.
I have the following PHP file saved:
Code: Select all
<?
$file = file_get_contents("http://www.sportinglife.com/football/premiership/table/table.html");
preg_match_all('/(?:<tr>{1}?\s?<td\s?align=\"left\">\s?<a[^>]+>(?:\w+\s?\w)+<\/a>\s?<\/td>(?:<td[^>]+>\d{1,}<\/td>)+<\/tr>)+/',$file,$match);
print_r($match);
?>
When run on the server it produces the following:
"Array ( [0] => Array ( ) ) "
Something just doesn't seem right. Is there a reason it's bringing up this rather than the text I want extracted as per the regex code? Is print_r the right command to use here, and is declaring the variable $file with file_get_contents correct?
Thanks for the help guys
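A note on reading that output, with a small sketch on made-up input. `preg_match_all` always fills its result array with one list per group: index 0 holds the full matches and each capturing group gets its own index. Every group in the pattern above is non-capturing (`(?:...)`), so only index 0 exists, and an empty list there means nothing matched. One likely cause, assuming the live page puts a line break between cells as the sample row does, is that the pattern allows no whitespace between one `</td>` and the next `<td>`.

```php
<?php
$html = "<td>29</td>\n<td>12</td>";

// Non-capturing group only: the result has a single index, 0 (the full matches).
preg_match_all('#(?:<td>\d+</td>)#', $html, $a);

// Capturing group: index 1 holds the captured text.
preg_match_all('#<td>(\d+)</td>#', $html, $b);

// No match (the newline between cells is not allowed for): same shape, empty lists.
preg_match_all('#<td>(\d+)</td><td>#', $html, $c);

print_r($a);
print_r($b);
print_r($c);
```

So `print_r` is doing its job; the pattern needs capturing groups for the values you want and more tolerant whitespace handling (for example `\s*` between tags).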
dude81 wrote:
I used preg_match, but before that, check whether your file gets the output or not. Also try viewing the source when it prints. Generally, anything inside <> (tags) is not visible in the rendered page, so try View Source.
Hi Dude,
I tried printing $file, which successfully printed the contents of the webpage in question. However, when I print_r the variable $match it still comes up as:
"Array ( [0] => Array ( ) ) "
What I'm after is for it to be like (Man Utd, 33, 28, 3, 2, etc.) so I can insert it into a MySQL database.
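One way to get rows in that shape without a single long pattern is to work in two steps: pull out each `<tr>` block first, then pull the cells out of each block. This is only a sketch under assumptions: `$file` here is a trimmed stand-in for the fetched page (the second row is made up for illustration), and the real markup may need the patterns adjusted.

```php
<?php
// Trimmed stand-in for the fetched page. The second row is made up for illustration.
$file = <<<'HTML'
<table>
<tr>
<td align="left">
<a href="/football/premiership/manu/news/">Man Utd</a>
</td>
<td class="body_bg_2_co" align="center">29</td>
<td class="body_bg_2_co" align="center">12</td>
</tr>
<tr>
<td align="left">
<a href="/example/">Example FC</a>
</td>
<td class="body_bg_2_co" align="center">29</td>
<td class="body_bg_2_co" align="center">10</td>
</tr>
</table>
HTML;

$rows = [];

// Step 1: grab each <tr>...</tr> block; the "s" modifier lets "." span newlines.
preg_match_all('#<tr[^>]*>(.*?)</tr>#s', $file, $trs);

foreach ($trs[1] as $tr) {
    // Step 2: grab the contents of every cell in this row.
    preg_match_all('#<td[^>]*>(.*?)</td>#s', $tr, $tds);

    // Remove inner tags such as <a> and trim surrounding whitespace.
    $cells = array_map(function ($cell) {
        return trim(strip_tags($cell));
    }, $tds[1]);

    if (count($cells) > 0) {
        $rows[] = $cells;
    }
}

print_r($rows);
```

Each element of `$rows` is one row as a flat list of strings, team name first, which can then be bound to the placeholders of a prepared INSERT statement.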