Help with regex code for table

kerna
Forum Newbie
Posts: 9
Joined: Wed Mar 07, 2007 6:52 am

Post by kerna »

Hi all,

I'm hoping somebody can help me here. I'm fairly new to regex and am having some problems getting started. Basically, what I want to do is extract the data from each row of the table at the link below, i.e. I want to extract the data into an array which I can then upload to a MySQL database. Regex is the way to do this.

http://www.sportinglife.com/football/pr ... table.html

I'm not sure where to start. Can someone give me an example of code so I can get each row of the table, starting from the top?

Thanks very much
Kieran Huggins
DevNet Master
Posts: 3635
Joined: Wed Dec 06, 2006 4:14 pm
Location: Toronto, Canada

Post by Kieran Huggins »

So each row looks like this:

Code: Select all

<tr>
<td align="left">
<a href="/football/premiership/manu/news/">Man Utd</a>
</td>
<td class="body_bg_2_co" align="center">29</td>
<td class="body_bg_2_co" align="center">12</td>
<td class="body_bg_2_co" align="center">1</td>
<td class="body_bg_2_co" align="center">1</td>
<td class="body_bg_2_co" align="center">35</td>
<td class="body_bg_2_co" align="center">8</td>
<td class="body_bg_2_co" align="center">11</td>
<td class="body_bg_2_co" align="center">2</td>
<td class="body_bg_2_co" align="center">2</td>
<td class="body_bg_2_co" align="center">31</td>
<td class="body_bg_2_co" align="center">11</td>
<td class="body_bg_2_co" align="center">72</td>
<td class="body_bg_2_co" align="center">47</td>
</tr>
You have a fixed number of columns, each wrapped in td tags.

Your regex match will be something along the lines of:

Code: Select all

<tr>\n<td align="left">\n<a[^>]+>(\w+)</a>\n</td>\n<td[^>]+>(\d+)</td>\n ...repeat...\n</tr>
read:
http://ca3.php.net/manual/en/function.p ... ch-all.php
and:
http://ca3.php.net/manual/en/reference. ... syntax.php
and the whole section in general actually :-)
kerna
Forum Newbie
Posts: 9
Joined: Wed Mar 07, 2007 6:52 am

Post by kerna »

feyd | Please use [code], [php] and [syntax="..."] tags where appropriate when posting code. Your post has been edited to reflect how we'd like it posted. Please read: Posting Code in the Forums (http://forums.devnetwork.net/viewtopic.php?t=21171) to learn how to do it too.

Much appreciated, thanks a lot for the help, Kieran.

I had a go at this and tried writing code to extract this info into the format I want, e.g. (Man Utd, 33, 28, 3, 2, etc.), but it doesn't work.

Code: Select all

<?

$file = file_get_contents("http://www.sportinglife.com/football/premiership/table/table.html");

preg_match_all("<tr>\n<td align="left">\n<a[^>]+>(\w+)</a>\n</td>\n<td[^>]+>(\d+)</td>\n<tr/>",$file,$match);

print_r($match);

?>
I guess what I'm trying to do there is set $file to the contents of the URL, then use preg_match_all to find that pattern in $file and put the results into the variable $match, then print $match.

Am I going about this the wrong way? Am I on the right track? Sorry, I'm kind of confused :oops:


feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

Your pattern has unescaped quotes in it that break the string. Notice how the highlighted version of your post illustrates where the break is.
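
A minimal sketch of the quoting issue, using one fragment of the pattern rather than the whole thing. The variable names are illustrative only. Inside a double-quoted PHP string every inner double quote has to be escaped; inside a single-quoted string it does not:

```php
<?php
// Double-quoted string: each inner double quote is escaped with a backslash.
$escaped = "<td align=\"left\">";

// Single-quoted string: inner double quotes need no escaping.
$single = '<td align="left">';

// Both variables hold exactly the same text.
var_dump($escaped === $single);
```

Single quotes are often the less error-prone choice for patterns that contain HTML attributes.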
Kieran Huggins
DevNet Master
Posts: 3635
Joined: Wed Dec 06, 2006 4:14 pm
Location: Toronto, Canada

Post by Kieran Huggins »

Ah, the joys of syntax highlighting...

You also need to wrap your regex in "delimiters": /regex/ OR #regex# OR |regex|....

Check out the manual page for examples; it explains this more clearly.
kerna
Forum Newbie
Posts: 9
Joined: Wed Mar 07, 2007 6:52 am

Post by kerna »

Alright thanks guys. Have re-adjusted the script a little to the following, but am still getting errors:

Code: Select all

<?

$file = file_get_contents("http://www.sportinglife.com/football/premiership/table/table.html");

preg_match_all("/<tr>\n<td align=\"left\">\n<a[^>]+>(\w+)</a>\n</td>\n<td[^>]+>(\d+)</td>\n<tr/>/",$file,$match);

print_r($match);

?>
First of all, am I using the correct functions and writing them the correct way for what I want to do? I just feel as though another error will pop up once I get this one solved.

And secondly, I get this error message when I try to run the script:

Warning: preg_match_all() [function.preg-match-all]: Unknown modifier 'a' in /home/***/test.php on line 14
Array ( )


Thanks for bearing with me
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

Due to the choice of using "/" as a delimiter, all other occurrences of the character must be escaped.
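
A small sketch of why the warning names the modifier 'a': with "/" as the delimiter, the first unescaped "/" inside the pattern (the one in "</a>") ends the pattern, and the "a" that follows is read as a modifier. The sample string and variable names here are assumptions for illustration:

```php
<?php
$html = '<a href="/football/premiership/manu/news/">Man Utd</a>';

// "/" is the delimiter, so the "/" in "</a>" is escaped as "\/".
// [^<]+ captures everything up to the closing tag, spaces included.
$pattern = '/<a[^>]+>([^<]+)<\/a>/';

$found = preg_match($pattern, $html, $m);

print_r($m); // $m[1] holds the link text
```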
dude81
Forum Regular
Posts: 509
Joined: Mon Aug 29, 2005 6:26 am
Location: Pearls City

Post by dude81 »

kerna wrote:

Code: Select all

<?

$file = file_get_contents("http://www.sportinglife.com/football/premiership/table/table.html");

preg_match_all("/<tr>\n<td align=\"left\">\n<a[^>]+>(\w+)</a>\n</td>\n<td[^>]+>(\d+)</td>\n<tr/>/",$file,$match);

print_r($match);

?>
I'm not sure whether the whole regexp will work, but instead of \n I would suggest \n? or, better, \s?.
Try the following pattern; it works perfectly for the data Kieran Huggins posted.

Code: Select all

$pattern = '/<tr>{1}?\s?<td\s?align="left">\s?<a[^>]+>(?:\w+\s?\w)+<\/a>\s?<\/td>(?:<td[^>]+>\d{1,}<\/td>)+/i';
Kieran Huggins
DevNet Master
Posts: 3635
Joined: Wed Dec 06, 2006 4:14 pm
Location: Toronto, Canada

Post by Kieran Huggins »

If you use # or | instead, it's easy to see where you need to escape those characters.
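
A short sketch of the same idea with "#" as the delimiter, so the "/" in closing tags is an ordinary character and needs no backslash. The sample cell is taken from the row posted earlier; the variable names are illustrative:

```php
<?php
$html = '<td class="body_bg_2_co" align="center">29</td>';

// "#" delimits the pattern, so "</td>" can be written as-is.
$pattern = '#<td[^>]+>(\d+)</td>#';

$found = preg_match($pattern, $html, $m);

print_r($m); // $m[1] holds the digits inside the cell
```

One trade-off worth noting: choosing | as the delimiter means alternation is no longer available inside the pattern, because an escaped \| matches a literal pipe.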
dude81
Forum Regular
Posts: 509
Joined: Mon Aug 29, 2005 6:26 am
Location: Pearls City

Post by dude81 »

Kindly ignore the above expression. The following should be a perfect match

Code: Select all

$pattern = '/(?:<tr>{1}?\s?<td\s?align=\"left\">\s?<a[^>]+>(?:\w+\s?\w)+<\/a>\s?<\/td>(?:<td[^>]+>\d{1,}<\/td>)+<\/tr>)+/';
And let us know whether it served the purpose.
kerna
Forum Newbie
Posts: 9
Joined: Wed Mar 07, 2007 6:52 am

Post by kerna »

Thanks guys. The code dude81 posted seems to have helped. However, I still think I'm doing something wrong.

I have the following PHP file saved:

Code: Select all

<?

$file = file_get_contents("http://www.sportinglife.com/football/premiership/table/table.html");

preg_match_all('/(?:<tr>{1}?\s?<td\s?align=\"left\">\s?<a[^>]+>(?:\w+\s?\w)+<\/a>\s?<\/td>(?:<td[^>]+>\d{1,}<\/td>)+<\/tr>)+/',$file,$match);

print_r($match);

?>
When run on the server it produces the following:

"Array ( [0] => Array ( ) ) "

Something just doesn't seem right. Is there a reason it's bringing up this rather than the text I want extracted by the regex? Is print_r the right function to use here, and is file_get_contents the right way to set $file?

Thanks for the help guys :)
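
One possible reason for the empty result, assuming the live page is laid out like the sample row posted earlier in the thread: the pattern allows no whitespace between one </td> and the next <td>, but the sample has a line break there. The sketch below runs the posted pattern and a more tolerant variant against an abridged copy of that row. The names $row, $strict and $tolerant are illustrative, and the tolerant pattern is only one way to write it:

```php
<?php
// Abridged copy of the sample row: team cell plus two numeric cells.
$row = "<tr>\n<td align=\"left\">\n"
     . "<a href=\"/football/premiership/manu/news/\">Man Utd</a>\n</td>\n"
     . "<td class=\"body_bg_2_co\" align=\"center\">29</td>\n"
     . "<td class=\"body_bg_2_co\" align=\"center\">12</td>\n</tr>";

// Pattern as posted: nothing may sit between "</td>" and the following "<td".
$strict = '/(?:<tr>{1}?\s?<td\s?align=\"left\">\s?<a[^>]+>(?:\w+\s?\w)+<\/a>\s?<\/td>(?:<td[^>]+>\d{1,}<\/td>)+<\/tr>)+/';

// Variant: \s* wherever the markup may contain line breaks or indentation.
$tolerant = '#<tr>\s*<td\s+align="left">\s*<a[^>]+>[^<]+</a>\s*</td>(?:\s*<td[^>]+>\d+</td>)+\s*</tr>#';

$strictHits   = preg_match($strict, $row);   // 0 on this sample
$tolerantHits = preg_match($tolerant, $row); // 1 on this sample

var_dump($strictHits, $tolerantHits);
```

If the real page uses different attributes or extra markup inside the rows, both patterns may still miss, so this only narrows down one likely cause.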
dude81
Forum Regular
Posts: 509
Joined: Mon Aug 29, 2005 6:26 am
Location: Pearls City

Post by dude81 »

I used preg_match, but before that, check whether $file actually contains the page. Also, view the page source when it prints: generally anything inside <> (tags) is not visible in the rendered page. Try View Source.
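
A rough debugging sketch along those lines. The $file string here is a hypothetical stand-in for the downloaded page, and the simple cell pattern is only for illustration. htmlspecialchars() converts < and > to entities so the browser shows tags as text, and the return value of preg_match_all() tells you how many matches were found:

```php
<?php
// Hypothetical stand-in for the page; in the thread this comes from file_get_contents().
$file = "<tr>\n<td align=\"left\">Man Utd</td>\n</tr>";

// file_get_contents() returns false on failure, so check before matching.
if ($file === false) {
    exit("Download failed\n");
}

// Show the raw markup as text instead of letting the browser render it.
echo '<pre>' . htmlspecialchars($file) . '</pre>';

// preg_match_all() returns the number of full matches, or false on error.
$count = preg_match_all('#<td[^>]*>([^<]+)</td>#', $file, $match);

echo "\nmatches: ";
var_dump($count);

// Escape the dump as well, in case the matched text contains tags.
echo '<pre>' . htmlspecialchars(print_r($match, true)) . '</pre>';
```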
kerna
Forum Newbie
Posts: 9
Joined: Wed Mar 07, 2007 6:52 am

Post by kerna »

dude81 wrote: I used preg_match, but before that, check whether $file actually contains the page. Also, view the page source when it prints: generally anything inside <> (tags) is not visible in the rendered page. Try View Source.
Hi Dude,

I tried printing $file, which successfully printed the contents of the webpage in question. However, when I print_r the variable $match it still comes up as:

"Array ( [0] => Array ( ) ) "

What I'm after is for it to be like (Man Utd, 33, 28, 3, 2, etc.) so I can insert it into a MySQL database.
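
A minimal sketch of one way to get rows in that shape, assuming the page markup matches the sample row posted earlier (abridged here to three numeric cells). This is a two-step variant, not the single pattern discussed above: first pull out each <tr> block, then pull out the <td> cells inside it. The names $table, $rows and $cells are illustrative:

```php
<?php
// Sample markup standing in for the downloaded page.
$file = <<<HTML
<tr>
<td align="left">
<a href="/football/premiership/manu/news/">Man Utd</a>
</td>
<td class="body_bg_2_co" align="center">29</td>
<td class="body_bg_2_co" align="center">12</td>
<td class="body_bg_2_co" align="center">1</td>
</tr>
HTML;

$table = [];

// Step 1: capture the inside of every <tr>...</tr>.
// The "s" modifier lets "." match line breaks too.
preg_match_all('#<tr[^>]*>(.*?)</tr>#s', $file, $rows);

foreach ($rows[1] as $rowHtml) {
    // Step 2: capture the inside of every <td>...</td> in this row.
    preg_match_all('#<td[^>]*>(.*?)</td>#s', $rowHtml, $cells);

    // Remove inner tags (the <a> around the team name) and surrounding whitespace.
    $values = array_map(function ($cell) {
        return trim(strip_tags($cell));
    }, $cells[1]);

    // Skip rows that have no <td> cells, such as header rows built from <th>.
    if ($values) {
        $table[] = $values;
    }
}

print_r($table);
```

Each entry of $table is then one row as a flat array of strings, team name first, which is the shape a database insert would need. The numeric values are still strings at this point and may need casting.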
dude81
Forum Regular
Posts: 509
Joined: Mon Aug 29, 2005 6:26 am
Location: Pearls City

Post by dude81 »

Did you try with preg_match? What does it return?
kerna
Forum Newbie
Posts: 9
Joined: Wed Mar 07, 2007 6:52 am

Post by kerna »

dude81 wrote: Did you try with preg_match? What does it return?
Yeah mate, tried it with preg_match and it brought up the following:

Array ( )