Legal scraping, not sure where to start!

Any questions involving matching text strings to patterns - the pattern is called a "regular expression."


Citizen
Forum Contributor
Posts: 300
Joined: Wed Jul 20, 2005 10:23 am

Post by Citizen

Can anyone point me in the right direction?

I've done a ton of reading about regex, and I can't figure out a way to get the results to match up. I can write code that finds each piece of information separately (the code I posted above first), but I can't get the pieces to line up: sometimes the results for one user name don't correspond to the other results. The values skip around, and I don't know what to do about it, or even whether it's possible.
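One likely cause of values drifting onto the wrong name is running a separate `preg_match_all()` for each field: if one row lacks a cell or has an extra one, the parallel result arrays fall out of step. A minimal sketch of the alternative, assuming each guild's cells sit together in one table row like the snippet quoted later in this thread, matches a whole row at once with named subpatterns and `PREG_SET_ORDER`. The sample HTML, the guild names, and the group names `level` and `points` are hypothetical; the thread doesn't say what the `col03`/`col04` columns actually hold.

```php
<?php
// Hypothetical sample: two rows in the cell layout quoted later in the
// thread. The real page markup is an assumption here.
$html = <<<'HTML'
<tr>
<td class="list_data_col02_c"><span class="guild_name">Alpha</span></td>
<td class="list_data_col03_c"><span>20</span></td>
<td class="list_data_col04_c"><span>497,903</span></td>
</tr>
<tr>
<td class="list_data_col02_c"><span class="guild_name">Beta</span></td>
<td class="list_data_col03_c"><span>15</span></td>
<td class="list_data_col04_c"><span>120,456</span></td>
</tr>
HTML;

// One pattern per row: all fields of a row come from the same match, so a
// missing or extra cell can't shift values onto another name.
// The group names "level" and "points" are hypothetical labels.
$rowPattern = '#<span class="guild_name">(?<name>[^<]+)</span></td>\s*'
    . '<td class="list_data_col03_c"><span>(?<level>\d+)</span></td>\s*'
    . '<td class="list_data_col04_c"><span>(?<points>[\d,]+)</span></td>#';

// PREG_SET_ORDER groups the results per match (per row) instead of per field.
preg_match_all($rowPattern, $html, $rows, PREG_SET_ORDER);

foreach ($rows as $row) {
    echo $row['name'], ': ', $row['level'], ', ', $row['points'], "\n";
}
```

Because each `$row` is one complete match, a row whose markup doesn't fit the pattern is skipped as a whole rather than leaving its neighbours misaligned.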
John Cartwright
Site Admin
Posts: 11470
Joined: Tue Dec 23, 2003 2:10 am
Location: Toronto

Post by John Cartwright

Code:

$string = '<td class="list_data_col02_c"><span class="guild_name">MØNST€®</span></td>

                        <td class="list_data_col03_c"><span>20</span></td>
                        <td class="list_data_col04_c"><span>497,903</span></td>
                        <td class="list_data_col05_c"><span>869/1020
                        
                        (46%)
                        
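                        </span></td>';

// Capture the value in each plain <span> cell. The alternatives are tried
// left to right: "ddd,ddd", then "ddd/dddd (dd%)" (\s+ spans the line breaks
// before the percentage), then a two-digit number. \d{2} comes last so it
// can't cut "497,903" short.
preg_match_all('#<td class="[^"]+"><span>(\d{3},\d{3}|\d{3}/\d{4}\s+\(\d{2}%\)|\d{2})#s', $string, $result);

echo '<pre>';
print_r($result);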

Okay, here's a start for you. I'm actually not too sure how to match MØNST€®, so I'll leave that up to you. Aside from that, you might want to make a subpattern to isolate the percentage.

Output:

Code:

<pre>Array
(
    [0] => Array
        (
            [0] => <td class="list_data_col03_c"><span>20
            [1] => <td class="list_data_col04_c"><span>497,903
            [2] => <td class="list_data_col05_c"><span>869/1020
                        
                        (46%)
        )

    [1] => Array
        (
            [0] => 20
            [1] => 497,903
            [2] => 869/1020
                        
                        (46%)
        )

)
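For the two open points in the reply above, a possible sketch: capturing the guild name doesn't require spelling out `MØNST€®`, because `[^<]+` accepts any bytes up to the next tag, and the fraction and the percentage can each get their own subpattern. This assumes the cell layout shown in the quoted string. No `/u` flag is used, so the name is captured byte-for-byte whatever the page's encoding turns out to be.

```php
<?php
// Same cell markup as the quoted example (guild name plus progress cell).
$string = '<td class="list_data_col02_c"><span class="guild_name">MØNST€®</span></td>
<td class="list_data_col05_c"><span>869/1020

(46%)

</span></td>';

// [^<]+ takes every byte up to the closing tag, so the exact characters
// in the name don't need to be listed.
preg_match('#<span class="guild_name">([^<]+)</span>#', $string, $name);

// Separate subpatterns for the two parts of the fraction and the percentage.
// The literal "<span>" skips the guild cell, whose span has a class attribute.
preg_match('#<span>(\d+)/(\d+)\s+\((\d+)%\)#', $string, $progress);

echo $name[1], "\n";
echo $progress[1], ' of ', $progress[2], ' = ', $progress[3], "%\n";
```

`$progress[3]` holds only the percentage digits, which is the isolated subpattern the reply suggests.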
Citizen
Forum Contributor
Posts: 300
Joined: Wed Jul 20, 2005 10:23 am

Post by Citizen

What does

(\d{3},\d{3}|\d{3}/\d{4}\s+\(\d{2}%\)|\d{2})

do differently than (.*?)?
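To compare the two approaches, a small sketch runs three patterns over a hypothetical sample: a bare `(.*?)`, `(.*?)` followed by `</span>`, and the explicit alternation from the reply. The `n/a` cell in column 06 is invented to show what happens with an unexpected value.

```php
<?php
// Hypothetical sample; the col06 "n/a" cell is made up for illustration.
$string = '<td class="list_data_col03_c"><span>20</span></td>
<td class="list_data_col04_c"><span>497,903</span></td>
<td class="list_data_col05_c"><span>869/1020 (46%)</span></td>
<td class="list_data_col06_c"><span>n/a</span></td>';

$patterns = [
    // Lazy .*? with nothing after it: matches as little as possible (nothing).
    'lazy, unanchored' => '#<td class="[^"]+"><span>(.*?)#s',
    // Lazy .*? up to </span>: takes whatever the cell holds.
    'lazy, to </span>' => '#<td class="[^"]+"><span>(.*?)</span>#s',
    // Explicit alternatives: only the expected number formats.
    'explicit' => '#<td class="[^"]+"><span>(\d{3},\d{3}|\d{3}/\d{4}\s+\(\d{2}%\)|\d{2})#s',
];

$results = [];
foreach ($patterns as $label => $pattern) {
    preg_match_all($pattern, $string, $m);
    $results[$label] = $m[1];
    echo $label, ': ', json_encode($m[1], JSON_UNESCAPED_SLASHES), "\n";
}
```

A bare `(.*?)` captures only empty strings, since a lazy quantifier with nothing after it is satisfied by zero characters. Followed by `</span>`, it captures whatever each cell holds, including `n/a`. The explicit alternation accepts only the expected number formats, so the `n/a` cell produces no match at all. The order of the alternatives matters too: if `\d{2}` came first, it would match `49` at the start of `497,903` and the longer alternatives would never be tried.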