
Need a preg_match_all regex

Posted: Tue Feb 23, 2010 8:39 am
by anon404
What would the regular expression for getting the MATCH ME content from this html be?

Code: Select all

<div class="headShotContainer">
        <a href="wrestler_free.jsp?wrestlerId=301">
        <img src="http://img.ultimatesurrender.com/2257/5107_adriannaone_01.jpg" class="wrestlerImage" />
        <br />
 
        [b]MATCH ME[/b]</a>
    </div>
<div class="headShotContainer">
        <a href="wrestler_free.jsp?wrestlerId=12340">
        <img src="http://img.ultimatesurrender.com/2257/5271_US005_01.jpg" class="wrestlerImage" />
        <br />
        [b]MATCH ME[/b]</a>
    </div>
 
<div class="headShotContainer">
        <a href="wrestler_free.jsp?wrestlerId=19863">
        <img src="http://img.ultimatesurrender.com/2257/7416_allyannone.jpg" class="wrestlerImage" />
        <br />
        [b]MATCH ME[/b]</a>
    </div>
 

Re: Need a preg_match_all regex

Posted: Tue Feb 23, 2010 10:06 pm
by ridgerunner
Try this:

Code: Select all

preg_match_all('%(?<=<br />).*?(?=</a>)%s', $contents, $matches, PREG_PATTERN_ORDER);
for ($i = 0; $i < count($matches[0]); $i++) {
    # Matched text = $matches[0][$i]; (includes surrounding whitespace, so trim() it)
}

Re: Need a preg_match_all regex

Posted: Wed Feb 24, 2010 1:59 am
by anon404
Thanks.

What regex implementation is that? What does the = char do?


I'm trying to get the wrestlers' stats now, but I can't figure out why this regex isn't returning any results. $tok is a wrestler's name.

Code: Select all

"/(<b><u>$tok<\/u><\/br>).*(<\/td>)/si"

My desired match for Mellanie's stats would be:

Code: Select all

The Cowgirl <br />
HT: 5'9<br />
WT: 155 lbs<br />
Season record (0-3) <br />
Lifetime record (0-3)<br />
Ranked 14th
 
HTML:

Code: Select all

<b><u>MELLANIE</u></b> <br />
The Cowgirl <br />
HT: 5'9<br />
WT: 155 lbs<br />
Season record (0-3) <br />
Lifetime record (0-3)<br />
Ranked 14th
 
</td>
<td width="50%" valign="top" align="left">
 
<b><u>MAGGIE</u></b> <br />
 
The Molester<br />
HT: 5'5<br />
WT: 130lbs<br />
Season record (0-0) <br />
Lifetime record (0-0)<br />
Not Ranked
 
</td>
</tr>
</table><br />
 
 
Mellanie has been given some pretty tough opponents for her rookie year.  We call it tough love: get your ass kicked by the best so you learn from the best.  Then, one day, you earn the right to fight your very own new girl.  Did The Cowgirl learn anything at all from her many crushing defeats?<br /><br />
 
Welcome Maggie Mayhem to Ultimate Surrender, this local fetish model is one tough cookie.  She is a submissive and is used to taking pain and punishment, so fear is the last thing on her mind entering the mat.  Maggie is strong, and doesn't think for a second that this big titted girl from Texas will pose any threat to her at all. Maggie is a tough San Fransisco girl that doesn't take smurf from anyone. <br /> <br />
 
Well what we get is one girl totally dominating the other. This was a great sex fight as Mellanie wore Maggie down, face sitting her, locking the leg scissors, and fingering Maggie's helpless pussy on the mat.  Congratulations to Mellanie Monroe for her first Ultimate Surrender win!
 
</div>
 

Re: Need a preg_match_all regex

Posted: Wed Feb 24, 2010 11:32 pm
by ridgerunner
The = is part of the positive lookahead assertion, which has the form '(?=xyz)' and says: match at a position where the following characters are 'xyz', without consuming them. The '(?<=xyz)' at the start of the pattern is its counterpart, a positive lookbehind, which requires 'xyz' immediately before the position. The implementation is PCRE, the regex engine behind the PHP preg_*() suite of regular expression functions. It supports both positive and negative lookahead and lookbehind. For more info, check out the tutorial at http://www.regular-expressions.info/
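
To make the lookaround idea concrete, here is a minimal sketch that runs the same pattern on a cut-down input. The $html string is a made-up sample modeled on the first post, not the real page, and the trim() step is my own addition.

```php
<?php
// Made-up sample shaped like the HTML in the first post.
$html = "<a href=\"x\"><br />\n        ALPHA</a>\n<a href=\"y\"><br />\n        BETA</a>";

// (?<=<br />) asserts that "<br />" sits just before the match;
// (?=</a>) asserts that "</a>" follows it.
// Neither assertion consumes characters, so the tags stay out of the match.
preg_match_all('%(?<=<br />).*?(?=</a>)%s', $html, $matches, PREG_PATTERN_ORDER);

// The match still contains the whitespace after <br />, so trim each result.
$names = array_map('trim', $matches[0]);
print_r($names);
```

With PREG_PATTERN_ORDER, $matches[0] holds every full match, which is why the loop in the earlier post reads from index 0.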

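Your regex has two problems. First, it looks for '<\/u><\/br>', but the HTML contains '</u></b> <br />', so the first group can never match; that is why you get no results at all. Second, it uses the "greedy" version of the dot-star combination. This matches everything up to the end of the string, then backtracks until it finds a match for the following characters, so it would stop at the last '</td>' rather than the first. To get the match you are looking for, you need to use the lazy version of the dot-star. To make the star (or any other quantifier) lazy, just append a question mark right after it, like so: '.*?' Once again, please refer to the tutorial for a complete explanation.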
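
The greedy/lazy difference is easiest to see side by side. This is only a sketch on a made-up two-cell string, not on your actual page:

```php
<?php
// Made-up sample with two table cells.
$html = '<td>first</td><td>second</td>';

// Greedy: .* runs to the end of the string, then backs up to the LAST </td>.
preg_match('%<td>(.*)</td>%s', $html, $greedy);

// Lazy: .*? grows one character at a time and stops at the FIRST </td>.
preg_match('%<td>(.*?)</td>%s', $html, $lazy);

echo "greedy: {$greedy[1]}\n"; // greedy: first</td><td>second
echo "lazy:   {$lazy[1]}\n";   // lazy:   first
```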

Hope this helps! :)

Re: Need a preg_match_all regex

Posted: Thu Feb 25, 2010 11:15 am
by ridgerunner
I had a bit more time to look at your problem. Try this code on for size:

Code: Select all

// here is the regex
$re_short = '%<b>\s*<u>\s*(\w+(?:\s+\w+)*)\s*</u>\s*</b>\s*<br\s*/>\s*(.*?)\s*</td>%six';
 
// here is the same regex (long, commented form using the 'x' modifier)
$re_long = '%           # use the x modifier to allow free-spacing comments
<b>\s*                  # match opening BOLD tag and zero or more whitespace
<u>\s*                  # match opening UNDERLINE tag and zero or more whitespace
(\w+(?:\s+\w+)*)\s*     # capture name in group 1. (name can have more than one word)
</u>\s*                 # match closing UNDERLINE tag and zero or more whitespace
</b>\s*                 # match closing BOLD tag and zero or more whitespace
<br\s*/>                # match BR tag
\s*(.*?)\s*             # capture description in group 2 but not surrounding whitespace
</td>                   # finally match closing TD tag
%six';
 
$cnt = preg_match_all($re_long, $contents, $matches, PREG_SET_ORDER);
if ($cnt > 0) {
    for ($i = 0; $i < $cnt; $i++) {
        echo (sprintf("\nWrestler #%d: \"%s\"\n%s\n",
            $i + 1, $matches[$i][1],  preg_replace('%\s*<br\s*/>\s*%', "\n", $matches[$i][2])));
    }
} else {
    print ("No matches");
}
 
I've provided the regex in both short and long (commented) form. To match correctly, this regex depends on the name being sandwiched between bold and underline tags. The wrestler's name is captured into group 1 and their description is captured into group 2. It allows optional whitespace between tags and uses the versatile sprintf() function to format the output. Also, the description for each wrestler is further processed: each <br /> tag (along with any adjacent whitespace) is converted to a simple linefeed.
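
If you only want one wrestler's stats, as with your $tok, the same pattern can be narrowed down. The following is a rough sketch, not a drop-in solution: wrestler_stats() is a hypothetical helper name, the sample $contents is abbreviated from your HTML, and it assumes the name appears exactly once between the bold and underline tags.

```php
<?php
// Hypothetical helper (not from the thread): return the stat lines for one wrestler.
function wrestler_stats(string $contents, string $tok): ?array
{
    // preg_quote() escapes regex metacharacters in the name;
    // the second argument also escapes the '%' delimiter used below.
    $name = preg_quote($tok, '%');
    $re = '%<b>\s*<u>\s*' . $name . '\s*</u>\s*</b>\s*<br\s*/>\s*(.*?)\s*</td>%si';
    if (preg_match($re, $contents, $m) !== 1) {
        return null; // no such wrestler (or a regex error)
    }
    // Split the description on <br /> tags, dropping adjacent whitespace.
    return preg_split('%\s*<br\s*/>\s*%i', $m[1]);
}

// Sample input abbreviated from the HTML posted above.
$contents = "<b><u>MELLANIE</u></b> <br />\nThe Cowgirl <br />\nHT: 5'9<br />\nRanked 14th\n \n</td>";
print_r(wrestler_stats($contents, 'Mellanie'));
```

The 'i' modifier lets 'Mellanie' match 'MELLANIE', and returning null keeps the "no match" case separate from an empty description.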

Once again, I recommend checking out http://www.regular-expressions.info/. Once you start getting the hang of regular expressions, they can actually become fun (and in my case: addicting!)

Cheers!

Re: Need a preg_match_all regex

Posted: Thu Feb 25, 2010 12:02 pm
by AbraCadaver
Get back to work!