I had a bit more time to look at your problem. Try this code on for size:
```php
// here is the regex
$re_short = '%<b>\s*<u>\s*(\w+(?:\s+\w+)*)\s*</u>\s*</b>\s*<br\s*/>\s*(.*?)\s*</td>%six';

// here is the same regex (long, commented form using the 'x' modifier)
$re_long = '% # use the x modifier to allow free-spacing comments
    <b>\s*               # match opening BOLD tag and zero or more whitespace
    <u>\s*               # match opening UNDERLINE tag and zero or more whitespace
    (\w+(?:\s+\w+)*)\s*  # capture name in group 1 (name can have more than one word)
    </u>\s*              # match closing UNDERLINE tag and zero or more whitespace
    </b>\s*              # match closing BOLD tag and zero or more whitespace
    <br\s*/>             # match BR tag
    \s*(.*?)\s*          # capture description in group 2 but not surrounding whitespace
    </td>                # finally match closing TD tag
%six';

// $contents is assumed to already hold the HTML you want to search
$cnt = preg_match_all($re_long, $contents, $matches, PREG_SET_ORDER);
if ($cnt > 0) {
    for ($i = 0; $i < $cnt; $i++) {
        // turn each <br /> (plus surrounding whitespace) in the description into a linefeed
        echo sprintf("\nWrestler #%d: \"%s\"\n%s\n",
            $i + 1,
            $matches[$i][1],
            preg_replace('%\s*<br\s*/>\s*%', "\n", $matches[$i][2]));
    }
} else {
    print "No matches";
}
```
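If you want to try the patterns without scraping a live page, here is a minimal self-contained sketch. The sample HTML is made up (hypothetical names and stats) and assumes the page uses the same <b><u>name</u></b><br /> layout. It just checks that the short and long forms find identical matches, which is what the x modifier promises: PCRE ignores unescaped whitespace outside character classes and treats # as the start of a comment.

```php
<?php
// Hypothetical sample markup, made up for this demo: two table cells in the
// <b><u>name</u></b><br />description layout, the second with extra spacing.
$contents = '<table>
<tr><td><b><u>Sample Slammer</u></b><br />Height: 6 ft 2 in<br />Finisher: Demo Drop</td></tr>
<tr><td>
  <b> <u> The Test Giant </u> </b>
  <br/>
  Height: 7 ft 0 in
</td></tr>
</table>';

// Short form of the pattern.
$re_short = '%<b>\s*<u>\s*(\w+(?:\s+\w+)*)\s*</u>\s*</b>\s*<br\s*/>\s*(.*?)\s*</td>%six';

// Long form: with x, the layout whitespace and # comments are ignored.
$re_long = '%
    <b>\s*                 # opening BOLD tag
    <u>\s*                 # opening UNDERLINE tag
    (\w+(?:\s+\w+)*)\s*    # group 1: the name
    </u>\s*                # closing UNDERLINE tag
    </b>\s*                # closing BOLD tag
    <br\s*/>               # BR tag
    \s*(.*?)\s*            # group 2: the description, trimmed
    </td>                  # closing TD tag
%six';

$n_short = preg_match_all($re_short, $contents, $m_short, PREG_SET_ORDER);
$n_long  = preg_match_all($re_long, $contents, $m_long, PREG_SET_ORDER);

// Both forms should report the same number of matches and the same groups.
echo "short: $n_short match(es), long: $n_long match(es)\n";
echo $m_short === $m_long ? "identical results\n" : "results differ\n";
```

Note that the second cell still matches even though the tags are padded with spaces and newlines, because every gap between tags is covered by \s*.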
I've provided the regex in both short and long (commented) format. To match correctly, this regex depends on the name being sandwiched between bold and underline tags, in that order (<b><u>...</u></b>). The wrestler's name is captured into group 1 and their description into group 2. The regex allows optional whitespace between tags, and the output is formatted with the versatile sprintf() function. The description for each wrestler is also post-processed: each <br /> tag (along with any adjacent whitespace) is converted to a single linefeed.
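To see the group 1 / group 2 split and the <br /> post-processing on their own, here is a small sketch run against a single made-up table cell (the name and stats are hypothetical). It builds the output into a string instead of echoing inside the loop, purely so the result is easy to inspect; the formatting itself is the same sprintf() call.

```php
<?php
// Hypothetical single-cell sample; the name and stats are made up for the demo.
$contents = '<td><b> <u>Sample Slammer</u> </b><br />
    Height: 6 ft 2 in <br/>
    Finisher: Demo Drop
</td>';

$re = '%<b>\s*<u>\s*(\w+(?:\s+\w+)*)\s*</u>\s*</b>\s*<br\s*/>\s*(.*?)\s*</td>%six';

$output = '';
if (preg_match_all($re, $contents, $matches, PREG_SET_ORDER) > 0) {
    foreach ($matches as $i => $m) {
        // Group 1 is the name; in group 2, every <br /> plus the whitespace
        // around it is collapsed into a single linefeed.
        $description = preg_replace('%\s*<br\s*/>\s*%', "\n", $m[2]);
        $output .= sprintf("\nWrestler #%d: \"%s\"\n%s\n", $i + 1, $m[1], $description);
    }
} else {
    $output = "No matches";
}
echo $output;
```

Here group 2 comes back as "Height: 6 ft 2 in <br/>" followed by an indented "Finisher: Demo Drop"; after the replacement the stray spaces and the tag are gone and each stat sits on its own line.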
Once again, I recommend checking out http://www.regular-expressions.info/. Once you get the hang of regular expressions, they can actually become fun (and, in my case, addictive!)
Cheers!