Explanations have never been my strong point but I can try....
As I'm sure you are aware, regex is all about pattern matching, so essentially all I did was have a quick look to find the unique start and end patterns within the string. The line which contains the beginning of your text is....
Code:
<td><table border=0 cellspacing=1 cellpadding=2 width="100%">
I had a quick glance through the entire code and noted only one other line which started with <td><table, which was...
Code:
<td><table width="100%" border="0" cellspacing="0" cellpadding="0" height="7">
Although this starts the same way, the rest of the line is completely different, so this was my starting point....
Code:
preg_match('/^\s*?<td>(<table border=0 cellspacing=1
Breaking this starting point down is as follows....
The ^ character specifies the start of a line (this requires the 'm' (multiline) modifier).
\s means any whitespace character (spaces, tabs, newlines) and *? means zero or more occurrences, matching as few as possible. I guessed/assumed that there was probably some indentation formatting of the code prior to posting on the forum, so the \s*? would take care of this.
<td>(<table border=0 cellspacing=1 This is the unique start string (the opening parenthesis is the start of the subpattern). Actually, if I had looked closer I would have realized that the '<table border=0 cellspacing=1' portion could have been shortened to just '<table b', as this would still be unique, but as I only had a quick look I decided not to take the chance that I had missed some other similar line.
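To illustrate the start fragment described above, here is a minimal sketch. The two sample lines are the ones quoted earlier, with some hypothetical indentation added as an assumption about the original page source. The subpattern is closed early here only so that the fragment compiles on its own.

```php
<?php
// Sample lines from the post; the leading spaces are an assumed indentation.
$first  = '    <td><table border=0 cellspacing=1 cellpadding=2 width="100%">';
$second = '    <td><table width="100%" border="0" cellspacing="0" cellpadding="0" height="7">';

// Start fragment only. The group is closed right away so the pattern is valid.
$start = '/^\s*?<td>(<table border=0 cellspacing=1)/im';

$hitFirst  = preg_match($start, $first);   // matches: \s*? absorbs the indentation
$hitSecond = preg_match($start, $second);  // no match: the text after <td><table differs

var_dump($hitFirst, $hitSecond);
```

The same check with the shorter '<table b' start should behave identically on these two lines, since the second line continues with '<table w'.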
The next part....
.*? is the match-all syntax (this has been extended to match everything, including newlines, by way of the 's' (dot all) modifier).
The last part....
^\s*?<\/table>)<\/td> Similar to the first part: the ^ signifies the start of a line, and I have again guessed/assumed about the indentation. The <\/table>)<\/td> was again unique (the closing parenthesis defines the end of the subpattern).
Finally the modifiers... ims...
i means that the pattern search will be case insensitive (probably not required in this case, but it would mean that if for some reason the code was reformatted with uppercase tags the regex would still work).
m (multiline mode) means that ^ and $ match at the start and end of each line within the string respectively (the default behaviour is that ^ and $ match the start and end of the entire string).
s (dot all) means that the . character will match everything, including newlines (without it, newlines are excluded from the match).
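The effect of each modifier can be checked in isolation. The following is a small sketch using a made-up two-line string (not taken from the original page), toggling one modifier at a time.

```php
<?php
// Hypothetical two-line test string.
$text = "first line\nSecond line";

// m: without it, ^ only matches at the very start of the string.
$withoutM = preg_match('/^Second/', $text);   // 0
$withM    = preg_match('/^Second/m', $text);  // 1

// i: without it, "second" does not match "Second".
$withoutI = preg_match('/^second/m', $text);  // 0
$withI    = preg_match('/^second/im', $text); // 1

// s: without it, . stops at the newline, so the match cannot span two lines.
$withoutS = preg_match('/first.*Second/', $text);  // 0
$withS    = preg_match('/first.*Second/s', $text); // 1

var_dump($withoutM, $withM, $withoutI, $withI, $withoutS, $withS);
```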
Some points to note: as it was thrown together quickly, the regex was/is longer than need be. As previously mentioned, the start pattern could have been shortened, as too could the .*? (to just .*), since in this case, greedy or non-greedy, the pattern match would end in the same place. So the regex could be....
Code:
if (preg_match('/^\s*?<td>(<table b.*^\s*?<\/table>)<\/td>/ims', $string, $matches))
and still give the same results.
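As a rough check of that claim, the sketch below runs both versions against a small stand-in for the page source. The sample HTML is an assumption made up for illustration (only its first table line comes from the post), and the long pattern is reassembled from the pieces described above.

```php
<?php
// Hypothetical stand-in for the real page source.
$string = <<<'HTML'
<tr>
  <td><table border=0 cellspacing=1 cellpadding=2 width="100%">
    <tr><td>Some text</td></tr>
  </table></td>
</tr>
HTML;

// Longer pattern, reassembled from the fragments explained above (non-greedy).
$long  = '/^\s*?<td>(<table border=0 cellspacing=1.*?^\s*?<\/table>)<\/td>/ims';
// Shortened pattern from the post (greedy).
$short = '/^\s*?<td>(<table b.*^\s*?<\/table>)<\/td>/ims';

$okLong  = preg_match($long, $string, $lazy);
$okShort = preg_match($short, $string, $greedy);

// $lazy[1] and $greedy[1] hold the subpattern: the table from <table ... to </table>.
var_dump($okLong, $okShort, $lazy[1] === $greedy[1]);
```

The two only agree because this sample, like the original page as described, has a single line beginning with </table></td>. If there were several such lines, the greedy .* would run on to the last one, while .*? would stop at the first.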
I hope some of that makes sense?