I'm working on software that extracts content from a web site.
For example, given this HTML:
Code:
<div id="panel">
<span id="title">Crazy Maradona</span>
<br>
<div id="news">text of news about Maradona</div>
</div>
So I wrote a PHP program that starts from the tags between the title and the text (</span><br><div id="news") and "builds" a regexp for the structure.
For this example, I have:
Code:
<div [^>]*>\s*<span [^>]*>\s*[^<]+</span>\s<br>\s<div [^>]*>[^<]+</div>\s</div>
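To make the simple case reproducible, here is a minimal sketch that runs the generated pattern against the first example. It is written in Python for brevity (the original software is PHP), and the two capture groups and the variable names are assumptions added for illustration, not part of the generated regexp.

```python
import re

# The first HTML example from the post, one tag per line.
html = (
    '<div id="panel">\n'
    '<span id="title">Crazy Maradona</span>\n'
    '<br>\n'
    '<div id="news">text of news about Maradona</div>\n'
    '</div>'
)

# The generated pattern, with two capture groups added (an assumption)
# so the title and the news text can be read out of the match.
pattern = re.compile(
    r'<div [^>]*>\s*<span [^>]*>\s*([^<]+)</span>\s<br>\s'
    r'<div [^>]*>([^<]+)</div>\s</div>'
)

match = pattern.search(html)
title, news = match.groups()
print(title)
print(news)
```

Note that each bare `\s` matches exactly one whitespace character, so this only works when the tags are separated by a single newline or space.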
My software breaks on a more complicated structure, for example when an extra <img> tag appears inside the panel:
Code:
<div id="panel"><img src="image">
<span id="title">Crazy Maradona</span>
<br>
<div id="news">text of news about Maradona</div>
</div>
or when the <img> tag appears inside the news text:
Code:
<div id="panel">
<span id="title">Crazy Maradona</span>
<br>
<div id="news">text of news<img src="image"> about Maradona</div>
</div>
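One possible direction, shown only as a rough sketch in Python (the original software is PHP): let the pattern tolerate any tag other than <div> or <span> between the structural tags, and allow such tags inside the news text. The names FILLER, RELAXED and extract are hypothetical, and the sample strings contain only the HTML of the two cases above.

```python
import re

# Hypothetical fragment: any run of whitespace or tags other than <div>/<span>
# (for example <img> or <br>) that may sit between the structural tags.
FILLER = r'(?:\s|<(?!/?(?:div|span)\b)[^>]*>)*'

RELAXED = re.compile(
    r'<div [^>]*>' + FILLER
    + r'<span [^>]*>(?P<title>[^<]+)</span>' + FILLER
    # The news body may contain text and any tag except <div> or </div>.
    + r'<div [^>]*>(?P<news>(?:[^<]|<(?!/?div\b)[^>]*>)+)</div>' + FILLER
    + r'</div>'
)

def extract(html):
    """Return (title, news) with inline tags stripped, or None if no match."""
    m = RELAXED.search(html)
    if m is None:
        return None
    news = re.sub(r'<[^>]*>', '', m.group('news'))
    return m.group('title'), news

# Case 1: extra <img> directly inside the panel.
img_in_panel = (
    '<div id="panel"><img src="image">\n'
    '<span id="title">Crazy Maradona</span>\n'
    '<br>\n'
    '<div id="news">text of news about Maradona</div>\n'
    '</div>'
)

# Case 2: extra <img> inside the news text.
img_in_news = (
    '<div id="panel">\n'
    '<span id="title">Crazy Maradona</span>\n'
    '<br>\n'
    '<div id="news">text of news<img src="image"> about Maradona</div>\n'
    '</div>'
)

print(extract(img_in_panel))
print(extract(img_in_news))
```

This still assumes the overall div/span/div nesting stays the same; a page that nests further <div> elements inside the news block would need an HTML parser rather than a single regexp.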
Can anyone help me create a "mega" regexp that handles these cases?
Thanks a lot!