Regex to Parse HTML, help needed!

Posted: Sat Sep 22, 2012 11:17 pm
by emilios1995
Hi, I am working on a project that includes HTML parsing. I want to parse the contents of a certain <div> tag. I have a parsing function that returns the text between two strings using PHP functions. The problem is that inside that <div> there is another <div>:

Code:

<div id=anything>
...
...
<div>
...
</div>
...
</div>
So if I write

Code:

return_between("<div id=anything", "</div>")
that will return the text only up to the end of the inner div, because it stops at the first closing </div> the program finds, not the closing tag of the main div. So I think the solution lies in regex. Could anyone give me an idea of how to write that expression? I am not well trained in them.

Thank you!

Re: Regex to Parse HTML, help needed!

Posted: Sun Sep 23, 2012 3:35 am
by requinix
Regex is not a good fit for this kind of text parsing. Use the tools that exist specifically for purposes like this, DOMDocument being the most notable.

Use a combination of getElementById(), getElementsByTagName(), and regular node traversal (like "this node's second child node") to get to where you need.
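
As a rough sketch of that approach: the id "anything" is taken from your snippet, while the helper name div_contents() and the sample text are made up for illustration. It assumes the target <div> really carries an id attribute and that the page is loaded with loadHTML().

```php
<?php
// Hypothetical helper: returns the inner HTML of the element with the given
// id, nested <div>s included, or null if no such element exists.
function div_contents(string $html, string $id): ?string
{
    $doc = new DOMDocument();

    // Real-world HTML is rarely valid; buffer libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $node = $doc->getElementById($id);
    if ($node === null) {
        return null;
    }

    // Serialize every child node. The parser has already matched each opening
    // tag with its own closing tag, so the inner </div> does not end the result.
    $inner = '';
    foreach ($node->childNodes as $child) {
        $inner .= $doc->saveHTML($child);
    }
    return $inner;
}

// Sample markup shaped like the one in the question (placeholder text).
$html = <<<EOT
<div id="anything">
before
<div>
inside
</div>
after
</div>
EOT;

echo div_contents($html, 'anything');
```

If you only need the text and not the markup, reading $node->textContent instead of serializing the children should be enough. To reach the inner <div> on its own, $node->getElementsByTagName('div')->item(0) is one way to get it.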