Regex to Parse HTML, help needed!

Any questions involving matching text strings to patterns - the pattern is called a "regular expression."

Moderator: General Moderators

emilios1995
Forum Newbie
Posts: 1
Joined: Sat Sep 22, 2012 10:45 pm

Regex to Parse HTML, help needed!

Post by emilios1995 »

Hi, I am working on a project that includes HTML parsing. I want to parse the contents of a certain <div> tag. I have a parsing function that returns the text between two strings using PHP functions. The problem is that inside that <div> there is another <div>:

Code:

<div class=anything>
...
...
<div>
...
</div>
...
</div>
So if I write

Code:

 return_between("<div id=anything" , "</div>")
that will return the text only up to the end of the inner div, because that is the first closing </div> the program finds, not the closing tag of the main div. So I think the solution lies in regex. Could anyone give me an idea of how to write that expression? I am not well trained in regular expressions.

Thank you!
requinix
Spammer :|
Posts: 6617
Joined: Wed Oct 15, 2008 2:35 am
Location: WA, USA

Re: Regex to Parse HTML, help needed!

Post by requinix »

Regex is not a good fit for this kind of parsing: nested tags, like your inner <div>, are exactly what it handles poorly. Use the tools that exist specifically for this purpose, DOMDocument being the most notable.

Use a combination of getElementById(), getElementsByTagName(), and regular node traversal (like "this node's second child node") to get to where you need.
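
As a rough illustration of that approach, here is a minimal sketch. It assumes the outer <div> really carries id="anything" (as in the return_between() call; the sample markup in the question uses class instead), and the helper name div_text_by_id() is made up for this example. Treat it as a starting point, not a finished solution.

```php
<?php
// Hypothetical helper (not from the thread): returns the text of the element
// with the given id, nested elements included, or null if no such element exists.
function div_text_by_id(string $html, string $id): ?string
{
    $doc = new DOMDocument();

    // Real-world HTML is rarely valid; collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // The DOM tree already knows where the outer div ends,
    // so the nested </div> cannot cut the result short.
    $node = $doc->getElementById($id);

    return $node === null ? null : $node->textContent;
}

// Assumed sample input mirroring the nested structure from the question.
$html = '<div id="anything">before <div>inner</div> after</div>';
echo div_text_by_id($html, 'anything'), "\n";
```

If you need the markup rather than the plain text, `$doc->saveHTML($node)` serializes a single node with its tags, and `$node->getElementsByTagName('div')` gives you the nested divs for further traversal.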