ridgerunner wrote:
prometheuzz wrote:
Try:
Code:
preg_replace('#<a\s[^>]*href="[^"]*site[^>]*+>([^<]*+)</a>#i', '$1', $html);
Although IMO a more robust solution would be to use an HTML parser: when a regex stumbles over improperly formed HTML, it usually makes a mess of the entire file, whereas a true parser will recover from it in most cases.
Hey prometheuzz,
You've got my curiosity up. I'm familiar with how to use the DOM within JavaScript, but it sounds like you are talking about something else. What HTML parser tools do you use/recommend?
Thanks

Hey ridgerunner,
I must confess that I know very little about web-related stuff, so I can't recommend a parser that I know of and/or have personal experience with. The reason I sometimes mention that parsing HTML using regex can be dangerous is that it has often happened that the original poster comes back with some dirty HTML, asking why my solution didn't work.
It's more of a personal motto: when parsing (and/or transforming) a language that can be recursive in nature (like HTML), use a dedicated parser and don't go hacking your way through with regex. A regex, as the name suggests, describes a regular language, which by definition cannot handle arbitrary recursion*: it can only match nesting up to a fixed depth.
Of course, anchor tags cannot be nested, so you should be okay using a little regex (that's why I posted an actual suggestion). Still, the HTML can be improperly formed (missing closing tags or quotes), in which case the regex will make a mess of it, whereas a parser should (!) not.
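To make the parser route a bit more concrete, here is a rough sketch of the same link-stripping done with PHP's bundled DOM extension (DOMDocument and DOMXPath). Treat it as an illustration under assumptions rather than a tested recommendation: the function name unwrap_site_links, the wrapper div, and the sample markup are all made up for this example, and it assumes ASCII input.

```php
<?php
// Hypothetical helper: replace every <a> whose href contains $needle
// (case-insensitively) with its plain text content.
function unwrap_site_links(string $html, string $needle): string
{
    $doc = new DOMDocument();

    // Collect libxml warnings about sloppy markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment in a known container so it can be pulled out again;
    // loadHTML() adds <html><body> around whatever it is given.
    $doc->loadHTML('<div id="wrap">' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $wrap  = $xpath->query('//div[@id="wrap"]')->item(0);

    // Copy the matches into an array before modifying the tree.
    foreach (iterator_to_array($xpath->query('.//a[@href]', $wrap)) as $a) {
        if (stripos($a->getAttribute('href'), $needle) === false) {
            continue; // not a link to the unwanted site
        }
        // textContent flattens any markup nested inside the anchor.
        $text = $doc->createTextNode($a->textContent);
        $a->parentNode->replaceChild($text, $a);
    }

    // Serialize only the children of the wrapper, not the wrapper itself.
    $out = '';
    foreach ($wrap->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$html = '<p><a href="http://site.example/x">keep me</a> and '
      . '<a href="http://other.example/">stay</a></p>';
echo unwrap_site_links($html, 'site'), "\n";
```

One difference from the regex: the regex only touches anchors whose content contains no tags at all, while this sketch also unwraps anchors with nested markup and keeps just their text.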
Keep up the good postings!
Regards,
Bart.
* Yes, I know PHP's regex engine can match recursively, which IMHO is not a feature: it makes one's regexes usable only by people who really know regex (not the masses!), and so even more of a maintainability nightmare.
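For the curious, a small sketch of the fixed-depth versus recursive distinction, using balanced parentheses instead of HTML to keep it short. The variable names and test strings are made up for illustration; (?1) is PCRE's syntax for re-entering capture group 1.

```php
<?php
// Fixed depth: parentheses nested at most two levels deep.
// Every extra level would need another copy of the inner alternative.
$fixed = '/^\((?:[^()]|\([^()]*\))*\)$/';

// Recursive: (?1) re-enters group 1, so nesting of any depth matches.
$recursive = '/^(\((?:[^()]++|(?1))*\))$/';

foreach (['(a(b))', '(a(b(c)))', '(a(b)'] as $s) {
    printf(
        "%-10s fixed=%d recursive=%d\n",
        $s,
        preg_match($fixed, $s),
        preg_match($recursive, $s)
    );
}
```

The fixed pattern gives up on the three-level string, the recursive one accepts it, and both reject the unbalanced one. Whether the recursive version is still readable is exactly the maintainability question raised above.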