Pretty tough regex

Posted: Tue Jun 02, 2009 4:23 pm
by codexpoet
...at least for me, being a relatively "light" regex user. Fellas, I need your help: you are my last resort after a few days of fruitless googling and experimenting.
I am trying to construct a regex for keyword autolinking in HTML. The keyword should only match if it's
(1) outside an HTML tag and
(2) not between the <a ...> and </a> tags, since that would mean it is already part of a link and shouldn't be autolinked again.

So, say I am looking to autolink "apple" in the following code:

Code:

 
The big red apple was growing on an <a href="appletree.com" title="apple tree">Apple Tree</a>
<img src="apple.gif" title="apple tree">
 
The regex should only match the first "apple", since all the others are either inside a tag or between the anchor tags.

Any tips would be greatly appreciated!!

Re: Pretty tough regex

Posted: Tue Jun 02, 2009 5:33 pm
by patton
I know this isn't right. This is quite a hard problem!

preg_match_all('/(?<!>)apple(?![^<]*>)/i', 'The big red apple was growing on an <a href="appletree.com" title="apple tree">Apple Tree</a> <img src="apple.gif" title="apple tree">', $result);

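It only skips the link text when "apple" comes right after the >, so it stops working as soon as the keyword appears anywhere else between the anchor tags.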

I would try to do this by running through the text once and removing all the anchor tags, then running something like:
'/apple(?![^<]*>)/i'
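
A rough sketch of that two-pass idea, in case it helps. Instead of deleting the anchors it sets them aside with preg_split() and only runs the regex above on the pieces in between, so the existing links survive untouched. The function name, the anchor-splitting pattern and the example.com URL are all made up for illustration, and it assumes anchors are not nested and that the URL contains no "$" or backslash.

```php
<?php
// Hypothetical helper: name, anchor pattern and URL are assumptions,
// not something from the original posts.
function autolink_outside_anchors(string $html, string $keyword, string $url): string
{
    // Split the text, keeping each whole <a ...>...</a> element as its own piece.
    $parts = preg_split('#(<a\b[^>]*>.*?</a>)#is', $html, -1, PREG_SPLIT_DELIM_CAPTURE);

    // Match the keyword only when no '>' follows before the next '<',
    // i.e. when the keyword is not sitting inside a tag.
    $pattern = '/' . preg_quote($keyword, '/') . '(?![^<]*>)/i';

    foreach ($parts as $i => $part) {
        // With a single capture group, odd indexes hold the captured anchors.
        if ($i % 2 === 1) {
            continue; // already a link, leave it alone
        }
        $parts[$i] = preg_replace($pattern, '<a href="' . $url . '">$0</a>', $part);
    }

    return implode('', $parts);
}

$text = 'The big red apple was growing on an <a href="appletree.com" title="apple tree">Apple Tree</a>
<img src="apple.gif" title="apple tree">';
$linked = autolink_outside_anchors($text, 'apple', 'http://example.com/apple');
echo $linked, "\n";
```

On the sample text this should link only the first "apple". Note that, like the pattern above, it has no word boundaries, so it would also hit the "apple" in "pineapple".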

references:
http://www.perl.com/doc/manual/html/pod/perlre.html
http://regex.larsolavtorvik.com/

Re: Pretty tough regex

Posted: Wed Jun 03, 2009 12:41 am
by prometheuzz
This should do it:

Code:

$text = 'The big red apple was growing on an <a href="appletree.com" title="apple tree">Apple Tree</a>
<img src="apple.gif" title="apple tree">';
echo preg_replace('#apple(?![^<>]*(?:>|</a>))#i', 'REPLACEMENT', $text);
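
For actual autolinking, the same pattern can be wrapped up roughly like this. It is only a sketch: the function name and the example.com URL are invented for illustration, and it assumes the URL contains no "$" or backslash. The last call shows a case the lookahead does not cover: when other markup is nested inside the link, the text after the keyword ends in a tag other than </a>, so the keyword gets linked again.

```php
<?php
// Hypothetical wrapper: the function name and URL are assumptions.
function autolink_keyword(string $html, string $keyword, string $url): string
{
    // Skip the keyword when the run of non-bracket characters after it
    // ends in '>' (inside a tag) or in '</a>' (inside link text).
    $pattern = '#' . preg_quote($keyword, '#') . '(?![^<>]*(?:>|</a>))#i';
    return preg_replace($pattern, '<a href="' . $url . '">$0</a>', $html);
}

$text = 'The big red apple was growing on an <a href="appletree.com" title="apple tree">Apple Tree</a>
<img src="apple.gif" title="apple tree">';
$linked = autolink_keyword($text, 'apple', 'http://example.com/apple');
echo $linked, "\n";

// Known gap: the <b> inside the link hides the closing </a> from the lookahead,
// so this keyword is wrapped in a second, nested link.
$nested = autolink_keyword('<a href="x"><b>apple</b> pie</a>', 'apple', 'http://example.com/apple');
echo $nested, "\n";
```

If links on the page can contain nested tags, a real HTML parser is probably the safer route than a single regex.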