Pretty tough regex

codexpoet
Forum Newbie
Posts: 1
Joined: Tue Jun 02, 2009 4:13 pm

Pretty tough regex

Post by codexpoet »

...at least for me, being a relatively "light" regex user. Fellas, I need your help; you are my last resort after a few days of fruitless googling and experimenting :roll:
I am trying to construct a regex for keyword autolinking in HTML. The keyword should only match if it is
(1) outside an HTML tag and
(2) not between <a ...> and </a> tags, since that would mean it is already part of a link and shouldn't be autolinked again.

So, in the following code if I am looking to autolink "apple":

Code:

The big red apple was growing on an <a href="appletree.com" title="apple tree">Apple Tree</a>
<img src="apple.gif" title="apple tree">

The regex should only match the first "apple", since all the others are either part of a tag or between the anchor tags.

Any tips would be greatly appreciated!!
patton
Forum Newbie
Posts: 2
Joined: Fri May 29, 2009 5:30 pm

Re: Pretty tough regex

Post by patton »

I know this isn't right; this is quite a hard problem!

preg_match_all('/(?<!>)apple(?![^<]*>)/i', 'The big red apple was growing on an <a href="appletree.com" title="apple tree">Apple Tree</a> <img src="apple.gif" title="apple tree">', $result);

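This only skips link text when "apple" comes immediately after the > of the opening tag. Link text such as <a href="...">red apple</a> would still match.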

I would try to do this by running through the text once and removing all the anchor tags, then running something like:
'/apple(?![^<]*>)/i'
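
The two-pass idea above could be sketched roughly as follows. Instead of deleting the anchors outright, this version splits the text on whole <a>...</a> elements and applies the lookahead pattern only to the pieces in between, so the existing links survive. The function name autolink_outside_anchors and the example.com URL are made up for illustration, and the sketch assumes anchors are not nested.

```php
<?php
// Hypothetical helper (not from the thread): autolink $keyword in $html,
// skipping existing <a>...</a> elements and the insides of other tags.
function autolink_outside_anchors(string $html, string $keyword, string $url): string
{
    // Split on whole anchor elements. The capture group keeps them in the
    // result, so odd-indexed pieces are anchors and even-indexed ones are not.
    $pieces = preg_split('#(<a\b[^>]*>.*?</a>)#is', $html, -1, PREG_SPLIT_DELIM_CAPTURE);

    // The pattern from the post: the keyword, unless a ">" follows before any "<".
    $pattern = '/' . preg_quote($keyword, '/') . '(?![^<]*>)/i';
    // $0 re-inserts the matched text. The URL is an assumed placeholder.
    $link = '<a href="' . htmlspecialchars($url) . '">$0</a>';

    foreach ($pieces as $i => $piece) {
        if ($i % 2 === 0) {
            $pieces[$i] = preg_replace($pattern, $link, $piece);
        }
    }
    return implode('', $pieces);
}

$text = 'The big red apple was growing on an <a href="appletree.com" title="apple tree">Apple Tree</a>' . "\n"
      . '<img src="apple.gif" title="apple tree">';
echo autolink_outside_anchors($text, 'apple', 'http://example.com/apple'), "\n";
```

On the sample text only the first "apple" should end up wrapped in a link; the existing anchor and the img attributes are left alone.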

references:
http://www.perl.com/doc/manual/html/pod/perlre.html
http://regex.larsolavtorvik.com/
prometheuzz
Forum Regular
Posts: 779
Joined: Fri Apr 04, 2008 5:51 am

Re: Pretty tough regex

Post by prometheuzz »

This should do it:

Code:

$text = 'The big red apple was growing on an <a href="appletree.com" title="apple tree">Apple Tree</a>
<img src="apple.gif" title="apple tree">';
echo preg_replace('#apple(?![^<>]*(?:>|</a>))#i', 'REPLACEMENT', $text);
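
For actual autolinking, the same pattern might be wrapped like this. It is only a sketch: the function name autolink_keyword and the example.com URL are assumptions, not part of the thread. preg_quote() guards against regex metacharacters in the keyword, and $0 puts the matched text back so its capitalisation is preserved.

```php
<?php
// Hypothetical wrapper around the pattern from the post above.
function autolink_keyword(string $html, string $keyword, string $url): string
{
    // Match the keyword unless the text after it runs, without crossing
    // another "<" or ">", into a closing ">" (inside a tag) or "</a>" (link text).
    $pattern = '#' . preg_quote($keyword, '#') . '(?![^<>]*(?:>|</a>))#i';
    // $0 is the matched text. The URL is an assumed placeholder.
    return preg_replace($pattern, '<a href="' . htmlspecialchars($url) . '">$0</a>', $html);
}

$text = 'The big red apple was growing on an <a href="appletree.com" title="apple tree">Apple Tree</a>' . "\n"
      . '<img src="apple.gif" title="apple tree">';
echo autolink_keyword($text, 'apple', 'http://example.com/apple'), "\n";
```

One caveat with this approach: the lookahead stops at the next "<", so a keyword inside link text that is followed by a nested tag, as in <a href="x">apple <b>pie</b></a>, would still be linked.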