RegEx help for links

Posted: Sat May 31, 2003 2:45 pm
by nerd
Hi All....

Hope someone can give me a hand here. I want to do a preg_replace on text as long as it isn't in the middle of a hyperlink. Can anyone help with the regex for it?

Code:

$message='some text here and <a href="#">more text</a>';
//replace all text not in middle of a hyperlink
$message=preg_replace('|\b(text)\b|i','replaced',$message);

echo $message;
If you can't do a variable-length lookbehind to see if there is an <a href... then what can be done to ensure that there isn't a </a> before the next <a href...

TIA

*bump*

Posted: Tue Jun 03, 2003 9:26 am
by nerd
Still having probs with this.

Whenever I do a preg_replace on the word 'Bob' in the following code:

Code:

$message='Bob is neat as heck <a href="http://www.bob.com">Bobs</a>';

$message=preg_replace('|\b(bobs?)\b|i','Frank',$message);

echo $message;
It ends up changing the link as well.

What can I do? I don't want to change anything in tags!

TIA

Posted: Tue Jun 03, 2003 9:30 am
by d1223m
What are you doing with all the text that isn't inside an href?

Perhaps you are looking at the problem the wrong way round.

Posted: Tue Jun 03, 2003 10:33 am
by nerd
Thanks for the reply.

I have a routine that replaces all words in a list with a hyperlink. If a user types 'I like bob' in my forums, it replaces all instances of Bob with a link that gives them more information on bob.
The problem resides in 2 areas: if bob is in a tag it replaces it... or if bob is ALREADY in a link... it replaces it.

I really need some help in figuring out a way around this, as it will be the foundation for sponsors on my site (i.e. they pay to be 'auto-linked') whenever their product or company name appears in a post.

Here is a previous thread that the function was pretty much written in (thanks Jason) -> viewtopic.php?t=9226

Any insight is deeply appreciated

Posted: Tue Jun 03, 2003 10:42 am
by patrikG
Actually, don't look for what you don't want, look for what you want.
You want the <a href="abcdefgh.html">...</a> bit matched and the "..." replaced.

Hence, do it like this:

Code:

<?php
$message='Bob is neat as heck <a href="http://www.bob.com">Bobs</a>';

$message=preg_replace('/(<a href="[^"]*">)(.*?)(<\/a>)/i','\1Frank\3',$message);

echo "<xmp>$message</xmp>";
?>
That replaces any text between the <a href=...> and </a> tags with Frank.

"\1Frank\3" may look a bit odd, but \1 is the first captured group from the regEx and \3 the third. If you look at the search pattern, you will see three sets of parentheses, each capturing one group. \2 is the part you want replaced.
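
To make the numbering of the groups a bit more visible, here is a small sketch. The sample sentence and the variable names are made up for illustration, and the pattern uses [^"]* and .*? so each group stops at the end of its own tag.

```php
<?php
// Hypothetical sample input, not taken from a real post.
$message = 'See <a href="http://www.bob.com">Bobs</a> for details.';

// Group 1: opening tag, group 2: link text, group 3: closing tag.
$pattern = '/(<a href="[^"]*">)(.*?)(<\/a>)/i';

// $groups[0] is the whole match; $groups[1] to $groups[3] are what \1 to \3 refer to.
preg_match($pattern, $message, $groups);
print_r($groups);

// Keep groups 1 and 3 as they are and put new text where group 2 was.
$result = preg_replace($pattern, '\1Frank\3', $message);
echo $result, "\n";
```

Printing $groups first is only there to show which part of the link each backreference stands for.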

hmmm

Posted: Tue Jun 03, 2003 2:37 pm
by nerd
Thanks!

The only problem I'm still having is that it kind of does the opposite of what I want.
I am replacing the words with a hyperlink.
So... I do not want to replace any of the words that are already linked... or any of the words within a link tag.

i.e.

Change the underlined versions of bob (the ones that are not already inside a link) to <a href="bob.htm">bob</a>
My name is bob and the home page of bob is: <a href="www.bob.com">Bobs homepage</a> thanks... Bob
Hope this makes sense and I am not asking for the impossible!
I know it would be very costly... but your code has given me an idea... what if I were to do a preg_replace on the items I DON'T want matched so that they no longer come up as a match... i.e. stick in a placeholder like <--ACK--> so the above sentence looks like:
My name is bob and the home page of bob is: <a href="www.b<--ACK-->ob.com">B<--ACK-->ob</a>
Then I could do my original preg_replace and then a str_replace to remove all <--ACK-->'s
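
Roughly, the three passes could look like the sketch below. The function name, the bob.htm target and the choice to split the word after its first letter are all assumptions made for illustration, and it assumes the word is at least two characters long.

```php
<?php
// Sketch of the placeholder idea; autolink_with_marker() is a made-up name.
function autolink_with_marker(string $message, string $word, string $url): string
{
    $marker = '<--ACK-->';
    $w = preg_quote($word, '/');

    // Pass 1: inside existing links and inside any tag, split the word
    // with the marker so the next pass cannot match it.
    $protected = preg_replace_callback(
        '/<a\b[^>]*>.*?<\/a>|<[^>]+>/is',
        function (array $m) use ($w, $marker): string {
            return preg_replace_callback(
                '/' . $w . '/i',
                function (array $hit) use ($marker): string {
                    // "Bob" becomes "B<--ACK-->ob", keeping the original case.
                    return substr($hit[0], 0, 1) . $marker . substr($hit[0], 1);
                },
                $m[0]
            );
        },
        $message
    );

    // Pass 2: the original replacement, wrapping each remaining whole word in a link.
    $linked = preg_replace_callback(
        '/\b' . $w . '\b/i',
        function (array $hit) use ($url): string {
            return '<a href="' . $url . '">' . $hit[0] . '</a>';
        },
        $protected
    );

    // Pass 3: take the markers back out.
    return str_replace($marker, '', $linked);
}

echo autolink_with_marker(
    'My name is bob and the home page of bob is: <a href="www.bob.com">Bobs homepage</a> thanks... Bob',
    'bob',
    'bob.htm'
), "\n";
```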

I hope there is a better way as that sounds REALLY pants to me. That's 3 scans for each word I want to replace!

Please help!!!

Posted: Tue Jun 03, 2003 3:16 pm
by ILoveJackDaniels
Well, I've done this, and the way I did it is fairly nasty. You need to look for the code you don't want to replace first, and edit that. The problem is that on a forum the only way to tell if the text you are replacing is inside a link or not is to count the 'open link' (or code, quote or img) tags before the word and the 'close link' (or the rest) tags before it. If the counts don't match, the word is within a link.
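
This is not my real code, just a minimal sketch of the counting idea. It only looks at <a> tags (code, quote and img would need the same treatment), and the function names and the bob.htm target are made up for the example.

```php
<?php
// True when the opening and closing <a> tags before $offset do not balance.
function inside_link(string $text, int $offset): bool
{
    $before = substr($text, 0, $offset);
    $opened = preg_match_all('/<a\b/i', $before);
    $closed = preg_match_all('/<\/a>/i', $before);
    return $opened !== $closed;
}

// Hypothetical helper: link every whole-word occurrence that is not inside a link.
function autolink_by_counting(string $text, string $word, string $url): string
{
    $pattern = '/\b' . preg_quote($word, '/') . '\b/i';
    preg_match_all($pattern, $text, $found, PREG_OFFSET_CAPTURE);

    // Go from the last match to the first so the earlier offsets stay valid
    // while the string grows.
    foreach (array_reverse($found[0]) as [$match, $offset]) {
        if (!inside_link($text, $offset)) {
            $link = '<a href="' . $url . '">' . $match . '</a>';
            $text = substr_replace($text, $link, $offset, strlen($match));
        }
    }
    return $text;
}

echo autolink_by_counting(
    'Bob likes <a href="http://www.bob.com">Bob</a> and Bob',
    'bob',
    'bob.htm'
), "\n";
```

Because the opening <a is counted before its href is reached, a word inside the tag itself is treated the same way as a word in the link text.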

You also need to check for words where they appear like bob.something.com or bob@something.com. Mark those with an 'a' just beforehand (so they become abob) and then do a simple preg_replace on all instances of bob where bob does not appear as part of a word. Then replace all 'abob's with 'bob'. (I actually used 3 a's, i.e. aaabob, to eliminate the possibility of the thing accidentally matching a real word.) Make sense?
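
Again only a rough sketch of the marking trick rather than my actual code. It assumes "part of an address" means the word is directly followed by a dot or an @ and then another word character; the function name and the bob.htm target are invented for the example.

```php
<?php
// Hypothetical helper showing the mark / replace / unmark sequence.
function autolink_skipping_addresses(string $text, string $word, string $url): string
{
    $w = preg_quote($word, '/');
    $mark = 'aaa';

    // Step 1: put the marker in front of the word where it starts a host
    // name (bob.something.com) or an e-mail address (bob@something.com).
    $text = preg_replace('/\b(' . $w . ')(?=[.@]\w)/i', $mark . '$1', $text);

    // Step 2: link the word wherever it still stands alone. "aaabob" is
    // skipped because there is no word boundary between "a" and "b".
    $text = preg_replace_callback(
        '/\b' . $w . '\b/i',
        function (array $hit) use ($url): string {
            return '<a href="' . $url . '">' . $hit[0] . '</a>';
        },
        $text
    );

    // Step 3: strip the markers again, keeping the original casing.
    return preg_replace('/' . $mark . '(' . $w . ')/i', '$1', $text);
}

echo autolink_skipping_addresses(
    'Mail bob@something.com or visit bob.something.com, says Bob.',
    'bob',
    'bob.htm'
), "\n";
```

A "Bob." at the end of a sentence is still linked, since the dot there is not followed by a word character.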

I'd post my code if it wasn't such a poorly written pos. Maybe when I have it tidied up and working fast, I'll post it.

Posted: Tue Jun 03, 2003 6:14 pm
by patrikG
I hate to post something not ready, but I've gotta crack on.

Code:

<?php
$message='Bob is neat as heck, the good Bob <a href="http://www.bob.com">Bobs</a> Bob drinks Guinness.';
echo "<xmp>$message</xmp>";
$message=preg_replace('/(Bobs?[^<\/a>|\.])/i','Frank ',$message);

echo "<xmp>$message</xmp>";
?>
Input: "Bob is neat as heck, the good Bob <a href="http://www.bob.com">Bobs</a> Bob drinks Guinness."

Output: "Frank is neat as heck, the good Frank <a href="http://www.bob.com">Frank </a> Frank drinks Guinness."

I'll look at it again when I have more time - if no-one else comes up with the solution in the meantime.

Posted: Wed Jun 04, 2003 5:42 am
by patrikG
Well, here we go.

Code:

<?php
$message='Bobs is neat as heck, the good Bob <a href="http://www.bob.com">Bobs</a> Bob drinks Guinness.';
echo "<xmp>$message</xmp>";
$message=preg_replace('/(Bobs? )/i','Frank ',$message);

echo "<xmp>$message</xmp>";
?>
Input: Bobs is neat as heck, the good Bob <a href="http://www.bob.com">Bobs</a> Bob drinks Guinness.

Output: Frank is neat as heck, the good Frank <a href="http://www.bob.com">Bobs</a> Frank drinks Guinness.

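Note, however, that this regEx is looking for "Bob" with an optional "s" followed by " ". It skips the linked "Bobs" only because that one is followed by "<", so it will also miss "Bob." or "Bob," and would still change a "Bob " that sits inside longer link text.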
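
If relying on the trailing space turns out to be too fragile, one possible alternative is a pair of negative lookaheads. This is a different technique from the code above and only a sketch: it assumes reasonably well-formed HTML with no nested links, and it can get slow on very long posts because the second lookahead scans ahead from every candidate.

```php
<?php
$message = 'Bobs is neat as heck, the good Bob <a href="http://www.bob.com">Bobs</a> Bob drinks Guinness.';

$pattern = '/\bBobs?\b'
    . '(?![^<>]*>)'               // not inside a tag: no ">" ahead before the next "<"
    . '(?!(?:(?!<a\b).)*<\/a>)'   // not inside link text: no "</a>" ahead before the next "<a"
    . '/is';

$result = preg_replace($pattern, 'Frank', $message);
echo $result, "\n";

// A word followed by punctuation is matched as well.
echo preg_replace($pattern, 'Frank', 'Ask Bob.'), "\n";
```

The second lookahead is the "no </a> before the next <a href" check asked about at the start of the thread, written without a lookbehind.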