Regular Expression question


Rayearth
Forum Newbie
Posts: 6
Joined: Mon Jun 09, 2003 10:29 pm

Regular Expression question

Post by Rayearth »

Hi there

I'm trying to write a regular expression to match a string that isn't part of a link. For example, if I want to match the string "red" and I have the following haystack:

red is <a href=red.html>fred's</a> friend
Only the red at the beginning of the line should match since the red in "red.html" and the red in "fred's" are part of a link.

I've tried playing around with assertions using preg_replace (I'm trying to replace the matched text with something else), but haven't had much success - never really got the hang of regular expressions.

Any help would be greatly appreciated!

Thanks in advance.
patrikG
DevNet Master
Posts: 4235
Joined: Thu Aug 15, 2002 5:53 am
Location: Sussex, UK

Post by patrikG »

Have a look at viewtopic.php?p=42465 - it is very related, but not quite the same thing.

What I find very useful when doing regEx is, firstly, be very clear about what you want, then, secondly, how it differs from what you do not (damn sounds just like real life).

Edit: P.S.: This one viewtopic.php?p=42796 is quite similar, have a look :)
m3rajk
DevNet Resident
Posts: 1191
Joined: Mon Jun 02, 2003 3:37 pm

Post by m3rajk »

there's a way to use perl-style regular expressions: preg_match('/pattern/', $string);

this is useful because you can use \b in perl-style patterns to mark word boundaries, so preg_match('/\bred\b/', $string); will only pick up that first red in that line when the line is $string
patrikG
DevNet Master
Posts: 4235
Joined: Thu Aug 15, 2002 5:53 am
Location: Sussex, UK

Post by patrikG »

m3rajk wrote:
\b in perl to make it boundaries so preg_match('/\bred\b/', $string); will only pick up that first red in that line when line is $string
\b treats the position between a word character and a "." (dot) as a word boundary, just as it does at a full stop, so /\bred\b/ also matches the red in "red.html". Hence \b won't help you much when dealing with URLs.
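
A quick way to see this for yourself is to ask PCRE where the pattern matches. This is only an illustrative sketch using the haystack from the first post; the variable names are made up for the example.

```php
<?php
// Haystack from the original question.
$string = "red is <a href=red.html>fred's</a> friend";

// PREG_OFFSET_CAPTURE records the byte offset of every match.
preg_match_all('/\bred\b/', $string, $matches, PREG_OFFSET_CAPTURE);

// Keep just the offsets: one for the leading "red" and one for the
// "red" in "red.html" (the "=" before it and the "." after it are
// non-word characters, so \b is satisfied on both sides).
$offsets = array_column($matches[0], 1);
print_r($offsets);
```

The red in "fred's" is not matched, because the "f" in front of it is a word character, but the one in the href is.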
m3rajk
DevNet Resident
Posts: 1191
Joined: Mon Jun 02, 2003 3:37 pm

Post by m3rajk »

i didn't realize that.
thanks for the pointer patrick.. umm.. a question though, if you're online, could you look at my thread: viewtopic.php?t=9535 ? i've got an issue: the image creation is NOT working for jpgs and IS for pngs. i don't understand why
aniruddha
Forum Newbie
Posts: 3
Joined: Mon Jun 09, 2003 12:52 am
Location: Mumbai, India
Contact:

Try this

Post by aniruddha »

Hi all,

Just try the following. It worked for me.
-> First remove all the links from the string and then search for "red".

-> $string = ereg_replace("<a[ ]+href[ ]*=[ ]* \"?.*\"?>.*<\s*/a\s*>", "", $hay_stack);

-> ereg("red", $string);

Aniruddha Deshpande
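
For anyone reading this on a current PHP: the ereg functions were removed in PHP 7.0, so the same idea has to be written with the preg functions. The following is a rough sketch of that translation, not the poster's exact code; the pattern is a simplified assumption about what an anchor looks like. Note that it only detects the keyword outside links; it does not give you a way to replace it in the original text.

```php
<?php
$hay_stack = "red is <a href=red.html>fred's</a> friend";

// Remove every <a ...>...</a> element. The lazy .*? stops at the first
// closing tag; the s flag lets . span newlines, i ignores case.
$string = preg_replace('/<a\b[^>]*>.*?<\/\s*a\s*>/is', '', $hay_stack);

// Check whether the keyword still occurs outside the removed links.
$found = preg_match('/\bred\b/i', $string) === 1;

var_dump($string, $found);
```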
patrikG
DevNet Master
Posts: 4235
Joined: Thu Aug 15, 2002 5:53 am
Location: Sussex, UK

Post by patrikG »

m3rajk wrote:
umm.. a question though, if you're online, could you look at my thread: viewtopic.php?t=9535 ? i've got an issue: the image creation is NOT working for jpgs and IS for pngs. i don't understand why
I know a little about thumbnail-creation... I've modified this class from phpclasses.org for that, which does the job quite neatly. I think he's using GD2...
m3rajk
DevNet Resident
Posts: 1191
Joined: Mon Jun 02, 2003 3:37 pm

Post by m3rajk »

patrikG wrote:
umm.. a question though, if you're online, could you look at my thread: viewtopic.php?t=9535 ? i've got an issue: the image creation is NOT working for jpgs and IS for pngs. i don't understand why
I know a little about thumbnail-creation... I've modified this class from phpclasses.org for that, which does the job quite neatly. I think he's using GD2...
i looked over the oreilly book. either something's wrong with my copy or they have a typo. they have ImageCreateFromJPG
the online manual they point to for graphics doesn't have that, but has ImageCreateFromJPEG

i add the e and it works fine.
patrikG
DevNet Master
Posts: 4235
Joined: Thu Aug 15, 2002 5:53 am
Location: Sussex, UK

Post by patrikG »

Excellent. Tell O'Reilly about the typo - I am sure they'll appreciate that :)

P.S.: My typo above - "I know little (not "a" little) about thumbnail-creation..." :wink:
Rayearth
Forum Newbie
Posts: 6
Joined: Mon Jun 09, 2003 10:29 pm

Post by Rayearth »

Thanks for all the input so far everyone ^^

I've looked at the suggestions and other threads patrikG posted and they indeed look similar to my problem. The only solution I can think of after reading the other threads is the "ugly" solution of reading everything before the match, counting how many opening tags there are, and comparing that to the number of closing tags. I've actually thought about that before, but I can't think of a regular expression that would do the job - I'm rather reluctant to write a whole set of functions to do it.

With this, I need to ask "is it possible to count the number of matches on the first part of a regular expression then compare it with the second part and do a replace if the first is greater than the second"?

So something like:

$data=preg_replace('/(<a)+.*?(text_to_replace).*(<\/a)+/i','replace_with_this'.' \2',$data);

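Where I'd need to compare the number of opening anchors matched by the first group, (<a)+, with the number of closing anchors matched by the last group, (<\/a)+, and make the replacement only if there are more opening anchors than closing ones.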

As for aniruddha's suggestion, it wouldn't work as I am trying to replace the text it matches.

Thanks for all the feedback so far!
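
A single pattern cannot really compare two counts, but the counting idea itself is short when the regex work is split into steps. The sketch below is one possible reading of it, under the assumption that a match is "outside a link" when the number of opening anchors before it equals the number of closing anchors. The function name replace_outside_links is made up for illustration, and it only knows about anchors, not other tags.

```php
<?php
// Hypothetical helper: replace $needle only where every <a> opened
// before the match has already been closed again.
function replace_outside_links(string $data, string $needle, string $replacement): string
{
    $pattern = '/\b' . preg_quote($needle, '/') . '\b/i';
    preg_match_all($pattern, $data, $matches, PREG_OFFSET_CAPTURE);

    // Walk the matches backwards so the offsets of earlier matches
    // stay valid while the string is being edited.
    foreach (array_reverse($matches[0]) as [$text, $offset]) {
        $before = substr($data, 0, $offset);
        $opens  = preg_match_all('/<a\b/i', $before);
        $closes = preg_match_all('/<\/a\s*>/i', $before);

        // Equal counts: no anchor is open here, so replacing is safe.
        // An unclosed "<a" also covers matches inside the href value.
        if ($opens === $closes) {
            $data = substr_replace($data, $replacement, $offset, strlen($text));
        }
    }
    return $data;
}

echo replace_outside_links("red is <a href=red.html>fred's</a> friend", 'red', 'Frank');
```

Re-scanning the text before every match is wasteful on long pages, so this is a demonstration of the idea rather than something tuned for speed.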
patrikG
DevNet Master
Posts: 4235
Joined: Thu Aug 15, 2002 5:53 am
Location: Sussex, UK

Post by patrikG »

Does this

<?php
$message="red is <a href=red.html>fred's</a> friend. The good old red.";
echo "<xmp>$message</xmp>";
$message=preg_replace('/(red | red)\.?/iU',' Frank \2',$message);

echo "<xmp>$message</xmp>";
?>
do the job or is it not inclusive enough?

Input: red is <a href=red.html>fred's</a> friend. The good old red.
Output: Frank is <a href=red.html>fred's</a> friend. The good old Frank .
Rayearth
Forum Newbie
Posts: 6
Joined: Mon Jun 09, 2003 10:29 pm

Post by Rayearth »

It still doesn't exactly get the job done... I guess it's my fault since I didn't explain my situation clearly. I'm trying to search for keywords on a page and replace those keywords with a link instead. So consider this case:

<html>
<head>
<title>red is good</title>
</head>
<body>
Replace this red but not <a href=red.html>this red or the one in the title</a>.
</body>
</html>
With your suggestion, it would replace the red inside the <title> tag as well and that's not good - that's why I'm so determined on counting and comparing the number of opening and closing anchors (<a> and </a>) before each match it finds.

Once again, thanks for all the replies!
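
One way to handle both the anchor and the <title> case without counting is to split the page into tags and text, and only touch text that is not inside an element you want left alone. This is a minimal sketch under some assumptions: the markup is reasonably well formed, no attribute value contains ">", and comments and scripts are ignored. The function name link_keyword and its parameters are invented for the example; a real HTML parser such as DOMDocument would be more robust.

```php
<?php
// Hypothetical helper: turn $key into a link in text nodes only,
// skipping anything inside <a>...</a> or <title>...</title>.
function link_keyword(string $html, string $key, string $url): string
{
    // With one capture group around the delimiter, preg_split returns
    // text at even indexes and tags at odd indexes.
    $parts = preg_split('/(<[^>]*>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
    $pattern = '/\b' . preg_quote($key, '/') . '\b/i';
    $skipDepth = 0; // greater than 0 while inside <a> or <title>

    foreach ($parts as $i => $part) {
        if ($i % 2 === 1) {
            if (preg_match('/^<(a|title)\b/i', $part)) {
                $skipDepth++;
            } elseif (preg_match('/^<\/(a|title)\s*>/i', $part)) {
                $skipDepth = max(0, $skipDepth - 1);
            }
            continue; // tags and their attributes are never rewritten
        }
        if ($skipDepth === 0) {
            $parts[$i] = preg_replace_callback($pattern, function ($m) use ($url) {
                return '<a href="' . htmlspecialchars($url) . '">' . $m[0] . '</a>';
            }, $part);
        }
    }
    return implode('', $parts);
}

$html = "<title>red is good</title>\nReplace this red but not <a href=red.html>this red</a>.";
echo link_keyword($html, 'red', 'red.html');
```

Because tags are separate tokens, the red in href=red.html is never looked at, and the depth counter takes care of the text between the opening and closing tags.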
patrikG
DevNet Master
Posts: 4235
Joined: Thu Aug 15, 2002 5:53 am
Location: Sussex, UK

Post by patrikG »

What's the regEx you have so far?
patrikG
DevNet Master
Posts: 4235
Joined: Thu Aug 15, 2002 5:53 am
Location: Sussex, UK

Post by patrikG »

How does this come along then?

<?php
$message="<html>
<head>
<title>red is good</title>
</head>
<body>
Replace this red but not <a href=red.html>this red or the one in the title</a>.
</body>
</html> ";
echo "<xmp>$message</xmp>";
$message=preg_replace('/([^<.+>=])(red\.?)(.*<\/a>)?/i',' Frank\3',$message);
echo "<xmp>$message</xmp>";
?>
Rayearth
Forum Newbie
Posts: 6
Joined: Mon Jun 09, 2003 10:29 pm

Post by Rayearth »

Still doesn't exactly work... I think it's mainly because the example I gave you doesn't cover all the cases that are possible... Here's an example of what I'm trying to do:

http://www.neorayearth.net/dimension4/m ... manga1.php

The plain HTML code that's dumped at the beginning of the page is the text the regEx needs to go through to find and replace things with.

I'm using the regEx code:

for ($i=0; $i<count($keys); $i++) {
    $searchstr = "<a href=".$pre.$urls[$i].$pos.">".$keys[$i]."</a>";
    $data = preg_replace('/([^<.+>=])('.$keys[$i].'\.?)(.*<\/a>)?/i',' '.$searchstr.'\3',$data);
}
I have a series of words I want to replace, hence the for loop going through the $keys array, which holds all the keywords. The $urls array holds the URL for each keyword (the index matches the $keys index), whereas $pre and $pos are the javascript/paths and brackets/quotes that I use to open those URLs with.

The list of stuff that I need replacing is over a hundred entries, so I'll just include a few:

"Hikaru's", "hikaru.html"
"Hikaru", "hikaru.html"
"Umi's", "umi.html"
"Umi", "umi.html"
"Fuu's", "fuu.html"
"Fuu", "fuu.html"
"Princess Emeraude's", "emeraude.html"
"Princess Emeraude", "emeraude.html"
"Emeraude's", "emeraude.html"
"Emeraude", "emeraude.html"
"Lantis'", "lantis.html"
"Lantis", "lantis.html"
As you can see, I have a pair for each keyword because of the "'s" form of all these keywords (they're all names), and sometimes even two pairs (as with "Emeraude", since she is referred to both with and without "Princess" in front), and this has somewhat complicated the problem... I've considered moving the "'s" clones out of the database and just adding an extra condition to match "'s" in the regEx, but this would create another problem with cases like "Lantis'", where there is no "s" after the '.
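
The possessive forms can probably stay out of the keyword list if the pattern accepts an optional "'s" or a bare "'" after the name. The sketch below assumes a small keyword => page table (the $pages array is a made-up subset of the list above) and does everything in one pass, with longer names tried first.

```php
<?php
// Hypothetical subset of the keyword => page table, base names only.
$pages = [
    'Princess Emeraude' => 'emeraude.html',
    'Emeraude'          => 'emeraude.html',
    'Lantis'            => 'lantis.html',
    'Umi'               => 'umi.html',
];

// Longest names first so "Princess Emeraude" wins over "Emeraude".
$names = array_keys($pages);
usort($names, function ($a, $b) { return strlen($b) - strlen($a); });
$quoted = array_map(function ($n) { return preg_quote($n, '/'); }, $names);

// Group 1 is the name; group 2 is an optional possessive, either 's
// or a bare apostrophe (as in "Lantis'").
$pattern = "/\\b(" . implode('|', $quoted) . ")\\b('s|')?/";

$text = "Lantis' sword, Umi's sword and Princess Emeraude";
$out = preg_replace_callback($pattern, function ($m) use ($pages) {
    // $m[0] includes the possessive, so it ends up inside the link.
    return '<a href="' . $pages[$m[1]] . '">' . $m[0] . '</a>';
}, $text);

echo $out;
```

Doing all names in a single preg_replace_callback call, rather than one preg_replace per keyword, also means a later keyword such as "Emeraude" never gets a chance to match inside a link that was just created for "Princess Emeraude".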

Currently, $pre and $pos are as follows, but they'll change depending what type of link it is:

$pre = "\"javascript:CharacterWindow('http://www.neorayearth.net/dimension4/mkr/char/";
$pos = "')\"";
I hope I'm not trying to do something impossible ~_~