RegEX Code not working at all

bbentp
Forum Newbie
Posts: 7
Joined: Wed Nov 02, 2005 8:41 pm

RegEX Code not working at all

Post by bbentp »

Hello everyone,

I'm OK with the very simple regular expressions I would normally need to design a site, but when it comes to intermediate or advanced regexes, I get totally lost!

I made a script that pulls an article or news story from my MySQL database. After pulling it, the PHP script compares the article/news body to a glossary of terms I've set up. It works perfectly for matching the terms in the glossary. The only issue is that it matches every occurrence, whether it's inside an HTML tag or not.

I need it to skip any terms found inside HTML tags, unless the tag is a '<span>' or '<strong>'...

I've listed the code below and appreciate any suggestions or corrections that can be provided...


/* Pull each glossary term from DB */

Code: Select all

$keywordsArray = array();
	 $queryA  = "SELECT * FROM glossary WHERE deviceGlossary != 'device' ORDER BY title DESC";
	 $resultA = mysql_query($queryA) or die("Died: ".mysql_error());

/* Run through each term */
	 
	 while ($rows = mysql_fetch_object($resultA)) {
	 
                 $keywordsArray[$rows->title] = $rows->id;
	 // Split the comma-separated extra keywords, dropping the spaces around each comma
	 $addKeywords = preg_split('/\s*,\s*/', trim($rows->add_keywords), -1, PREG_SPLIT_NO_EMPTY);
	 foreach ($addKeywords as $addKeyword) {
	 	$keywordsArray[$addKeyword] = $rows->id;
	 }
		   	
	 }	

/* Terms loaded into array */
 
	 $foundMatches = array_combine($keywordsArray, $idsArray); 
	 $contents = explode(" ", $artMainContents);


/* Foreach term compare to article/news.. and replace in the body (article/news story) */
	 
	 foreach($keywordsArray as $title=>$titid) {
	  if ($title != "") {
      $artMainContents  = preg_replace('#\b('.preg_quote($title,'#').')\b#i',"<a href=\"javascript: nullVoid();\" onClick=\"javascript: showTerm('".$titid."', 'default');\" class=\"glossaryTerm\" title=\"Lookup term in the wireless glossary.\">\\1</a>",$artMainContents);
	  }
	 }
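
To see the problem in isolation, here is a minimal sketch that runs the same pattern against a hard-coded fragment. The sample HTML, the example.com-style URL, and the shortened replacement anchor are made up for illustration; only the pattern comes from the post above.

```php
<?php
// Hypothetical sample data: the term appears once in visible text
// and once inside the src attribute of an <img> tag.
$title = 'Cingular';
$html  = '<span>Cingular news <img src="http://cingular.example/logo.gif" alt="logo" /></span>';

// Same pattern as in the post above; the replacement is shortened for readability.
$pattern = '#\b(' . preg_quote($title, '#') . ')\b#i';
$out = preg_replace($pattern, '<a class="glossaryTerm">\\1</a>', $html);

// The anchor is inserted into the src attribute as well, which breaks the image.
echo $out, "\n";
```

The pattern only knows about word boundaries, so the "cingular" in the URL is as valid a match as the one in the visible text.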
shoebappa
Forum Contributor
Posts: 158
Joined: Mon Jul 11, 2005 9:14 pm
Location: Norfolk, VA

Post by shoebappa »

How about this:

Code: Select all

$artMainContents  = preg_replace('#(<span[^>]*>|<strong[^>]*>)([^<]*)\b('.preg_quote($title,'#').')\b#i',"\\1\\2<a href=\"javascript: nullVoid();\" onClick=\"javascript: showTerm('".$titid."', 'default');\" class=\"glossaryTerm\" title=\"Lookup term in the wireless glossary.\">\\3</a>",$artMainContents);
Don't ask me why it's changing colons to entities, but if you copy the code, make sure to look for the &#058 instead of the colons after Javascript...

I hope that's what you mean. Basically it matches the title when it's within a span or strong. At least it should... It won't work if there are nested tags in between.

So it would match <span attributes>this is a title or <strong>this is a title, but not <span attributes>this is a <em>title</em>, or even <span attributes><em>this is</em>a title. I just have it saying "not a <", so as soon as there is one, it stops. I guess you could say "not an ending tag", or even "not an ending span or strong tag". Note that since it's matching the tag and the text before the title, you need to put them back when it's replaced, so I added the \\1\\2 to the front of the anchor.
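
A quick way to check those four cases is to run the pattern through preg_match(). This is only a sketch: the term 'title' and the sample fragments are the made-up examples from the paragraph above, with class="x" standing in for "attributes".

```php
<?php
// Made-up sample term; the pattern is the one from the code above.
$title   = 'title';
$pattern = '#(<span[^>]*>|<strong[^>]*>)([^<]*)\b(' . preg_quote($title, '#') . ')\b#i';

$samples = array(
    '<span class="x">this is a title',
    '<strong>this is a title',
    '<span class="x">this is a <em>title</em>',
    '<span class="x"><em>this is</em>a title',
);

$results = array();
foreach ($samples as $sample) {
    // preg_match() returns 1 when the pattern matches and 0 when it does not.
    $results[$sample] = preg_match($pattern, $sample);
    echo $results[$sample] ? 'match:    ' : 'no match: ', $sample, "\n";
}
```

One more thing to keep in mind: every match has to start at an opening <span> or <strong> tag and consumes it, so preg_replace() can link at most one occurrence of the term per opening tag.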

I think there's a way to say see if it's there but ignore it, but I haven't used that yet. Someone on here the other day brought up "look behinds" and I think they would be ideal here if I knew how to use them : )
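
For what it's worth, PCRE lookbehinds have to be fixed length, so something like (?<=<span[^>]*>) is not allowed. A negative lookahead is usually easier here. The sketch below assumes reasonably well-formed HTML with no stray > in the text; the sample fragment and the shortened anchor are made up. It accepts the term only when the next angle bracket after it is not a closing >, meaning the term is not sitting inside a tag.

```php
<?php
// Hypothetical sample data: the term appears in text and inside an <img> attribute.
$title = 'Cingular';
$html  = '<span>Cingular news <img src="http://cingular.example/logo.gif" alt="logo" /></span>';

// (?![^<>]*>) fails when the term is followed by non-bracket characters and then ">",
// which is what happens when the term is inside a tag.
$pattern = '#\b(' . preg_quote($title, '#') . ')\b(?![^<>]*>)#i';
$out = preg_replace($pattern, '<a class="glossaryTerm">\\1</a>', $html);

echo $out, "\n";
```

This only checks "inside a tag or not". It does not know which element the text belongs to, so a term in the link text of an existing <a> would still be wrapped.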

Side note, the PHP tags instead of just Code will highlight the php code enclosed, to make it easier to read.
Burrito
Spockulator
Posts: 4715
Joined: Wed Feb 04, 2004 8:15 pm
Location: Eden, Utah

Post by Burrito »

Moved to Regex
bbentp
Forum Newbie
Posts: 7
Joined: Wed Nov 02, 2005 8:41 pm

Well..

Post by bbentp »

That helped my mental state with regex, and it was a good lesson in matching tags, but the script does exactly the same thing as it did before... Basically, I want it to match the terms as long as they're not in an <a> or <img> tag.

I've included a link to the page in question to give a better idea of what I'm talking about. Scroll down and you'll see where the image should be. Because 'Cingular' matches as a term, the replacement changes the image's code, and that is what I'm trying to avoid.


http://www.umts-hsdpa.com/index.php/New ... r.3G.Users

If you look at the middle of the page you see this line:

Code: Select all

cingular.cingular.net/mycingular/SupportingFiles/GIF/hbo_mobile_logo_s.gif" border="0" alt="HBO Mobile" />
Which should be an image tag like this:

Code: Select all

<img src="http://cingular.cingular.net/mycingular/SupportingFiles/GIF/hbo_mobile_logo_s.gif" border="0" alt="HBO Mobile" />
That's the major problem I'm running into. The whole article is in a <span>, so I need to match everything except what's inside an <img> or <a>, so as not to break any existing images or links that contain those keywords.

Thank you again for your assistance!
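
One possible approach to "match anything but an <img> or <a>" is to let the pattern match tags and whole <a>...</a> elements first and hand them back unchanged, and only wrap the term when it is found outside of them. The sketch below uses preg_replace_callback() with an anonymous function (PHP 5.3 or later). The function name linkGlossaryTerm() and the sample fragment are hypothetical; the anchor markup is copied from the first post. It is regex-based, so it assumes reasonably well-formed HTML without > inside attribute values.

```php
<?php
/**
 * Hypothetical helper: link one glossary term in $html, leaving every tag
 * and every existing <a>...</a> element untouched.
 */
function linkGlossaryTerm($html, $title, $titid)
{
    // Alternative 1 (group 1): a whole <a>...</a> element, or any single tag.
    // Alternative 2 (group 2): the term itself, as a whole word.
    $pattern = '#(<a\b[^>]*>.*?</a>|<[^>]*>)|\b(' . preg_quote($title, '#') . ')\b#is';

    return preg_replace_callback($pattern, function ($m) use ($titid) {
        if ($m[1] !== '') {
            return $m[1]; // a tag or an existing link: put it back as-is
        }
        $link = '<a href="javascript: nullVoid();" onClick="javascript: showTerm(\''
              . $titid . '\', \'default\');" class="glossaryTerm" '
              . 'title="Lookup term in the wireless glossary.">';
        return $link . $m[2] . '</a>';
    }, $html);
}

// Made-up fragment: the term appears in text, in a link, and in <img> attributes.
$html = '<span>Cingular launches <a href="http://cingular.example/">Cingular video</a> '
      . '<img src="http://cingular.example/logo.gif" alt="Cingular" /></span>';
$out = linkGlossaryTerm($html, 'Cingular', 7);
echo $out, "\n";
```

Inside the existing foreach loop, the preg_replace() call would become $artMainContents = linkGlossaryTerm($artMainContents, $title, $titid);. Because the anchors added for one term are themselves <a>...</a> elements, later terms should not match inside them either, for example a term like "wireless" in the title attribute.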