
Turning certain phrases in a block of text into links!

Posted: Fri Feb 03, 2006 1:57 pm
by CobraCards
I run a website selling trading card games (Magic: The Gathering, Lord of the Rings TCG, etc.), and recently added a strategy section to the site where people can post their deck ideas and such. I would like to set my strategy section up so that all card names display as links to the detail page for that particular card.

For example, in a line of text like this...

"My favorite MTG card is Boros Swiftblade because he works so well with my Sunforger deck."

... the phrase "Boros Swiftblade" and the word "Sunforger" would be recognized from my database of card names, and would be turned into appropriate links.



Seems like a simple str_replace will do the trick, but I can't think how to deal with overlapping card names! How would I get the code to recognize the fullest possible card name ("Karplusan Forest") and NOT recognize the smaller card name ("Forest")?

Also, will I need to worry about load times slowing down? What sort of speed can I expect if I run a str_replace on around 5,000 different phrases? :lol:

Thanks!

Posted: Fri Feb 03, 2006 2:38 pm
by wtf
You could do the str_replace at data entry time and save the HTML to the db; that way you don't have to worry about parsing it later.

Not sure about 1st question. I'm curious myself.

Posted: Fri Feb 03, 2006 5:17 pm
by CobraCards
Good idea, although in my particular case that would also create maintenance issues. I've got the parse time down to 1 second, which seems reasonable.

Still struggling with the overlapping name problem though. The simplest fix I can think of is to check names from longest to shortest, and to check only the non-linked text.


For example:
"I prefer Karplusan Forests for a more solid mana base."

Look for "Karplusan Forest". Found it -- make it a link.

Look for "Forest". Since we're looking only in the non-linked text, it's not found, and nothing is changed.


Now, the only problem is that I don't know how to order an array by length... or how to skip over the linked text. :lol:
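
On the "order an array by length" part, one possible approach is PHP's usort() with a comparison on strlen(). This is only a minimal sketch: the $cardNames list is a made-up stand-in for whatever the database returns.

```php
<?php
// Hypothetical card list; in the thread these names come from the database.
$cardNames = array('Forest', 'Karplusan Forest', 'Fear', 'Legolas, Fearless Marksman');

// Sort longest first so "Karplusan Forest" is tried before "Forest".
usort($cardNames, function ($a, $b) {
    return strlen($b) - strlen($a);
});

print_r($cardNames);
```

If the names are already coming out of a query, sorting in SQL is probably simpler than sorting in PHP.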

Posted: Sat Feb 04, 2006 4:50 pm
by wtf
How about if you URL-encode the text to be replaced? That would give you one unique string regardless of how many words it is composed of.

Posted: Sat Feb 04, 2006 6:22 pm
by CobraCards
Sorry if I'm being dense, but... how does that relate to my problem? 8)

Posted: Sat Feb 04, 2006 9:00 pm
by josh
Ok there are two ways I would say are the best for your situation.

One is to allow the user to link the text themselves by selecting it and clicking a button at entry time, kind of like a bbcode type of thing.


The second would be to run a cron job that re-scans all the data; this way, when new words are entered, they can be turned into links in other documents.


Now there are issues with both ways, here it goes:


First method
- words are kept in context, "walk into the forest", forest would not be linked to a card named "forest" unless the user explicitly does so.
- when new cards are added the text will need to be updated

Second method
- you need to store a raw copy of the text, and an already parsed version of the text. The raw text will be parsed and over-written onto the already parsed version every 24 hours or so.
- This allows you to automatically link text and avoid parsing at run-time.


As for the technicality of matching the longest name: you need to sort your keywords by their length. Alias the length of the string in your MySQL query and order by it,

something like:

Code: Select all

SELECT `name`, CHAR_LENGTH(`name`) AS `length` FROM `cards` ORDER BY `length` DESC
You will always hit the longer string first using this method.

Posted: Sat Feb 04, 2006 9:44 pm
by CobraCards
I think I'd prefer to keep this an automatic thing, and to have it happen immediately so that people see the linked version right after it's posted -- but I will definitely keep those other ideas in mind, thanks!

I now have the DB query sorting cardnames by length, but that doesn't completely solve the problem. For example, the phrase "Legolas, Fearless Marksman" is a card title, and the word "Fear" within that is also a card title. So the code parses the string and makes the whole phrase a link, and then parses it again and makes "Fear" another link. The resulting HTML output is something like this:

<a href="legolas-fearless-marksman.html">Legolas, <a href="fear.html">Fear</a>less Marksman</a>

How to fix this? I was thinking it should be possible to prevent the code from parsing text with <a></a> tags, but how would this be accomplished?
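
One way to leave already-linked text alone might be to split the message on existing anchors with preg_split() and only run the replacement on the pieces in between. A rough sketch follows; linkOutsideAnchors() is a hypothetical helper name, and the URLs are the example ones from above. It only protects complete <a>...</a> spans, not other tags or attributes.

```php
<?php
// Hypothetical helper: link $name in $html, leaving existing <a>...</a> spans untouched.
function linkOutsideAnchors($html, $name, $url)
{
    // Split into pieces; the captured anchors land on the odd indexes.
    $parts = preg_split('~(<a\b[^>]*>.*?</a>)~is', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
    foreach ($parts as $i => $part) {
        if ($i % 2 === 0) { // even indexes are plain, not-yet-linked text
            $parts[$i] = str_replace($name, '<a href="' . $url . '">' . $name . '</a>', $part);
        }
    }
    return implode('', $parts);
}

$message = 'Legolas, Fearless Marksman has no Fear.';
// Longest name first, then the shorter one; "Fear" inside the first link is skipped.
$message = linkOutsideAnchors($message, 'Legolas, Fearless Marksman', 'legolas-fearless-marksman.html');
$message = linkOutsideAnchors($message, 'Fear', 'fear.html');
echo $message;
```

With a few thousand card names this means one preg_split() per name, so it is likely slower than a plain str_replace loop.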

Posted: Sat Feb 04, 2006 9:49 pm
by josh
Use regular expressions: simply do not update text that has already been parsed. I think if you pass your replacements to str_replace as an array it will not overwrite itself either, not sure. Just know that if you parse it live like this, your site is going to perform horribly under a heavy load. How often are new entries added? Would updating the parsed text every time a new card is entered be a compromise?

Also, you should store the raw version of the text regardless of the method, so you can revert, etc., painlessly.
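
On the str_replace question: as far as the PHP manual describes it, str_replace() with arrays applies the pairs one after another, so a later pair can still rewrite text that an earlier pair inserted. strtr() with an array behaves differently: it tries the longest keys first and does not search replaced text again. A small sketch to compare the two, using a hypothetical $links map in place of the real card table:

```php
<?php
// Hypothetical name => link map; the real one would be built from the card table.
$links = array(
    'Legolas, Fearless Marksman' => '<a href="legolas-fearless-marksman.html">Legolas, Fearless Marksman</a>',
    'Fear' => '<a href="fear.html">Fear</a>',
);

$message = 'Legolas, Fearless Marksman laughs at Fear.';

// str_replace() works through the pairs in order, so the "Fear" pair
// also rewrites the link text inserted by the first pair (nested links).
$nested = str_replace(array_keys($links), array_values($links), $message);

// strtr() tries the longest keys first and never rescans replaced text.
$clean = strtr($message, $links);

echo $nested . "\n" . $clean . "\n";
```

If that holds for the PHP version in use, a single strtr() call over the whole name list would avoid both the sorting and the nested-link problem.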

Posted: Sat Feb 04, 2006 10:48 pm
by CobraCards
I think I might actually go back to the idea of doing the replacement when the article is submitted. Having a few older articles that aren't properly linked (or having to resubmit them) seems the least of the available evils. :lol:

Will definitely have to store the raw text somewhere when I make that switch, thanks for the tip!



Just to make sure we're on the same page about that overlapping link thing, the code I'm using basically does this:

1) grab the cardname list from the DB, ordered by length descending

2) look for the first cardname in $message; if it's found, change it to a link

3) repeat step 2 for each cardname in the list

4) spit out $message!



So, the message IS updated after each change...

Posted: Sun Feb 05, 2006 4:58 am
by eyespark
Well, what about this approach, plus a for loop or something along those lines? $s1 and $s2 should be something unique, of course, not 1 and 2, and you will have to go from longer to shorter names.

Code: Select all

<?php
$text = "My favorite MTG card is Boros Swiftblade because he works so well with my Boros deck. My favorite MTG card is Boros Swiftblade because he works so well with my Boros deck. My favorite MTG card is Boros Swiftblade because he works so well with my Boros deck. My favorite MTG card is Boros Swiftblade because he works so well with my Boros deck.";

// Pass 1: swap each card name for a placeholder, longest name first.
$string1 = "Boros Swiftblade";
$s1 = "1"; // placeholder; use something unique in real code
$string11 = "<a href='xxx.html'>Boros Swiftblade</a>";
$output = str_replace($string1, $s1, $text);

$string2 = "Boros";
$s2 = "2"; // placeholder; use something unique in real code
$string22 = "<a href='yyy.html'>Boros</a>";
$output = str_replace($string2, $s2, $output);

// Pass 2: swap the placeholders for the links. strtr() does this in one pass
// and does not rescan text it has already replaced.
$output = strtr($output, array($s1 => $string11, $s2 => $string22));

echo $output;
?>

Posted: Sun Feb 05, 2006 11:47 am
by wtf
How about if you URL-encode the text to be replaced? That would give you one unique string regardless of how many words it is composed of.
For example, the list of words in your db would look like this:


Karplusan%20Forest


At data entry time, you grab the card name and replace each space with %20 (or, for that matter, any other character would work),
then you just replace that character with a space when outputting to the page.

Posted: Sun Feb 05, 2006 12:19 pm
by CobraCards
Jcart | Please use code tags where appropriate when posting code. Read: Posting Code in the Forums (http://forums.devnetwork.net/viewtopic.php?t=21171)


All right, here's the code I'm using now:

Code: Select all

// Fetch card names longest first; basic lands and foil variants are excluded.
$links_cardlist = mysql_query("
  SELECT products_name, products_id
  FROM products_description
  WHERE products_name NOT LIKE '%FOIL)%'
    AND products_name NOT IN ('Forest', 'Island', 'Mountain', 'Plains', 'Swamp')
  ORDER BY LENGTH(products_name) DESC");

$counter = 0;
$placeholder_array = array();
$link_array = array();

while ($row = mysql_fetch_array($links_cardlist)) {
  // Turn the stored quote entities back into plain quotes before matching.
  $before = str_replace('&#39', '\'', strip_tags($row['products_name']));
  $before = str_replace('&#34', '"', $before);

  // strip() is a site-specific helper, not shown in this post.
  $url = 'http://cobracards.com/' . strip($before) . '-image-' . $row['products_id'] . '.html';
  $after = '<a href="' . $url . '" class="postlink" target="_blank" onClick="window.open(\'' . $url . '\', \'\', config=\'height=400, width=550, left=30, top=30, toolbar=yes\'); return false;">' . $before . '</a>';

  // Pass 1: swap the card name for a unique placeholder.
  $placeholder = 'qZb' . $counter . 'hK';
  $message = str_replace($before, $placeholder, $message);

  $placeholder_array[$counter] = $placeholder;
  $link_array[$counter] = $after;
  $counter++;
}

// Pass 2: swap every placeholder for its link.
$message = str_replace($placeholder_array, $link_array, $message);

So, we're turning each matching phrase into a unique placeholder, and when that's done, replacing all the placeholders with that nasty-looking link code.

Legolas, Fearless Marksman --> qZb341hK --> <a href="blahblahblah">Legolas, Fearless Marksman</a>

Works nicely on my test page, thanks for the tip! 8)



Posted: Sun Feb 05, 2006 12:23 pm
by CobraCards
Oops, have been sitting on this page for a while, didn't see wtf's last post.

That will work in some cases, but not in all. Even if "Karplusan Forest" is changed to "Karplusan%20Forest", it will still match "Forest" as well. (Likewise, "Legolas%20Fearless%20Marksman" still matches "Fear".)
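
For the "Fear" inside "Fearless" case specifically, word boundaries in a regular expression might be another option: build one pattern from all the names, longest first, and wrap it in \b so a name only matches as a whole word. This is just a sketch under assumptions; the $cards map is hypothetical and stands in for the card table.

```php
<?php
// Hypothetical name => URL map, standing in for the card table.
$cards = array(
    'Fear' => 'fear.html',
    'Legolas, Fearless Marksman' => 'legolas-fearless-marksman.html',
);

// Longest names first, so the alternation prefers the fullest match.
$names = array_keys($cards);
usort($names, function ($a, $b) {
    return strlen($b) - strlen($a);
});

// One pattern for all names; \b keeps "Fear" from matching inside "Fearless".
$pattern = '~\b(?:' . implode('|', array_map(function ($n) {
    return preg_quote($n, '~');
}, $names)) . ')\b~';

$message = 'Legolas, Fearless Marksman is Fearless, but fears Fear.';
// Single pass over the text, so inserted links are never matched again.
$message = preg_replace_callback($pattern, function ($m) use ($cards) {
    return '<a href="' . $cards[$m[0]] . '">' . $m[0] . '</a>';
}, $message);
echo $message;
```

The trade-off is that plurals stop matching: with \b in place, "Karplusan Forests" would no longer be linked as "Karplusan Forest". A pattern built from around 5,000 names could also get large, so it would be worth timing before relying on it.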