djstaz0ne wrote:... The Problem:
The first occurrence of "Salesforce Integration" does not get hyperlinked..
Actually, the first occurrence of "Salesforce Integration" does get hyperlinked (the one following the <h1> tag). After running your code, this is the result string I get (the "missing links" are highlighted in red, which shows up below as [color=#FF0000][b] markup around the keyword):
Code: Select all
<h1><a href="/http://example.com">Salesforce Integration</a> and <a href="/http://example.com">API Integration</a> in New York</h1>
<p>Perpetual Technologies Unltd. specializes in <a href="/http://example.com">api integration</a> and [color=#FF0000][b]Salesforce Integration[/b][/color] in New York City. We have extensive experience in integrating CRM Systems with additional corporate data, utilizing <a href="/http://example.com">web services</a> APIs - working with SOAP and PHP. We can connect your "online work requests" to salesforce, automatically storing them in a database, and automatically generating and emailing "job tickets", listing all relevant project-related information. <a href="/http://example.com">Salesforce Integration</a> will help your company operate more smoothly.</p>
Here is what your regex is saying:
Following any HTML opening or closing tag (other than one whose name starts with "a", such as an opening anchor tag), find the last occurrence of the keyword before the next left angle bracket and capture it in group 2.
And it is doing exactly what you are asking it to do! To illustrate the problem, let's change the sub-expression right before the keyword to use a lazy rather than a greedy quantifier, by adding the '?' lazy modifier like so:
Code: Select all
for ($i = 0; $i < count($keywordz); $i++) {
    $pattern = '!(<[^a][^>]*>[^<]*?)('.$keywordz[$i].')!i';
    $replacement = '$1<a href="/'.$url.'">$2</a>';
    $text = preg_replace($pattern, $replacement, $text);
}
When you run this regex on your test data, you now match only the first occurrence of the keyword in each run of text, like so:
Code: Select all
<h1><a href="/http://example.com">Salesforce Integration</a> and <a href="/http://example.com">API Integration</a> in New York</h1>
<p>Perpetual Technologies Unltd. specializes in <a href="/http://example.com">api integration</a> and <a href="/http://example.com">Salesforce Integration</a> in New York City. We have extensive experience in integrating CRM Systems with additional corporate data, utilizing <a href="/http://example.com">web services</a> APIs - working with SOAP and PHP. We can connect your "online work requests" to salesforce, automatically storing them in a database, and automatically generating and emailing "job tickets", listing all relevant project-related information. [color=#FF0000][b]Salesforce Integration[/b][/color] will help your company operate more smoothly.</p>
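To see the greedy/lazy difference in isolation, here is a minimal sketch on a made-up string. The "foo" keyword, the sample text and the [..] marker are mine, purely for illustration, and are not from your data. Either way only one occurrence per run of text is replaced, because every match has to start at a tag:

```php
<?php
// Hypothetical sample: the keyword "foo" appears twice after a single tag.
$sample = '<p>foo one, foo two</p>';

// Greedy [^<]* grabs as much text as possible, so the LAST "foo" is captured.
$greedy = preg_replace('!(<[^a][^>]*>[^<]*)(foo)!i', '$1[$2]', $sample);

// Lazy [^<]*? grabs as little text as possible, so the FIRST "foo" is captured.
$lazy = preg_replace('!(<[^a][^>]*>[^<]*?)(foo)!i', '$1[$2]', $sample);

echo $greedy, "\n"; // <p>foo one, [foo] two</p>
echo $lazy, "\n";   // <p>[foo] one, foo two</p>
```

In both runs the other "foo" is left alone, which is the same effect you are seeing with "Salesforce Integration".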
It appears that your intent is to add hyperlinks to all keywords that have not already been linkified, but that is not what this regex does. To make sure that you do not add a hyperlink inside another hyperlink, you need to match whole hyperlinks and skip doing any replacement inside them. This can be accomplished using a modified regex and the preg_replace_callback function like so:
Code: Select all
<?php // test.php version 2010-03-19
$text = '<h1>Salesforce Integration and API Integration in New York</h1>
<p>Perpetual Technologies Unltd. specializes in api integration and Salesforce Integration in New York City. We have extensive experience in integrating CRM Systems with additional corporate data, utilizing web services APIs - working with SOAP and PHP. We can connect your "online work requests" to salesforce, automatically storing them in a database, and automatically generating and emailing "job tickets", listing all relevant project-related information. Salesforce Integration will help your company operate more smoothly.</p>';
$keywordz = array(
    "api integration salesforce integration", // order of this array is important
    "salesforce integration",
    "api integration",
    "web services");
$url = 'http://example.com';
for ($i = 0; $i < count($keywordz); $i++) {
    // preg_quote() keeps any regex metacharacters in a keyword literal
    $pattern = '!(<a\b[^>]*>.*?</a>)|('.preg_quote($keywordz[$i], '!').')!i';
    $text = preg_replace_callback($pattern, 're_callback', $text);
}
function re_callback($matches) {
    global $url;
    if ($matches[1]) { // Case 1: this is a <a..>...</a>
        return $matches[1]; // return it unmodified
    }
    elseif ($matches[2]) { // Case 2: a non-linked keyword
        return '<a href="/'.$url.'">'.$matches[2].'</a>';
    }
    exit("Error!"); // never get here
}
file_put_contents('out.txt', $text);
?>
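If you would rather not loop over the keywords and rely on a global, the same idea can be sketched as a single pass with all keywords in one alternation. This is only a rough variant under a few assumptions: linkify_keywords() is a name I made up, it needs PHP 5.3 or later for the anonymous functions, it sorts the keywords longest-first instead of relying on the array order, and it writes the URL into href as given (without the leading slash used above), so adjust that to suit your site.

```php
<?php
// Hypothetical helper (not part of the script above): links every unlinked
// keyword in one pass and leaves existing <a>...</a> elements alone.
function linkify_keywords($text, array $keywords, $url) {
    // Longest keywords first, so a longer phrase wins over one it contains.
    usort($keywords, function ($a, $b) { return strlen($b) - strlen($a); });

    // Escape each keyword so regex metacharacters are matched literally.
    $alternatives = array();
    foreach ($keywords as $keyword) {
        $alternatives[] = preg_quote($keyword, '!');
    }

    // Group 1: a whole existing anchor. Group 2: an unlinked keyword.
    $pattern = '!(<a\b[^>]*>.*?</a>)|(' . implode('|', $alternatives) . ')!is';

    return preg_replace_callback($pattern, function ($m) use ($url) {
        if ($m[1] !== '') {
            return $m[1]; // already a link: return it unmodified
        }
        return '<a href="' . htmlspecialchars($url) . '">' . $m[2] . '</a>';
    }, $text);
}

// Usage on a made-up snippet: the second occurrence is already linked.
$out = linkify_keywords(
    '<p>web services and <a href="#">Web Services</a></p>',
    array('web services'),
    'http://example.com'
);
echo $out, "\n";
```

Like the version above, this still treats the HTML as plain text, so a keyword sitting inside a tag attribute (an alt or title, say) would also get wrapped in a link.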
Hope this helps!
