link swapping
Posted: Wed Jul 21, 2010 9:39 am
by xionhack
Hello. I have a question that I think is pretty much about regular expressions. I have this code:
Code:
function create_link($matches) {
    // link_id should be an auto-increment field
    $insert = mysql_query(sprintf("INSERT INTO links ( link_url ) VALUES ( '%s' )", $matches[1]));
    return '<a href="' . mysql_insert_id() . '">' . $matches[2] . '</a>';
}
// File name, could be a url
$file = 'file.html';
if (file_exists($file)) {
    $content = file_get_contents($file);
    // Regex replace string
    $pattern = '/<a href="([^"]+)">([^<]+)<\/a>/s';
    $content = preg_replace_callback($pattern, 'create_link', $content);
    echo $content;
}
What it does is find all the links in a page, save each link in a database, and then swap the link address with the id of that link in the database. The problem I have is that the code works when the link is just <a href=""></a>, but if the <a> tag has any other attribute, it won't work, e.g. <a href="" id=""></a> or <a id="" title="" href=""></a>.
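A minimal reproduction of the problem, without the database part, might look like the sketch below. The sample tags are made up for illustration; only the pattern is taken from the code above.

```php
<?php
// The pattern from the code above, unchanged.
$pattern = '/<a href="([^"]+)">([^<]+)<\/a>/s';

// Hypothetical sample tags, made up for illustration.
$samples = array(
    'plain'       => '<a href="page.html">Page</a>',
    'attr_after'  => '<a href="page.html" id="x">Page</a>',
    'attr_before' => '<a id="x" title="t" href="page.html">Page</a>',
);

$results = array();
foreach ($samples as $name => $html) {
    // preg_match returns 1 on a match and 0 when nothing matches.
    $results[$name] = preg_match($pattern, $html);
    echo $name, ': ', $results[$name] ? 'matched' : 'not matched', "\n";
}
```

Only the plain tag matches, because the pattern requires href to come directly after "<a " and the closing ">" to come directly after the closing quote.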
Re: link swapping
Posted: Wed Jul 21, 2010 2:41 pm
by ridgerunner
To fix your immediate problem, try this pattern...
Code:
$pattern = '%<a (?:(?!href)[^>])*+href="([^"]+)"[^>]*+>([^<]+)</a>%';
However, your regex has a couple of other problems too:
- It does not work if the A tag contains another tag (e.g. an IMG tag).
- It does not work if there is whitespace surrounding the equals sign (i.e. href = "xxx").
Here is an improved pattern which corrects these deficiencies:
Code:
$pattern = '%<a (?:(?!href)[^>])*+href\s*+=\s*+"([^"]+)"[^>]*+>([^<]*+(?:(?!</a>)<[^<]*+)*+)</a>%';
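As a quick check of the two cases listed above, the improved pattern can be run with preg_match against one sample tag that has both whitespace around the equals sign and a nested IMG tag. This is only a sketch: the sample HTML is invented for illustration, and the pattern is copied as posted.

```php
<?php
// The improved pattern, copied as posted.
$pattern = '%<a (?:(?!href)[^>])*+href\s*+=\s*+"([^"]+)"[^>]*+>([^<]*+(?:(?!</a>)<[^<]*+)*+)</a>%';

// Hypothetical sample: attributes on both sides of href, spaces around "=",
// and an IMG tag inside the link.
$html = '<a id="x" href = "pic.html" title="t"><img src="pic.png"> Picture</a>';

$found = preg_match($pattern, $html, $m);

echo "URL:      ", $m[1], "\n";  // contents of the href attribute
echo "Contents: ", $m[2], "\n";  // everything between <a ...> and </a>
```

Group 1 holds the URL and group 2 holds the full link contents, including the nested tag.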
Hope this helps!

Re: link swapping
Posted: Thu Jul 22, 2010 9:05 am
by xionhack
Hi. I was trying this code out, but the only problem is that it does not copy the attributes before or after the href attribute. Also, I don't want to replace the link when the href is a mailto. I'm trying this code, but it's still not working perfectly either:
Code:
function create_link($matches) {
    // link_id should be an auto-increment field
    $insert = mysql_query(sprintf("INSERT INTO links ( link_url ) VALUES ( '%s' )", $matches[1]));
    return '<a' . $matches[1] . 'href="' . mysql_insert_id() . '"' . $matches[3] . '>' . $matches[4] . '</a>';
}
// File name, could be a url
$file = 'file.html';
if (file_exists($file)) {
    $content = file_get_contents($file);
    // Regex replace string
    $pattern = '/<a([^>]*)href="([^"]+)"([^>]*)>([^<]+)<\/a>/s';
    $content = preg_replace_callback($pattern, 'create_link', $content);
    echo $content;
}
Re: link swapping
Posted: Thu Jul 22, 2010 3:10 pm
by ridgerunner
Your SQL substitution is using $matches[1] when it should be $matches[2]. Your new pattern captures the pre-href attributes in group 1, so the URL is now in group 2.
Try this out...
Code:
function create_link($matches) {
    // link_id should be an auto-increment field
    // Escape the URL before placing it in the SQL string
    $url = mysql_real_escape_string($matches[2]);
    $insert = mysql_query(sprintf("INSERT INTO links ( link_url ) VALUES ( '%s' )", $url));
    return '<a' . $matches[1] . 'href="' . mysql_insert_id() . '"' . $matches[3] . '>' . $matches[4] . '</a>';
}
// File name, could be a url
$file = 'file.html';
if (file_exists($file)) {
    $content = file_get_contents($file);
    // Regex replace string
    $pattern = '%
        <a\b              # A opening tag opening delimiter
        ([^>]*?)          # Capture group 1 = pre-href attributes
        href\s*+=\s*+     # match href attribute, = and any whitespace
        "                 # opening quote delimiter
        (?!mailto:)       # make sure the link is not a mailto:
        ([^"]++)          # Capture group 2 = the non-mailto link
        "                 # closing quote delimiter
        ([^>]*+)          # Capture group 3 = post-href attributes
        >                 # A opening tag closing delimiter
        (                 # Capture group 4 = A tag contents
          [^<]*+          # zero or more non <
          (?:             # Use "Unrolling-the-loop" efficiency technique
            (?!</a>)<     # if this < is not the A closing tag, match the <
            [^<]*+        # then keep going until next <
          )*+             # keep looping until the </a> is found
        )                 # End capture group 4 = A tag contents
        </a>              # A closing tag
        %xi';
    $content = preg_replace_callback($pattern, 'create_link', $content);
    echo $content;
}
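To try the replacement without a database, a sketch like the following can stand in for the real thing. Everything database-related here is an assumption made for illustration: the $links array and create_link_demo() are hypothetical substitutes for the links table and mysql_insert_id(), the sample HTML is invented, and the pattern is the one above collapsed to a single line (so the x flag is dropped).

```php
<?php
// Hypothetical stand-in for the links table: position in the array + 1
// plays the role of the auto-increment link_id.
$links = array();

function create_link_demo($matches) {
    global $links;
    $links[] = $matches[2];   // "insert" the URL
    $id = count($links);      // stand-in for mysql_insert_id()
    return '<a' . $matches[1] . 'href="' . $id . '"' . $matches[3] . '>' . $matches[4] . '</a>';
}

// Same pattern as above, written on one line without the comments.
$pattern = '%<a\b([^>]*?)href\s*+=\s*+"(?!mailto:)([^"]++)"([^>]*+)>([^<]*+(?:(?!</a>)<[^<]*+)*+)</a>%i';

// Hypothetical sample content: one normal link and one mailto link.
$content = '<a id="a1" href="http://example.com/" title="t">Home</a> '
         . '<a href="mailto:me@example.com">Mail</a>';

$result = preg_replace_callback($pattern, 'create_link_demo', $content);
echo $result, "\n";
```

The first link keeps its id and title attributes and gets href="1", while the mailto link is left untouched.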