
link swapping

Posted: Wed Jul 21, 2010 9:39 am
by xionhack
Hello. I have a question that I think is mostly about regular expressions. I have this code:

Code:

function create_link($matches) {
  // link_id should be an auto-increment field
  $insert = mysql_query(sprintf("INSERT INTO links ( link_url ) VALUES ( '%s' )", $matches[1]));
  
  return '<a href="' . mysql_insert_id() .'">'. $matches[2] .'</a>';
}

// File name, could be a url
$file = 'file.html';

if (file_exists($file)) {
  $content = file_get_contents($file);
  
  // Regex replace string
  $pattern = '/<a href="([^"]+)">([^<]+)<\/a>/s';
  $content = preg_replace_callback($pattern, 'create_link', $content);

  echo $content;
}
What this does is find all the links in a page, save them in a database, and then replace each link address with that link's ID in the database. The problem I have is that the code only works when the tag is exactly <a href=""></a>; if the <a> tag has any other attribute, it doesn't match, e.g. <a href="" id=""></a> or <a id="" title="" href=""></a>.
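Running the original pattern against the three tag shapes above shows the problem directly. This is a minimal sketch; the sample URLs and attribute values are made up for illustration.

```php
<?php
// Pattern from the original post: it allows exactly one space between
// "<a" and "href", and requires ">" right after href's closing quote.
$pattern = '/<a href="([^"]+)">([^<]+)<\/a>/s';

$samples = [
    '<a href="http://example.com/">plain</a>',
    '<a href="http://example.com/" id="x">trailing attribute</a>',
    '<a id="x" title="t" href="http://example.com/">leading attributes</a>',
];

$results = [];
foreach ($samples as $html) {
    // preg_match() returns 1 on a match and 0 when there is none
    $results[] = preg_match($pattern, $html);
}
print_r($results);
```

Only the first, attribute-free tag matches; any extra attribute before or after href breaks the literal `<a href="` ... `">` sequence the pattern expects.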

Re: link swapping

Posted: Wed Jul 21, 2010 2:41 pm
by ridgerunner
To fix your immediate problem, try this pattern...

Code:

  $pattern = '%<a (?:(?!href)[^>])*+href="([^"]+)"[^>]*+>([^<]+)</a>%';
However, your regex has a couple of other problems too:
  • It does not work if the A tag contains another tag (e.g. an IMG tag).
  • It does not work if there is whitespace around the equals sign (e.g. href = "xxx").
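A quick check of the first pattern, as a minimal sketch with made-up sample tags: it accepts attributes on either side of href, but still misses the two cases listed above.

```php
<?php
// First fix: skip any attributes before href, then allow any after it.
$pattern = '%<a (?:(?!href)[^>])*+href="([^"]+)"[^>]*+>([^<]+)</a>%';

$cases = [
    'trailing attribute' => '<a href="http://example.com/" id="x">text</a>',
    'leading attributes' => '<a id="x" title="t" href="http://example.com/">text</a>',
    'nested IMG tag'     => '<a href="http://example.com/"><img src="i.png"></a>',
    'spaces around ='    => '<a href = "http://example.com/">text</a>',
];

$results = [];
foreach ($cases as $label => $html) {
    // Store the captured URL (group 1), or null when the pattern fails
    $results[$label] = preg_match($pattern, $html, $m) ? $m[1] : null;
    printf("%-20s %s\n", $label, $results[$label] ?? 'no match');
}
```

The IMG case fails because `([^<]+)` cannot start on the `<` of `<img`, and the whitespace case fails because `href="` is matched literally.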
Here is an improved pattern which corrects these deficiencies:

Code:

  $pattern = '%<a (?:(?!href)[^>])*+href\s*+=\s*+"([^"]+)"[^>]*+>([^<]*+(?:(?!</a>)<[^<]*+)*+)</a>%';
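Running the improved pattern over the same made-up samples (a sketch for checking, not part of the original reply) shows that both problem cases now match, and that group 2 captures a nested tag as the link text.

```php
<?php
// Improved pattern: optional whitespace around "=", and the link text
// may contain other tags (anything up to the closing </a>).
$pattern = '%<a (?:(?!href)[^>])*+href\s*+=\s*+"([^"]+)"[^>]*+>([^<]*+(?:(?!</a>)<[^<]*+)*+)</a>%';

$cases = [
    'trailing attribute' => '<a href="http://example.com/" id="x">text</a>',
    'leading attributes' => '<a id="x" title="t" href="http://example.com/">text</a>',
    'nested IMG tag'     => '<a href="http://example.com/"><img src="i.png"></a>',
    'spaces around ='    => '<a href = "http://example.com/">text</a>',
];

$results = [];
foreach ($cases as $label => $html) {
    // Keep both the URL (group 1) and the link text (group 2)
    $results[$label] = preg_match($pattern, $html, $m) ? [$m[1], $m[2]] : null;
}
print_r($results);
```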
Hope this helps! :)

Re: link swapping

Posted: Thu Jul 22, 2010 9:05 am
by xionhack
Hi. I tried this code out, and the only problem is that it doesn't keep the attributes before or after the href attribute. Also, I don't want to replace a link when its href is a mailto: address. I'm trying the following code, but it still doesn't work perfectly either:

Code:

function create_link($matches) {
  // link_id should be an auto-increment field
  $insert = mysql_query(sprintf("INSERT INTO links ( link_url ) VALUES ( '%s' )", $matches[1]));
  
  return '<a' . $matches[1].'href="' . mysql_insert_id() .'"' . $matches[3] . '>' . $matches[4] .'</a>';
}

// File name, could be a url
$file = 'file.html';
if (file_exists($file)) {
  $content = file_get_contents($file);
  
  // Regex replace string
  $pattern = '/<a([^>]*)href="([^"]+)"([^>]*)>([^<]+)<\/a>/s';
  $content = preg_replace_callback($pattern, 'create_link', $content);

  echo $content;
}
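To see what each capture group holds in this pattern, you can dump the matches for a sample tag. This is a minimal sketch with a made-up anchor:

```php
<?php
// The pattern from the post above (second version).
$pattern = '/<a([^>]*)href="([^"]+)"([^>]*)>([^<]+)<\/a>/s';

$html = '<a id="top" href="http://example.com/" title="Ex">Example</a>';
preg_match($pattern, $html, $m);

// $m[0] is the whole match; groups 1-4 follow in order.
print_r($m);
```

Group 1 is the text between `<a` and `href` (here ` id="top" `), group 2 is the URL, group 3 is the rest of the opening tag, and group 4 is the link text.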

Re: link swapping

Posted: Thu Jul 22, 2010 3:10 pm
by ridgerunner
Your SQL insert uses $matches[1] when it should use $matches[2]: in your pattern, group 1 holds the attributes before href and group 2 holds the URL.
Try this out...

Code:

function create_link($matches) {
  // link_id should be an auto-increment field
  // escape the URL so quotes in it cannot break (or inject into) the SQL
  $insert = mysql_query(sprintf("INSERT INTO links ( link_url ) VALUES ( '%s' )", mysql_real_escape_string($matches[2])));

  return '<a' . $matches[1].'href="' . mysql_insert_id() .'"' . $matches[3] . '>' . $matches[4] .'</a>';
}

// File name, could be a url
$file = 'file.html';
if (file_exists($file)) {
  $content = file_get_contents($file);

  // Regex replace string
  $pattern = '%
    <a\b               # A opening tag opening delimiter
    ([^>]*?)           # Capture group 1 = pre-href attributes
    href\s*+=\s*+      # match href attribute, = and any whitespace
    "                  # opening quote delimiter
    (?!mailto:)        # make sure the link is not a mailto:
    ([^"]++)           # Capture group 2 = the non-mailto link
    "                  # closing quote delimiter
    ([^>]*+)           # Capture group 3 = post-href attributes
    >                  # A opening tag closing delimiter
    (                  # Capture group 4 = A tag contents
      [^<]*+           # zero or more non <
      (?:              # Use "Unrolling-the-loop" efficiency technique
        (?!</a>)<      # if this < is not the A closing tag, match the <
        [^<]*+         # then keep going until next <
      )*+              # keep looping until the </a> is found
    )                  # End capture group 4 = A tag contents
    </a>               # A closing tag
    %xi';
  $content = preg_replace_callback($pattern, 'create_link', $content);

  echo $content;
}
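To try the final pattern without a database, the sketch below swaps `mysql_query()`/`mysql_insert_id()` for a hypothetical in-memory `$links` array, so the generated IDs are simply 1, 2, and so on. The pattern is the same one as above, written on a single line without the `x` modifier; the sample HTML is made up.

```php
<?php
// Hypothetical stand-in for the links table: each stored URL gets the
// next integer ID, mimicking an auto-increment column.
$links = [];

// Same pattern as above, on one line (case-insensitive, no x modifier).
$pattern = '%<a\b([^>]*?)href\s*+=\s*+"(?!mailto:)([^"]++)"([^>]*+)>([^<]*+(?:(?!</a>)<[^<]*+)*+)</a>%i';

$createLink = function (array $m) use (&$links): string {
    $links[] = $m[2];        // "INSERT" the URL (group 2)
    $id = count($links);     // pretend auto-increment ID
    // Rebuild the tag, keeping the attributes before (1) and after (3) href
    return '<a' . $m[1] . 'href="' . $id . '"' . $m[3] . '>' . $m[4] . '</a>';
};

$html = '<p><a class="ext" href="http://example.com/" title="Ex">Example</a> '
      . '<a href="mailto:me@example.com">Mail me</a> '
      . '<A HREF = "http://example.org/"><img src="logo.png"></A></p>';

$out = preg_replace_callback($pattern, $createLink, $html);
echo $out, "\n";
print_r($links);
```

The mailto: link is left untouched and its address is never stored, while the other two links keep their extra attributes and nested IMG. Because the callback rebuilds the tag from a fixed template, an uppercase `<A ...></A>` comes back as lowercase `<a ...></a>`.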