Modifying all specific links on a page

Posted: Mon Mar 17, 2008 8:53 am
by Walid
Hi all,

I'm new to the forum and have entered quite selfishly in the hope that someone can help.

As part of a site redesign, we are focusing a lot on usability & accessibility and one of the measures which we are taking is to provide more detailed information on external links (i.e. href).

So far, I've put together a JavaScript (jQuery) function which does the following:

1. Goes through every link on a page and identifies the external ones.
2. Extracts the domain name and adds a title attribute with the value "External Link - http://www.domainname.com".
3. Adds an "external link" image at the end of the link using "external link" as the alt text.

I would now like to port this script over to PHP, as there really is no need for the client machine to do this if our servers can. After all, it will give visitors a faster download, and what happens if they have JavaScript switched off?

If anyone can help me with this, I would appreciate it very much.

(I'm now off into the forums to see if I can help anyone else and earn some good karma.)

Re: Modifying all specific links on a page

Posted: Mon Mar 17, 2008 9:55 am
by scriptah
If you want to do it the easy way, you could enable output buffering.
I wrote a simple skeleton so you can understand the idea.

Make sure to rewrite every function in the snippet, as it won't work as-is in a real production application.
The regexps in particular must be rewritten.

Code:

 
<?php
function getDomainFromLink( $url )
{
    /**
     * Needs to be rewritten to work on a real world example;
     */
    $pattern = "/href=\"http:\/\/(.*)\/\"/";
    $matches = array( );
    if( preg_match( $pattern, $url, $matches ) )
    {
        return $matches[1];
    }
    return null;
}
 
function isDomainLocal( $domain )
{
    return !strcmp( $domain, "mydomain.com" );
}
 
function isLink( $line )
{
    /**
     * Needs to be rewritten to work on a real world example;
     */
    $pattern = "/^<a.*>.*<\/a>/";
    return preg_match( $pattern, $line );
}
function fixLinks( $buffer, &$final_buffer )
{
    ob_end_clean( );
    $lines = explode( "\r\n", $buffer );
    
    foreach( $lines as $line )
    {
        if( empty( $line ) ) { continue; }
        
        if( isLink( $line ) )
        {
            $domain = getDomainFromLink( $line );
            if( !isDomainLocal( $domain ) )
            {
                /**
                 * Here some action on the line itself would be taken;
                 * until then, pass the line through unchanged so that
                 * external links are not dropped from the output.
                 */
                $final_buffer .= $line;
            }
            else
            {
                $final_buffer .= $line;
            }
        }
        else 
        {
            $final_buffer .= $line;
        }
    }
}
?>
 
<?php
 
ob_start( );
echo '<a href="http://domain.com/">External Link #1</a>' . "\r\n";
echo '<a href="http://mydomain.com/">Internal Link #1</a>' . "\r\n";
echo '<a href="http://domain2.com/">External Link #2</a>' . "\r\n";
echo '<b>Not a link</b>' . "\r\n";
 
$final_buffer = "";
fixLinks( ob_get_contents(), $final_buffer );
 
/**
 * output the final buffer
 */
echo $final_buffer;
?>
 
First I enable output buffering. Then I get the contents of the buffer and send them to a function that performs some action on them (in our case, fixing the links). In the end I output the final buffer with the corrected links.
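
A minimal sketch of that capture-then-transform flow on its own, in case it helps. The names renderWithFilter and markBuffer are made up for illustration and are not part of the snippet above; markBuffer only tags each closing anchor so the effect is visible, and a real filter would rewrite the links instead.

```php
<?php
// Hypothetical filter: appends a marker after every closing anchor tag.
// A real filter would rewrite the links themselves.
function markBuffer($buffer)
{
    return str_replace('</a>', '</a><!-- checked -->', $buffer);
}

// Hypothetical helper: runs $render with output buffering switched on,
// then hands the captured output to $filter and returns the result.
function renderWithFilter($render, $filter)
{
    ob_start();               // start capturing everything that is echoed
    $render();
    $html = ob_get_clean();   // fetch the buffer and stop buffering
    return $filter($html);
}

$page = renderWithFilter(
    function () {
        echo '<a href="http://domain.com/">External Link #1</a>';
    },
    'markBuffer'
);

echo $page;
```

ob_get_clean() does the work of ob_get_contents() followed by ob_end_clean(), so nothing reaches the browser until the filtered page is echoed.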

Re: Modifying all specific links on a page

Posted: Tue Mar 18, 2008 1:20 am
by Walid
Thanks.

I'm looking into it now.

Re: Modifying all specific links on a page

Posted: Tue Mar 18, 2008 1:25 am
by Walid
Does your solution expect each link to be on its own line? Or have I misunderstood?

Also, with regard to the sections that say "Needs to be rewritten to work on a real world example", can we imagine that we're google.com?

Thanks

Re: Modifying all specific links on a page

Posted: Tue Mar 18, 2008 6:55 am
by scriptah
My code assumes that there's only one link per line.
Your best shot is to read the whole line from the source and extract all the links on that line (preg_match_all ...).
The logic is there, and it's pretty basic. Try writing some code, and if you get stuck, post it and we'll try to help.
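
To give you a starting point, here is a rough sketch of the preg_match_all idea. The function name extractLinkHosts is made up, and the pattern only handles double-quoted href values, so treat it as an illustration rather than something production-ready.

```php
<?php
// Hypothetical helper: returns the host of every absolute link found in $html.
// Assumes href values are wrapped in double quotes.
function extractLinkHosts($html)
{
    $hosts = array();

    // $matches[1] collects the href value of each anchor tag found.
    if (preg_match_all('/<a\s[^>]*href="([^"]*)"/i', $html, $matches)) {
        foreach ($matches[1] as $href) {
            // parse_url() yields no host for relative URLs such as "/about".
            $host = parse_url($href, PHP_URL_HOST);
            if ($host) {
                $hosts[] = $host;
            }
        }
    }

    return $hosts;
}

$line  = '<a href="http://domain.com/">One</a> and <a href="http://mydomain.com/page">Two</a>';
$hosts = extractLinkHosts($line);

print_r($hosts);
```

Each host can then be compared against your own domain to decide whether the link is external.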

Re: Modifying all specific links on a page

Posted: Tue Mar 18, 2008 7:44 am
by Walid
In the end, I borrowed the regexp from somewhere else (lost the link) and came up with this:

Code:

 
function FixLinks($matches)
{
    list($EntireLink, $Href, $LinkText) = $matches;
    //magic goes here
}
 
$pattern = '/<a\s+.*?href=[\"\']?([^\"\' >]*)[\"\']?[^>]*>(.*?)<\/a>/i';
$buffer = preg_replace_callback($pattern, "FixLinks", $buffer);
 
Things have worked quite well. Thanks.
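
For anyone finding this later, here is a rough sketch of what the "magic" could look like. It is not my exact code: makeLinkMarker, the mydomain.com host and the /images/external.png icon path are placeholders made up for the example. It reuses the pattern above and assumes the link does not already carry a title attribute.

```php
<?php
// Hypothetical factory: returns a callback for preg_replace_callback() that
// marks links pointing away from $localHost.
function makeLinkMarker($localHost)
{
    return function ($matches) use ($localHost) {
        $entireLink = $matches[0];
        $href       = $matches[1];

        // Relative URLs have no host; treat them and our own host as internal.
        $host = parse_url($href, PHP_URL_HOST);
        if (!$host || strcasecmp($host, $localHost) === 0) {
            return $entireLink;
        }

        $title = htmlspecialchars('External Link - http://' . $host, ENT_QUOTES);
        // Placeholder icon path; point it at the real image.
        $icon = '<img src="/images/external.png" alt="external link" />';

        // Keep the original attributes and link text: drop the leading "<a"
        // and the trailing "</a>", then wrap what is left with the title and icon.
        $inner = substr($entireLink, 2, -4);
        return '<a title="' . $title . '"' . $inner . $icon . '</a>';
    };
}

$pattern = '/<a\s+.*?href=[\"\']?([^\"\' >]*)[\"\']?[^>]*>(.*?)<\/a>/i';
$buffer  = '<a href="http://domain.com/">Out</a> <a href="/about">In</a>';
$buffer  = preg_replace_callback($pattern, makeLinkMarker('mydomain.com'), $buffer);

echo $buffer;
```

The callback has to return the replacement text for every match, including the internal links it leaves alone, otherwise those links vanish from the page.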

Any comments on the above?

Re: Modifying all specific links on a page

Posted: Tue Mar 18, 2008 11:53 am
by scriptah
Looking good :wink: