PHP Developers Network

A community of PHP developers offering assistance, advice, discussion, and friendship.
 

All times are UTC - 5 hours




Posted: Sat Oct 07, 2006 9:17 pm
Tranquility In Moderation
Joined: Sun Feb 06, 2005 8:18 pm
Posts: 5001
Location: Indiana
This short class turns names/titles into safe, keyword-rich URLs for use with Apache's mod_rewrite. It could be useful for blogs or forums.

Usage Examples

- Before - showforum.php?forumid=21
- After - forums/21/world-news-and-current-events/index.php

- Before - showthread.php?threadid=22107
- After - forums/view-topic/22107/yet-another-school-shooting.php

Sample Apache mod_rewrite Rule

RewriteEngine On
RewriteRule ^forums/([0-9]+)/.+/index\.php$ /showforum.php?forumid=$1 [L]
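
As a rough illustration of what the rule captures, the same pattern can be tried in PHP with preg_match. This is only a sketch: the function name forum_id_from_path is made up for this example, and the pattern assumes the index.php ending shown in the usage examples.

```php
<?php
// Hypothetical helper: mirrors the RewriteRule pattern to show what $1 captures.
function forum_id_from_path($path)
{
    // Same shape as the rule: forums/<digits>/<any slug>/index.php
    if (preg_match('#^forums/([0-9]+)/.+/index\.php$#', $path, $m)) {
        return (int) $m[1];
    }
    return null; // path does not match the rule
}

var_dump(forum_id_from_path('forums/21/world-news-and-current-events/index.php')); // int(21)
var_dump(forum_id_from_path('forums/abc/whatever/index.php'));                     // NULL
```

Only the number is captured; the slug part is matched by .+ and then ignored, so the title in the URL is there for readers and search engines rather than for the lookup.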
 


I have adjusted this code to take all the posts below into consideration, so this is the latest version.

The Code
 
<?php
 
/*
* This short class will turn user entered titles into URLs
* that are keyword rich and human readable.  For use with
* Apache's mod rewrite.
*
* Author - scottayy@gmail.com
*/

 
class safeurl
{
    //decode html entities in string?
    //param boolean $decode
    var $decode = true;
 
    //charset to use if $decode is set to true
    //param string $decode_charset
    var $decode_charset = 'ISO-8859-1';
 
    //turns string into all lowercase letters
    //param boolean $lowercase
    var $lowercase = true;
 
    //strip out html tags from string?
    //param boolean $strip
    var $strip = true;
 
    //maximum length of resulting title
    //param int $maxlength
    var $maxlength = 50;
   
    //if maxlength is reached, chop at nearest whole word? or hard chop?
    //param boolean $whole_word
    var $whole_word = true;
 
    //what title to use if no alphanumeric characters can be found
    //param string $blank
    var $blank = 'no-title';
 
    //the worker function
    //param string $text
    function make_safe_url($text)
    {
        //prepare the string according to our options
        if($this->decode)
        {
            $text = html_entity_decode($text,ENT_QUOTES,$this->decode_charset);
        }
 
        if($this->lowercase)
        {
            $text = strtolower($text);
        }
 
        if($this->strip)
        {
            $text = strip_tags($text);
        }
 
        //filter
        //hyphen goes last in the class so it is always read as a literal
        $text = preg_replace("/[^&a-z0-9_\s'-]/i",'',$text);
        $text = str_replace(array('&',' ','\''),array(' and ','-',''),$text);
        $text = trim(preg_replace("/-{2,}/",'-',$text), "-");
 
        //chop?
        if(strlen($text) > $this->maxlength)
        {
            $text = substr($text,0,$this->maxlength);
           
            if($this->whole_word)
            {
                $words = explode('-',$text);
                array_pop($words); //drop the last, possibly partial, word
                $text = implode('-',$words);
            }
        }
 
        //return =]
        if($text == '')
        {
            return $this->blank;
        }
 
        return $text;
    }
 
}
 
?>
 

 
Test 1
 
 
$safeurl = new safeurl();
 
$tests = array(
        'i\'m a test string!! do u like me. or not......., billy bob!!@#',
        '<b>some HTML</b> in <i>here</i>!!~',
        'i!@#*#@ l#*(*(#**$*o**(*^v^*(e d//////e\\\\\\\\v,,,,,,,,,,n%$#@!~e*(+=t',
        'A lOng String wiTh a buNchess of words thats! should be -chopped- at the last whole word'
);
 
foreach($tests AS $test)
{
        echo $safeurl->make_safe_url($test).'<br />';
}

 
Output 1
im-a-test-string-do-u-like-me-or-not-billy-bob
some-html-in-here
i-love-devnet
a-long-string-with-a-bunchess-of-words-thats

 
We'll change a few properties of the object in the next test.
 
Test 2
 
$safeurl = new safeurl();
$safeurl->lowercase = false;
$safeurl->whole_word = false;
 
$tests = array(
        'i\'m a test string!! do u like me. or not......., billy bob!!@#',
        '<b>some HTML</b> in <i>here</i>!!~',
        'i!@#*#@ l#*(*(#**$*o**(*^v^*(e d//////e\\\\\\\\v,,,,,,,,,,n%$#@!~e*(+=t',
        'A lOng String wiTh a buNchess of words thats! should be -chopped- at the last whole word'
);
 
foreach($tests AS $test)
{
        echo $safeurl->make_safe_url($test).'<br />';
}

 
Output 2
im-a-test-string-do-u-like-me-or-not-billy-bob
some-HTML-in-here
i-love-devnet
A-lOng-String-wiTh-a-buNchess-of-words-thats-shoul

 
Real World Project Usage
 
echo '<a href="blog/12/'.$safeurl->make_safe_url($blog_title).'">'.htmlspecialchars($blog_title).'</a>';
 
 

_________________
Set Search Time - A google chrome extension. When you search only results from the past year (or set time period) are displayed. Helps tremendously when using new technologies to avoid outdated results.


Posted: Wed Feb 07, 2007 12:26 am
Tranquility In Moderation
Joined: Sun Feb 06, 2005 8:18 pm
Posts: 5001
Location: Indiana
I updated the above code because there were two problems that were bugging me.

If the string "what are you doing today/tomorrow" were passed to it, it'd come out as 'what-are-you-doing-todaytomorrow'. So I made all non-alphanumeric characters (except ') be replaced with a -. So now that string would come out correctly.

Also, if the string "-my day today-" were passed to it, it'd come out as '-my-day-today-' which is ugly with the leading and trailing -'s. So I trimmed those.
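
The two fixes described above can be sketched on their own as follows. This is a minimal illustration, not the class code: the function name separator_slug is made up, and unlike the class it turns & and _ into hyphens as well.

```php
<?php
// Sketch only: separator_slug() is a hypothetical name, not part of the class.
function separator_slug($text)
{
    $text = strtolower($text);
    $text = str_replace("'", '', $text);               // drop apostrophes: i'm -> im
    $text = preg_replace('/[^a-z0-9]+/', '-', $text);  // any other run becomes one hyphen
    return trim($text, '-');                           // no leading or trailing hyphens
}

echo separator_slug('what are you doing today/tomorrow'), "\n"; // what-are-you-doing-today-tomorrow
echo separator_slug('-my day today-'), "\n";                     // my-day-today
```

One trade-off of this approach: punctuation inside a word also becomes a hyphen, so a string like the "i love devnet" test case from the first post would be split letter by letter.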



Posted: Fri Oct 03, 2008 5:26 am
Admin
Joined: Wed Aug 13, 2003 7:02 am
Posts: 4522
Location: York, UK
"Design/Project Engineer" still comes out as "designproject-engineer" for me 8O


Posted: Mon Oct 27, 2008 1:33 pm
Tranquility In Moderation
Joined: Sun Feb 06, 2005 8:18 pm
Posts: 5001
Location: Indiana
That is really weird. Why isn't / being caught by this regexp?

$text = preg_replace('#[^&a-z0-9\s\']#', '', $text);


I can't figure it out, because if you do "Design / Project Engineer" it works. Or even "Design/ Project Engineer".
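
One possible reading, offered as a sketch rather than something confirmed in this thread: the character class does match the /, but the replacement is an empty string, so the slash is simply deleted and the two words end up joined. When there is a space next to the slash, the space survives and later becomes the hyphen.

```php
<?php
// Sketch: the same character class as above, applied to both inputs.
$pattern = '#[^&a-z0-9\s\']#';

$joined = preg_replace($pattern, '', strtolower('Design/Project Engineer'));
$spaced = preg_replace($pattern, '', strtolower('Design / Project Engineer'));

var_dump($joined); // string(22) "designproject engineer"
var_dump($spaced); // string(24) "design  project engineer"

// Assumption: replacing with a space instead of '' keeps the words apart.
$fixed = preg_replace($pattern, ' ', strtolower('Design/Project Engineer'));
var_dump($fixed);  // string(23) "design project engineer"
```

With a space as the replacement, the later steps that turn spaces into hyphens and collapse repeated hyphens would take care of the rest.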



Posted: Thu Oct 30, 2008 6:18 am
Admin
Joined: Wed Aug 13, 2003 7:02 am
Posts: 4522
Location: York, UK
As a temporary measure, I have just done this:

 //filter
$text = str_replace("/","-",$text);
$text = preg_replace("/[^&a-z0-9_\s'-]/i",'',$text);
$text = str_replace(array('&',' ','\''),array(' and ','-',''),$text);
$text = trim(preg_replace("/-{2,}/",'-',$text), "-");
 


Posted: Thu Dec 04, 2008 5:46 am
Forum Newbie
Joined: Wed Dec 03, 2008 9:00 am
Posts: 15
I absolutely love temporary measures. To be honest, I can't see what's wrong with the RegEx either but I just wanted to say thanks for posting the code for people. I'm a coder but on the usability junkie / web marketing side of things (yes, you can boo now) so it's always refreshing to see code samples like this for non-marketing orientated web developers.

This week I was in a meeting with a client I consult for. They've had a great web site built by a really strong technical developer, who took the time to do some search engine optimization work on the site (at their request) so that the listings appeared as sub-pages automatically. But he simply refused to understand how or why making the URLs keyword rich would matter.

As a result, what you have is twenty thousand pages such as:

domain.com/?p=1
domain.com/?p=2
...
domain.com/?p=20000


Instead of:

domain.com/dating/new-york/albany/


This simple piece of code would have been invaluable to the client, and would have saved the developer a lot of frustrated emails from non-technical, marketing-orientated clients.

Thanks again.


Posted: Tue Apr 20, 2010 2:40 pm
Forum Newbie
Joined: Sat Apr 17, 2010 11:21 am
Posts: 2
Hi there, I'm using this code in one of my projects. Thank you for sharing! Here are the updates that I have made. I converted it to my local coding standard; sorry that this makes diffing a hassle.

Added Features:
* Added a translation table for non-ascii characters.
* Fixed a bug where low values for maxlength obliterated the string.
* $this->separator (defaults to hyphen) is used to separate words. I need underscores in one project I'm on and hyphens in another.

<?php

/**
 * This short class will turn user entered titles into URLs
 * that are keyword rich and human readable.  For use with
 * Apache's mod rewrite.
 *
 * @author scottayy@gmail.com
 * @author $Author: $
 *
 */

class SafeUrl {
    /**
     * decode html entities in string?
     * @var boolean
     */
    var $decode = true;

    /**
     * charset to use if $decode is set to true
     * @var string
     */
    var $decode_charset = 'UTF-8';

    /**
     * turns string into all lowercase letters
     * @var boolean
     */
    var $lowercase = true;

    /**
     * strip out html tags from string?
     * @var boolean
     */
    var $strip = true;

    /**
     * maximum length of resulting title
     * @var int
     */
    var $maxlength = 50;

    /**
     * if maxlength is reached, chop at nearest whole word? or hard chop?
     * @var boolean
     */
    var $whole_word = true;

    /**
     * what title to use if no alphanumeric characters can be found
     * @var string
     */
    var $blank = 'no-title';

    /**
     * Allow a different character to be used as the separator.
     * @var string
     */
    var $separator = '-';

    /**
     * A table of UTF-8 characters and what to make them.
     * @link http://www.php.net/manual/en/function.strtr.php#90925
     * @var array
     */
    var $translation_table = array(
        'Š'=>'S', 'š'=>'s', 'Đ'=>'Dj','Ð'=>'Dj','đ'=>'dj', 'Ž'=>'Z', 'ž'=>'z', 'Č'=>'C', 'č'=>'c', 'Ć'=>'C', 'ć'=>'c',
        'À'=>'A', 'Á'=>'A', 'Â'=>'A', 'Ã'=>'A', 'Ä'=>'A', 'Å'=>'A', 'Æ'=>'A', 'Ç'=>'C', 'È'=>'E', 'É'=>'E',
        'Ê'=>'E', 'Ë'=>'E', 'Ì'=>'I', 'Í'=>'I', 'Î'=>'I', 'Ï'=>'I', 'Ñ'=>'N', 'Ò'=>'O', 'Ó'=>'O', 'Ô'=>'O',
        'Õ'=>'O', 'Ö'=>'O', 'Ø'=>'O', 'Ù'=>'U', 'Ú'=>'U', 'Û'=>'U', 'Ü'=>'U', 'Ý'=>'Y', 'Þ'=>'B', 'ß'=>'Ss',
        'à'=>'a', 'á'=>'a', 'â'=>'a', 'ã'=>'a', 'ä'=>'a', 'å'=>'a', 'æ'=>'a', 'ç'=>'c', 'è'=>'e', 'é'=>'e',
        'ê'=>'e', 'ë'=>'e', 'ì'=>'i', 'í'=>'i', 'î'=>'i', 'ï'=>'i', 'ð'=>'o', 'ñ'=>'n', 'ò'=>'o', 'ó'=>'o',
        'ô'=>'o', 'õ'=>'o', 'ö'=>'o', 'ø'=>'o', 'ù'=>'u', 'ú'=>'u', 'û'=>'u', 'ü'=>'u', 'ý'=>'y', 'þ'=>'b',
        'ÿ'=>'y', 'Ŕ'=>'R', 'ŕ'=>'r',
        /**
         * Special characters:
         */
        "'"    => '',       // Single quote
        '&'    => ' and ',  // Ampersand
        "\r\n" => ' ',      // Newline
        "\n"   => ' '       // Newline
    );

    /**
     * Class constructor
     *
     * @param array $options property => value pairs that override the defaults
     */
    function __construct($options = array()) {
        if (is_array($options)) {
            foreach($options as $property => $value) {
                $this->$property = $value;
            }
        }
    }

    /**
     * the worker function
     *
     * @param string $text
     * @return string
     */

    function makeUrl($text) {
        //Shortcut
        $s = $this->separator;
        //prepare the string according to our options
        if ($this->decode) {
            $text = html_entity_decode($text, ENT_QUOTES, $this->decode_charset);
            $text = strtr($text, $this->translation_table);
        }

        if ($this->lowercase) {
            $text = strtolower($text);
        }
        if ($this->strip) {
            $text = strip_tags($text);
        }

        //filter
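        // hyphen goes last in the class so it is always read as a literal
        $text = preg_replace("/[^&a-z0-9_\s'-]/i", '', $text);
        $text = str_replace(' ', $s, $text);
        // quote the separator in case it is a regex metacharacter such as "."
        $text = trim(preg_replace('/(?:' . preg_quote($s, '/') . '){2,}/', $s, $text), $s);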

        //chop?
        if (strlen($text) > $this->maxlength) {
            $text = substr($text, 0, $this->maxlength);

            if ($this->whole_word) {
                /**
                 * If maxlength is small and leaves us with only part of one
                 * word ignore the "whole_word" filtering.
                 */

                $words = explode($s, $text);
                array_pop($words); // drop the last, possibly partial, word
                $temp  = implode($s, $words);
                if ($temp != '') {
                    $text = $temp;
                }
            }
        }
        //return =]
        if ($text == '') {
            return $this->blank;
        }

        return $text;
    }
}
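
The maxlength fix from the feature list can also be looked at in isolation. The following is a minimal sketch under stated assumptions: chop_slug is a hypothetical stand-alone function, not a method of the class, and it expects a slug that has already been cleaned.

```php
<?php
// Sketch only: chop_slug() is a hypothetical stand-alone version of the chop step.
function chop_slug($slug, $maxlength, $separator = '-')
{
    if (strlen($slug) <= $maxlength) {
        return $slug;
    }
    $cut   = substr($slug, 0, $maxlength);
    $words = explode($separator, $cut);
    array_pop($words);                      // drop the last, possibly partial, word
    $whole = implode($separator, $words);
    // Fall back to the hard chop when no whole word is left
    return $whole !== '' ? $whole : $cut;
}

echo chop_slug('a-long-string-with-a-bunchess-of-words-thats-should-be-chopped', 50), "\n";
// a-long-string-with-a-bunchess-of-words-thats
echo chop_slug('supercalafragalisticexpialadoshus', 5), "\n";
// super
```

array_pop on its own is enough to drop the last word here. An array_diff-based removal would also strip any earlier words equal to the last one, because array_diff compares by value.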
 


Test File

This is a PHPUnit unit test.

<?php
require_once 'PHPUnit/Framework.php';

require_once dirname(__FILE__) . '/../../lib/SafeUrl.class.php';

/**
 * Test class for SafeUrl.
 * Generated by PHPUnit on 2010-04-20 at 12:57:43.
 */

class SafeUrlTest extends PHPUnit_Framework_TestCase {

    /**
     * @var SafeUrl
     */

    protected $object;

    /**
     * Sets up the fixture, for example, opens a network connection.
     * This method is called before a test is executed.
     */

    protected function setUp() {
        $this->object = new SafeUrl;
    }

    /**
     * Tears down the fixture, for example, closes a network connection.
     * This method is called after a test is executed.
     */

    protected function tearDown() {

    }

    public function testMakeUrl() {
       
            $this->assertEquals( $this->object->makeUrl(
                'i\'m a test string!! do u like me. or not......., billy bob!!@#'),
                'im-a-test-string-do-u-like-me-or-not-billy-bob');

            $this->assertEquals( $this->object->makeUrl(
                '<b>some HTML</b> in <i>here</i>!!~'),
                'some-html-in-here');

            $this->assertEquals( $this->object->makeUrl(
                'i!@#*#@ l#*(*(#**$*o**(*^v^*(e d//////e\\\\\\\\v,,,,,,,,,,n%$#@!~e*(+=t'),
                'i-love-devnet');

            $this->assertEquals( $this->object->makeUrl(
                'A lOng String wiTh a buNchess of words thats! should be -chopped- at the last whole word'),
                'a-long-string-with-a-bunchess-of-words-thats');

            $this->object->lowercase = false;
            $this->assertEquals( $this->object->makeUrl(
                'Eyjafjallajökull Glacier'),
                'Eyjafjallajokull-Glacier');

            $this->object->maxlength = 100;
            $this->assertEquals( $this->object->makeUrl(
                'ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûýýþÿŔŕ'),
                'AAAAAAACEEEEIIIIDjNOOOOOOUUUUYBSsaaaaaaaceeeeiiiionoooooouuuyybyRr');

            $this->object->maxlength = 20;
            $this->assertEquals( $this->object->makeUrl(
                    $this->big_mess),
                    'safeurl-new-safeurl');

            /**
             * Regression test:
             *
             * If max length was so small that we were left with only one
             * word, then whole_word would leave us with an empty string.
             */

            $this->object->maxlength = 5;
            $this->object->whole_word = true;
            $this->assertEquals( $this->object->makeUrl(
                'supercalafragalisticexpialadoshus'),
                'super');
           

            /**
             * Acceptable Bug:
             *
             * It would be nice if we put a space between block level elements,
             * but it is kind of too much to ask for.
             */

            $this->object->maxlength = 200;
            $html = <<<HTML
                <div>
                    <h1>Title</h1>
                    <h2>Subtitle!</h2>Read the <a href="ReleaseNotes.html">Release Notes</a> for this Revision.<br/>
                </div>
HTML
;
            $this->assertEquals( $this->object->makeUrl(
                    $html),
                    'Title-SubtitleRead-the-Release-Notes-for-this-Revision');
            /**                    ^
             * Look: --------------|
             *
             * Should be:
             *     'Title-Subtitle-Read-the-Release-Notes-for-this-Revision'
             */

    }
   
    var $big_mess = '
            </span></li><li style=\"\" class=\"li2\"><span style=\"color:
            #ff0000;\">\$safeurl = new safeurl(); </span></li><li style=\"\"
            class=\"li1\"><span style=\"color: #ff0000;\">\$safeurl->lowercase
            = false;</span></li><li style=\"\" class=\"li2\"><span
            style=\"color: #ff0000;\">\$safeurl->whole_word = false;</span></li>
            <li style=\"\" class=\"li1\">&nbsp;</li><li style=\"\"
            class=\"li2\"><span style=\"color: #ff0000;\">\$tests = array(
            </span></li><li style=\"\" class=\"li1\"><span style=\"color:
            #ff0000;\"> &nbsp; &nbsp; &nbsp; &nbsp;\'</span>i\span
            style=\"color: #ff0000;\">\'m a test string!! do u like me. or
            not......., billy bob!!@#\'</span>, </li><li style=\"\"
            class=\"li2\">&nbsp; &nbsp; &nbsp; &nbsp; <span
            style=\"color: #ff0000;\">\'<b>some HTML</b> in <i>here</i>!!~\'
            </span>, </li><li style=\"\" class=\"li1\">&nbsp; &nbsp; &nbsp;
            &nbsp; <span style=\"color: #ff0000;\">\'i!@#*#@ l#*(*(#**$*o**(*^v
            ^*(e d//////e<span style=\"color: #000099; font-weight: bold;\">\\
            </span><span style=\"color: #000099; font-weight: bold;\">\\</span>
            <span style=\"color: #000099; font-weight: bold;\">\\</span><span
            style=\"color: #000099; font-weight: bold;\">\\</span>v,,,,,,,,,,n%
            $#@!~e*(+=t\'</span>,</li>'
;

}

 


Posted: Fri Feb 18, 2011 3:23 am
DevNet Resident
Joined: Sun Sep 03, 2006 5:19 am
Posts: 1579
Location: Sofia, Bulgaria
1. The allowed chars should be configurable, and timemachine3030's idea of using a translation table has merit, especially for non-English texts.
2. The & in the permitted-by-default characters might cause problems. I suggest two helper methods: one that turns the result into a URL (for use in header(), for example) using rawurlencode, and one that turns it into an HTML link using htmlspecialchars.
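
A rough sketch of what such helpers might look like is below. The function names, parameters and the /blog/12/ base path are all made up for illustration; nothing here comes from the class as posted.

```php
<?php
// Hypothetical helpers sketching the suggestion; names and base path are made up.
function slug_to_url($base, $slug)
{
    // rawurlencode() percent-encodes characters such as & and spaces in a path segment
    return $base . rawurlencode($slug);
}

function slug_to_link($base, $slug, $label)
{
    // htmlspecialchars() escapes &, <, > and quotes for safe use in HTML
    $href = htmlspecialchars(slug_to_url($base, $slug), ENT_QUOTES, 'UTF-8');
    $text = htmlspecialchars($label, ENT_QUOTES, 'UTF-8');
    return '<a href="' . $href . '">' . $text . '</a>';
}

echo slug_to_url('/blog/12/', 'tom&jerry'), "\n";
// /blog/12/tom%26jerry
echo slug_to_link('/blog/12/', 'tom&jerry', 'Tom & Jerry'), "\n";
// <a href="/blog/12/tom%26jerry">Tom &amp; Jerry</a>
```

Keeping the two steps separate means the same slug can go into a Location header unchanged, while the HTML version escapes both the address and the visible title.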


Posted: Fri Feb 18, 2011 11:00 am
Forum Newbie
Joined: Sat Apr 17, 2010 11:21 am
Posts: 2
Mordred wrote:
1. The allowed chars should be configurable, and timemachine3030's idea of using a translation table has merit, especially for non-English texts.
2. The & in the permitted-by-default characters might cause problems. I suggest two helper methods: one that turns the result into a URL (for use in header(), for example) using rawurlencode, and one that turns it into an HTML link using htmlspecialchars.


Great ideas. I posted the code on GitHub some time ago but forgot to update this post. Patches welcome: https://github.com/timemachine3030/safe-url

