
Making human readable, keyword rich strings for PHP 5

Posted: Mon Aug 06, 2007 3:16 am
by s.dot
Okay, so this is the new and improved PHP 5 version of the class I made for PHP 4.

The main purpose I use this class for is making nice, human-readable, keyword-rich URLs (that the search engines love). However, there are other uses for it, so I've dropped the class name safeurl and changed it to safe_string.

I've also added a property, _delimiter, so you don't have to use hyphens (-) as the delimiter; you could use underscores, periods, or any other character.

This class turns user-generated strings, or strings pulled from a database, into strings stripped of everything but alphanumeric characters, separated by a delimiter.

Class safe_string Code

Code: Select all

<?php

/*
** This PHP5 class will turn strings that may be user generated,
** or pulled from a database, containing HTML or other special 
** characters into strings that are human readable, keyword rich, and safe
** for passing as URLs.  Very useful in combination with Apache's mod_rewrite module.
**
** --------------------------------------------------------------------------------
** This is updated from the PHP4 safeurl() class.
** Changes: All object properties are set to private.
** Delimiter can be chosen, instead of being forced to be a hyphen (-).
** The main function name make_safe_url() has been changed to just make_safe(), because 
** there are many other valid uses for this class other than just using the strings as
** URLs.
** Class has been broken up into many methods each performing a specific task, rather
** than being thrown into one method.
** --------------------------------------------------------------------------------
**
** Author - << smp_info _at_ yahoo _dot_ com >>
** Date - Monday, August 6th, 2007
*/

class safe_string
{
	/*
	** Set this to false if your string has already been cleaned of entities
	** @access private
	** @bool $_decode
	*/
	private $_decode = true;
	
	/*
	** If $_decode is set to true, this will be the character encoding set that 
	** will be used to decode strings.  Defaults to PHP's default of ISO-8859-1
	** @access private
	** @str $_decode_charset
	*/
	private $_decode_charset = 'ISO-8859-1';
	
	/*
	** Decides whether or not to leave the string how it is, or to lowercase all letters.
	** Defaults to true, lowercasing of all letters.
	** @access private
	** @bool $_lowercase
	*/
	private $_lowercase = true;
	
	/*
	** If your string has HTML in it, this will strip it out.  true = strip html,
	** false = don't strip html
	** @access private
	** @bool $_strip
	*/
	private $_strip = true;
	
	/*
	** Sets the maximum length of characters in the returned string.
	** @access private
	** @int $_maxlength
	*/
	private $_maxlength = 50;
	
	/*
	** Decides whether or not to chop the result string at the last whole word separated by 
	** $this->_delimiter.
	** @access private
	** @bool $_whole_word
	*/
	private $_whole_word = true;
	
	/*
	** Used as a delimiter between words.  Can be any character.
	** @access private
	** @str $_delimiter
	*/
	private $_delimiter = '-';
	
	/*
	** Default string to use if no alphanumeric characters can be found in the string
	** @access private
	** @str $_blank
	*/
	private $_blank = 'no-title';
	
	/*
	** Container for our output string
	** @access private
	** @str $_output
	*/
	private $_output;
	
	/*
	** Method to decode the given string of entities
	** @access private
	*/
	private function _decode_string()
	{
		if($this->_decode)
		{
			$this->_output = html_entity_decode($this->_output, ENT_QUOTES, $this->_decode_charset);
		}
	}
	
	/*
	** Method to lowercase the string
	** @access private
	*/
	private function _lowercase_string()
	{
		if($this->_lowercase)
		{
			$this->_output = strtolower($this->_output);
		}
	}
	
	/*
	** Method to strip the string of html tags
	** @access private
	*/
	private function _strip_string()
	{
		if($this->_strip)
		{
			$this->_output = strip_tags($this->_output);
		}
	}
	
	/*
	** Method to filter the string of invalid characters, replace & with " and ",
	** replace spaces with $this->_delimiter, remove apostrophes, and collapse
	** multiple occurrences of $this->_delimiter.
	** @access private
	*/
	private function _filter_string()
	{
		//filter out invalid characters (the hyphen goes last in the class so it is taken literally)
		$this->_output = preg_replace("/[^&a-z0-9_\s'-]/i", '', $this->_output);
		
		//replace & with ' and ', spaces with $this->_delimiter, and remove apostrophes
		$this->_output = str_replace(array('&', ' ', '\''), array(' and ', $this->_delimiter, ''), $this->_output);
		
		//collapse repeated occurrences of $this->_delimiter, then trim it from both ends
		$this->_output = trim(preg_replace("/(?:" . preg_quote($this->_delimiter, '/') . "){2,}/", $this->_delimiter, $this->_output), $this->_delimiter);
	}
	
	/*
	** Method to chop the string to $this->_maxlength characters
	** @access private
	*/
	private function _chop_string()
	{
		if(strlen($this->_output) > $this->_maxlength)
		{
			$this->_output = substr($this->_output, 0, $this->_maxlength);
			$this->_whole_word_string();
		}
	}
	
	/*
	** Method to chop the string at the last whole word separated by $this->_delimiter
	** @access private
	*/
	private function _whole_word_string()
	{
		if($this->_whole_word)
		{
			//split into words and drop only the last, possibly partial, one
			$words = explode($this->_delimiter, $this->_output);
			array_pop($words);
			$this->_output = implode($this->_delimiter, $words);
		}
	}
	
	/*
	** Method that simply runs through the list of methods to prepare $this->_output
	** @access private
	** @param str $string
	** @return str $this->_output
	*/
	private function _run($string)
	{
		$this->_output = $string;
		$this->_decode_string();
		$this->_lowercase_string();
		$this->_strip_string();
		$this->_filter_string();
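		$this->_chop_string();
		
		//fall back to $this->_blank when nothing usable is left
		if($this->_output === '')
		{
			$this->_output = $this->_blank;
		}
		
		return $this->_output;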
	}
	
	/*
	** Method to call the _run() method, and return $this->_output string
	** @access public
	** @param str $string
	** @return string $this->_output
	*/
	public function make_safe($string)
	{
		return $this->_run($string);
	}
	
	/*
	** Method to allow changing of private properties
	** @access public
	** @param str $property
	** @param mixed $value
	*/
	public function __set($property, $value)
	{
		$this->$property = $value;
	}
}
I also went ahead and reran the tests from the other topic, to make sure this class produces exactly the same results.

Test One

Code: Select all

$safe_string = new safe_string(); 

$tests = array( 
        'i\'m a test string!! do u like me. or not......., billy bob!!@#', 
        '<b>some HTML</b> in <i>here</i>!!~', 
        'i!@#*#@ l#*(*(#**$*o**(*^v^*(e d//////e\\\\\\\\v,,,,,,,,,,n%$#@!~e*(+=t', 
        'A lOng String wiTh a buNchess of words thats! should be -chopped- at the last whole word' 
); 

foreach($tests AS $test) 
{
	echo $safe_string->make_safe($test) . '<br />';
}
Test One Result

Code: Select all

im-a-test-string-do-u-like-me-or-not-billy-bob
some-html-in-here
i-love-devnet
a-long-string-with-a-bunchess-of-words-thats
Test Two

Code: Select all

$safe_string = new safe_string();

//we'll change a few object properties
$safe_string->_lowercase = false;
$safe_string->_whole_word = false;

$tests = array( 
        'i\'m a test string!! do u like me. or not......., billy bob!!@#', 
        '<b>some HTML</b> in <i>here</i>!!~', 
        'i!@#*#@ l#*(*(#**$*o**(*^v^*(e d//////e\\\\\\\\v,,,,,,,,,,n%$#@!~e*(+=t', 
        'A lOng String wiTh a buNchess of words thats! should be -chopped- at the last whole word' 
); 

foreach($tests AS $test) 
{
	echo $safe_string->make_safe($test) . '<br />';
}
Test Two Result

Code: Select all

im-a-test-string-do-u-like-me-or-not-billy-bob
some-HTML-in-here
i-love-devnet
A-lOng-String-wiTh-a-buNchess-of-words-thats-shoul
Real world project usage

Code: Select all

$safe_string = new safe_string();

echo '<a href="blog/jimbob/12/' . $safe_string->make_safe($dba['blog_title']) . '.html">' . $dba['blog_title'] . '</a>';
My example of real world project usage needs a matching mod_rewrite rule; together, I find they help my search engine ranking positions.
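For what it's worth, here is a rough sketch of how the script on the receiving end of such a rewrite rule might pick the URL apart again. The function name parse_blog_path() and the blog/<user>/<id>/<slug>.html layout are assumptions based on the example link above, not part of the class. Normally only the numeric id would be needed for the database lookup; the slug is there for readers and search engines.

```php
<?php

// Hypothetical helper: splits a path such as "blog/jimbob/12/my-first-post.html"
// into its parts. Returns an array on success, or null if the path does not match.
function parse_blog_path($path)
{
	$pattern = '#^blog/([^/]+)/(\d+)/([A-Za-z0-9_-]+)\.html$#';

	if(!preg_match($pattern, $path, $matches))
	{
		return null;
	}

	return array(
		'user' => $matches[1],
		'id'   => (int) $matches[2],
		'slug' => $matches[3]
	);
}

// Example path only; a real script would take it from the rewritten request.
$parts = parse_blog_path('blog/jimbob/12/my-first-post.html');
echo $parts['id'];
```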

Posted: Mon Aug 06, 2007 8:32 am
by superdezign

Code: Select all

urlencode(strtolower(preg_replace('&-+&', '-', preg_replace('&[^A-Za-z0-9]&', '-', strip_tags($data)))));
I miss anything? ;)
Hehe, probably.

Posted: Mon Aug 06, 2007 2:57 pm
by feyd
superdezign wrote:

Code: Select all

urlencode(strtolower(preg_replace('&-+&', '-', preg_replace('&[^A-Za-z0-9]&', '-', strip_tags($data)))));
I miss anything? ;)
Hehe, probably.
Could be done in a single expression instead of two. ;)

Posted: Mon Aug 06, 2007 4:23 pm
by s.dot
Thanks for the encouragement. :x

Posted: Mon Aug 06, 2007 8:25 pm
by s.dot
superdezign wrote:

Code: Select all

urlencode(strtolower(preg_replace('&-+&', '-', preg_replace('&[^A-Za-z0-9]&', '-', strip_tags($data)))));
I miss anything? ;)
Hehe, probably.
Actually, you missed quite a lot. The code you posted does not allow for ANY options: no maximum length and no chopping at whole words. It does not take entities into account (if they aren't decoded, an encoded & turns into -amp-, which is weird), and most importantly, the use of urlencode() would not make it human readable.
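To illustrate the entity point, here is a small sketch. The sample string and variable names are made up, and the second half is a simplified stand-in for what the class does (decode, spell out the ampersand, then filter), not the class itself. urlencode() is left off the one-liner for brevity.

```php
<?php

$raw = 'Tom &amp; Jerry';

// The one-liner approach: the entity is never decoded, so its name leaks into the result.
$quick = strtolower(preg_replace('&-+&', '-', preg_replace('&[^A-Za-z0-9]&', '-', strip_tags($raw))));

// Decoding first turns "&amp;" back into "&", which can then be spelled out as a word.
$decoded = html_entity_decode($raw, ENT_QUOTES, 'ISO-8859-1');
$spelled = str_replace('&', ' and ', $decoded);
$clean   = strtolower(trim(preg_replace('&[^A-Za-z0-9]+&', '-', $spelled), '-'));

echo $quick . "\n"; // tom-amp-jerry
echo $clean . "\n"; // tom-and-jerry
```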

Posted: Mon Aug 06, 2007 8:58 pm
by superdezign
scottayy wrote:
superdezign wrote:

Code: Select all

urlencode(strtolower(preg_replace('&-+&', '-', preg_replace('&[^A-Za-z0-9]&', '-', strip_tags($data)))));
I miss anything? ;)
Hehe, probably.
Actually, you missed quite a lot. The code you posted does not allow for ANY options: no maximum length and no chopping at whole words. It does not take entities into account (if they aren't decoded, an encoded & turns into -amp-, which is weird), and most importantly, the use of urlencode() would not make it human readable.
Hehe, oh well then.

I usually create a whole class of static functions for doing that sort of thing. It formats my URLs, parses my custom tags, cleans user submitted data, filters HTML, cleans suspicious 'src' and 'href' attributes, etc.

Edit: It's very site-specific, though.

Posted: Mon Aug 06, 2007 9:00 pm
by superdezign
feyd wrote:
superdezign wrote:

Code: Select all

urlencode(strtolower(preg_replace('&-+&', '-', preg_replace('&[^A-Za-z0-9]&', '-', strip_tags($data)))));
I miss anything? ;)
Hehe, probably.
Could be done in a single expression instead of two. ;)
:!:

Code: Select all

&[^A-Za-z0-9]+&
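Dropped back into the original one-liner, that pattern might look like the sketch below. The function names slug_two_pass() and slug_one_pass() are only there for the comparison, and urlencode() is omitted because nothing but letters, digits, and hyphens is left for it to encode.

```php
<?php

// Two passes: replace each disallowed character, then collapse runs of hyphens.
function slug_two_pass($data)
{
	return strtolower(preg_replace('&-+&', '-', preg_replace('&[^A-Za-z0-9]&', '-', strip_tags($data))));
}

// One pass: the + quantifier replaces each whole run with a single hyphen.
function slug_one_pass($data)
{
	return strtolower(preg_replace('&[^A-Za-z0-9]+&', '-', strip_tags($data)));
}

$sample = '<b>some HTML</b> in <i>here</i>!!~';

var_dump(slug_two_pass($sample) === slug_one_pass($sample)); // bool(true)
echo slug_one_pass($sample); // some-html-in-here-
```

Note the trailing hyphen in the output: neither version trims the result, which is one more thing the class takes care of.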