Useful String Functions

Small, short code snippets that other people may find useful. Do you have a good regex that you would like to share? Share it! Even better, the code can be commented on, and improved.

Moderator: General Moderators

protokol
Forum Contributor
Posts: 353
Joined: Fri Jun 21, 2002 7:00 pm
Location: Cleveland, OH

Useful String Functions

Post by protokol »

I have written two string functions which I have found to be quite useful. The first function, str_insert(), allows the user to insert a string into an existing string at a given index. The second function, str_multi_insert(), does the same thing, but lets you insert multiple strings at the same time.

Code: Select all

<?php
/**
 * Inserts a string before the index set by $strpos of the $old_string that we pass in.
 * 
 * @author  protokol
 * @version 1.0
 * @date    Thu Dec 02 15:00:04 EST 2004
 * @param   string $insert_string The string to insert into the $old_string
 * @param   int $strpos The index to insert the string before in the $old_string
 * @param   string $old_string The string that we want to manipulate
 * @return  string The newly created string
 */
function str_insert ($insert_string, $strpos, $old_string)
{
    $strlen = strlen($old_string);
    
    // If the string position is off the end of the string index, set it to the length of the string
    // which would be the last index in the string plus 1. If the position is less than zero, then
    // treat that number as the index starting from the end of the string. For example, if $strpos is -1,
    // then the strpos we want to reference is the last index of the string. This will place the $insert_string
    // before this index, basically placing it right before the last character in the string.
    if ($strpos > $strlen) {
        $strpos = $strlen;
    } else if ($strpos < 0) {
        $strpos += $strlen;
    }
    
    // Create the first part of the string up to the point we want to insert the new string at
    $new_string  = substr($old_string, 0, $strpos);
    
    // Tack on the string we are inserting
    $new_string .= $insert_string;
    
    // Finish it off by getting the rest of the original string starting at $strpos
    $new_string .= substr($old_string, $strpos);
    
    return $new_string;
}

/**
 * Inserts strings into the $old_string at specified locations. Works the same way that str_insert()
 * does except it allows you to insert multiple strings into multiple locations at the same time.
 * 
 * @author  protokol
 * @version 1.0
 * @date    Thu Dec 02 15:00:16 EST 2004
 * @param   array $new_strings The strings to insert into the $old_string, along with the positions to insert them
 * @param   string $old_string The string that we want to manipulate
 * @return  string The newly created string
 */
function str_multi_insert ($new_strings, $old_string)
{
    // Since this function needs an array of new strings, just return the old string if we don't have it
    if (!is_array($new_strings)) {
        return $old_string;
    }
    
    $start = 0;
    $new_string = '';
    
    // Sort the input array so that we get correct substring indices
    ksort($new_strings);
    
    // Run through all of the insert values and create a new string with them
    foreach ($new_strings as $strpos => $insert_string) {
        // Get the section of the string from the old string
        $new_string .= substr($old_string, $start, ($strpos - $start));
        
        // Tack on the new string we want to insert
        $new_string .= $insert_string;
        
        // Set the new start position to the current string position
        $start = $strpos;
    }
    
    // Get the rest of the original string
    $new_string .= substr($old_string, $start);
    
    return $new_string;
}
?>
Below is an example of how you can use the functions:

Code: Select all

<?php
$string = 'I am a PHP developer.';
$new_string = str_insert(' kickass', 6, $string);

// This gives $new_string the value of:
// I am a kickass PHP developer.

$string = "8007233288";
$insert_strings = array(
    0 => '(',
    3 => ') ',
    6 => '-'
);
$new_string = str_multi_insert($insert_strings, $string);

// This gives $new_string the value of:
// (800) 723-3288
?>
That's all, so I hope you find these useful. Enjoy!
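
For comparison, PHP's built-in substr_replace() performs the same kind of insertion when its length argument is 0, and it also accepts negative offsets. The sketch below is not part of the functions above: the helper name multi_insert_builtin is made up for illustration, and it works from the highest index down so that earlier offsets are not shifted by later insertions.

```php
<?php
// Single insertion: a length of 0 means "replace nothing", so the text is inserted.
echo substr_replace('I am a PHP developer.', ' kickass', 6, 0), "\n";

// Negative offsets count from the end of the string.
echo substr_replace('abc', 'X', -1, 0), "\n";

// Hypothetical helper (illustrative name): several insertions into one string.
function multi_insert_builtin(array $inserts, $old_string)
{
    // Highest index first, so offsets into the original string stay valid.
    krsort($inserts);
    foreach ($inserts as $pos => $text) {
        $old_string = substr_replace($old_string, $text, $pos, 0);
    }
    return $old_string;
}

echo multi_insert_builtin(array(0 => '(', 3 => ') ', 6 => '-'), '8007233288'), "\n";
```

The three lines printed should match the results described above: the "kickass" sentence, "abXc", and the formatted phone number.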
Last edited by protokol on Sat Jan 08, 2005 10:22 am, edited 3 times in total.
McGruff
DevNet Master
Posts: 2893
Joined: Thu Jan 30, 2003 8:26 pm
Location: Glasgow, Scotland

Post by McGruff »

I hope you don't mind but I edited the topic title slightly from "Useful String Insertion Functions" to "Useful String Functions". I was hoping we'd get a range of posts with general string tools.

Here's another.

Code: Select all

/*
    CLASS StringCutter

    Cut chunks out of a string using regex patterns.
    Can make repeated cuts on the same target string. 
    The target string will remain unaffected.    
    Each cut removes a portion of the internal copy of the target string; the
    remainder can be obtained with the getRemainder() method. This could 
    be used for example to check if the cuts have captured all data contained
    in the string.

*/
class StringCutter
{
    var $_num_cuts = 0;
    var $_num_cuts_error = 'The string cutter pattern has multiple matches.';
    
    var $cut;
    var $string;

    function StringCutter($string) 
    {
        $this->string = $string;
    }

    function cut($regex)
    {
        $this->_num_cuts = 0;
        $this->cut = null;
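        $string = preg_replace_callback($regex,
                                        array(&$this, '_callback'), 
                                        $this->string);
        // preg_replace_callback() returns null on error. An empty string is a
        // valid result (the whole target was cut), so compare against null
        // instead of relying on truthiness.
        if($string !== null)
        {
            $this->string = $string;
        }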
        if($this->_num_cuts > 1)
        {
            trigger_error($this->_num_cuts_error);
            return false;
        }
        return $this->cut;
    }
    
    function getRemainder() 
    {
        return $this->string;
    }
    
    //////////////////////////////////////////
    //              PRIVATE                 //
    //////////////////////////////////////////
    
    /*
        Only looking at $ 0.
    */
    function _callback($match)
    {
        $this->cut = $match[0];
        $this->_num_cuts++;
        return '';
    }
}
The test (using SimpleTest):

Code: Select all

class TestOfStringCutter extends UnitTestCase 
{
    function TestOfStringCutter() 
    {
        $this->UnitTestCase();
    }
    function testSimpleCut()
    {        
        $string = 'The quick brown fox.';
        $scissors =& new StringCutter($string);
        $this->assertIdentical($scissors->cut('#The quick#'), 'The quick');
        $this->assertIdentical($scissors->getRemainder(), ' brown fox.');
    }
    function testCutWithNoMatch() 
    {
        $string = 'The quick brown fox.';
        $scissors =& new StringCutter($string);
        $this->assertIdentical($scissors->cut('#foobar#'), null);
        $this->assertIdentical($scissors->getRemainder(), 'The quick brown fox.');
    }
    function testMultipleCuts() 
    {        
        $string = 'The quick brown fox.';
        $scissors =& new StringCutter($string);
        $this->assertIdentical($scissors->cut('#The quick#'), 'The quick');
        $this->assertIdentical($scissors->getRemainder(), ' brown fox.');
        $this->assertIdentical($scissors->cut('#foobar#'), null);
        $this->assertIdentical($scissors->getRemainder(), ' brown fox.');
        $this->assertIdentical($scissors->cut('#brown#'), 'brown');
        $this->assertIdentical($scissors->getRemainder(), '  fox.');
        $this->assertIdentical($scissors->cut('#foobar#'), null);
        $this->assertIdentical($scissors->getRemainder(), '  fox.');
    }
    function testOriginalStringIsUnmolested()
    {        
        $string = 'The quick brown fox.';
        $scissors =& new StringCutter($string);
        $scissors->cut('#The quick#');
        $this->assertIdentical($string, 'The quick brown fox.');
    }
    /*
        The class was written as part of a system to clean up user input where
        multiple bits of data had been stored in single fields. Multiple matches 
        would mean the regex pattern has failed to identify a discrete data
        chunk; hence, an error is triggered if more than one match is found.
    */
    function testMultipleMatchesError()
    {        
        $string = 'The quick brown fox. The quick brown fox.';
        $scissors =& new StringCutter($string);
        $this->assertIdentical($scissors->cut('#The quick#'), false);
        $this->assertError($scissors->_num_cuts_error);
    }
    function testWithRegexError()
    {        
        $string = 'The quick brown fox.';
        $scissors =& new StringCutter($string);
        $this->assertIdentical($scissors->cut('quick'), null);
        $this->assertIdentical($scissors->getRemainder(), 'The quick brown fox.');
        $this->assertError('Delimiter must not be alphanumeric or backslash');
    }
}
Last edited by McGruff on Sun Aug 07, 2005 9:19 am, edited 1 time in total.
Weirdan
Moderator
Posts: 5978
Joined: Mon Nov 03, 2003 6:13 pm
Location: Odessa, Ukraine

guess the language of utf-8 encoded string

Post by Weirdan »

Guess the language of a UTF-8 encoded string
This snippet was originally developed to detect the text direction of a UTF-8 encoded string. parse_utf8 parses a UTF-8 string into an array of character ordinals. utf_counts_by_ranges returns an array in which each key is a language identifier and each value is the number of characters from the original string that fall into that language's range (as defined in http://www.unicode.org/Public/UNIDATA/Blocks.txt). get_ranges is an auxiliary function that loads and parses the character ranges from an external file.

Code: Select all

function get_ranges($filename = 'uc-ranges.txt') {
    static $ranges = array();
    if($ranges) return $ranges;
    foreach(file($filename) as $line) {
        list($start, $end, $name) = preg_split('/(\.\.|; )/', trim($line));
        $ranges[$name] = array('start' => hexdec($start), 'end' => hexdec($end));
    }
    return $ranges;
}

function parse_utf8($string) {
    static $masks = array(
            0 => 127,
            2 => 31,
            3 => 15,
            4 => 7,
    );
    $ret = array();
    for($i = 0, $len = strlen($string); $i < $len; ) {
        $oct_len = strpos( sprintf("%'08b", ord($string[$i]) ), '0' );
        $char = 0;
        for($q = $oct_len - ($oct_len > 0); $q >= 0; $q--) {
            $char |= (
                       (
                         ord( $string[$i + $q] )
                         &
                         ( $q > 0 ? 63 : $masks[$oct_len] )
                       )
                       <<
                       ( ($oct_len - ($oct_len > 0) - $q) * 6 )
                     );
        }
        $ret[] = $char;
        $i += ( $oct_len + ($oct_len == 0) );
    }
    if($ret[0] == 0xfeff) array_shift($ret); // get rid of signature octets
    return $ret;
}

function utf_counts_by_ranges($string, $ranges) {
    $ret = array();
    foreach(parse_utf8($string) as $char) {
        foreach($ranges as $name => $boundaries)
            if( ($char >= $boundaries['start']) && ($char <= $boundaries['end']) )
                @$ret[$name]++;
    }
    return $ret;
}
var_dump(utf_counts_by_ranges(file_get_contents('asd'), get_ranges()));
uc-ranges.txt is a copy of http://www.unicode.org/Public/UNIDATA/Blocks.txt with all comments and blank lines stripped
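
One possible way to prepare that file is sketched below. The helper name strip_blocks_comments is made up for illustration, and the sample lines stand in for the real Blocks.txt (which uses '#' for comments and lines of the form "start..end; name"); in practice you would pass it the result of file() and write the output to uc-ranges.txt.

```php
<?php
// Hypothetical helper (illustrative name): drop '#' comments and blank lines.
function strip_blocks_comments(array $lines)
{
    $kept = array();
    foreach ($lines as $line) {
        $line = trim($line);
        // Skip empty lines and comment lines.
        if ($line === '' || $line[0] === '#') {
            continue;
        }
        $kept[] = $line;
    }
    return $kept;
}

// Sample input in the Blocks.txt layout (not the full file).
$sample = array(
    "# Sample header comment",
    "",
    "0000..007F; Basic Latin",
    "0080..00FF; Latin-1 Supplement",
    "# EOF",
);

echo implode("\n", strip_blocks_comments($sample)), "\n";
```

Only the two range lines survive, which is the shape get_ranges() expects to split on ".." and "; ".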
rehfeld
Forum Regular
Posts: 741
Joined: Mon Oct 18, 2004 8:14 pm

Post by rehfeld »

PHP 5 has some nice string functions that are not available in PHP 4.

You can emulate that functionality with this package:

http://pear.php.net/package/PHP_Compat

Keep in mind that each function is kept in its own file, so if you only need one or two functions, you can just grab those files and include them manually instead of installing the entire package.

Also, I don't think too many people know of these functions, so I thought I'd point them out:

http://php.net/ctype
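
A quick taste of what they do, as a rough sketch that assumes the ctype extension is available (it normally is). Each function checks whether every character of a string belongs to one character class; pass strings rather than integers, since integer arguments are treated differently.

```php
<?php
// Each call returns true only if ALL characters match the class.
$checks = array(
    'digits only'      => ctype_digit('12345'),
    'contains a dot'   => ctype_digit('12.5'),
    'letters only'     => ctype_alpha('abc'),
    'letters + digits' => ctype_alnum('abc123'),
    'upper case only'  => ctype_upper('ABC'),
    'whitespace only'  => ctype_space(" \t\n"),
    'empty string'     => ctype_digit(''),
);

foreach ($checks as $label => $result) {
    echo $label, ': ', $result ? 'true' : 'false', "\n";
}
```

Only 'contains a dot' and 'empty string' come out false here.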
tores
Forum Contributor
Posts: 120
Joined: Fri Jun 18, 2004 3:04 am

Post by tores »

Code: Select all

function in_string($haystack, $needle, $insensitive = 0) {
  $func = $insensitive ? "stristr" : "strpos";
  return $func($haystack, $needle)!==false ? true : false;
}
protokol
Forum Contributor
Posts: 353
Joined: Fri Jun 21, 2002 7:00 pm
Location: Cleveland, OH

Post by protokol »

Code: Select all

/*
Split a string into two parts by giving it the index to perform the split at.
*/
function str_split_index($string, $index)
{
    $string = (string) $string;
    $index = (int) $index;

    $prefix = substr($string, 0, $index);
    $suffix = substr($string, $index);

    return array($prefix, $suffix);
}
Chris Corbyn
Breakbeat Nuttzer
Posts: 13098
Joined: Wed Mar 24, 2004 7:57 am
Location: Melbourne, Australia

Post by Chris Corbyn »

I'm curious why you are type-casting in PHP? PHP handles this anyway but I guess logically it may speed things up.
protokol
Forum Contributor
Posts: 353
Joined: Fri Jun 21, 2002 7:00 pm
Location: Cleveland, OH

Post by protokol »

Because it guarantees that the parameters are of the type that I expect them to be.
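
A small sketch of what that buys you. The function name split_at is made up for illustration and mirrors the casts in the snippet above: whatever the caller passes in, the body only ever sees a string and an integer.

```php
<?php
// Hypothetical variant of the function above (illustrative name).
function split_at($string, $index)
{
    // Casting up front fixes the types, whatever the caller passed in.
    $string = (string) $string;
    $index  = (int) $index;

    return array(substr($string, 0, $index), substr($string, $index));
}

// An integer subject and a numeric-string index both work after the casts.
var_dump(split_at(12345, '2'));

// The casts on their own:
var_dump((int) '2');      // int(2)
var_dump((int) 2.9);      // int(2): the fraction is dropped
var_dump((string) 12345); // string(5) "12345"
```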