
highlight substrings and full pattern regex matches

Posted: Thu Dec 20, 2007 1:13 am
by vapoorize
Here is a function I made to highlight not only full pattern matches (yellow) but also all the substrings (multiple colors) from regex matches with preg_match_all(). The challenging part was figuring out a way to slip in the span tags while HTML-entity-encoding everything else. The code should explain itself; I used an array...

The problem is that it's not very fast when there is a high number of full pattern matches and substring matches. Do you have any ideas or suggestions to speed it up, perhaps another method that combines the highlighting with the HTML-entity encoding of the data?
You can see the function in action here:
http://tinyurl.com/2ynp8z

Code:

<?php
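function highlightPregHay($pat, $hay, $colors, $fullColor, $out = false) {
    // $out, if passed, must come from preg_match_all() with
    // PREG_OFFSET_CAPTURE|PREG_SET_ORDER
    // Escape byte by byte: with the ISO-8859-1 charset, htmlspecialchars() only
    // rewrites & < > " ' and passes every other byte through unchanged, so
    // multibyte UTF-8 characters split by str_split() come out intact.
    $esc = function ($s) {
        return htmlspecialchars($s, ENT_QUOTES, 'ISO-8859-1');
    };
    $hayLen = strlen($hay);
    $matches = true;
    $h = '';
    if (!$out) {
        $matches = preg_match_all($pat, $hay, $out, PREG_OFFSET_CAPTURE|PREG_SET_ORDER);
    }
    if (!$matches) {
        return false; // no match (0) or regex error (false)
    }
    $lastOff = 0;
    foreach ($out as $matchArr) { //each set of matches
        $origFull = $matchArr[0][0]; //full pat match
        $full = $esc($origFull);
        $fullOff = $matchArr[0][1];
        $fullLen = strlen($origFull);
        unset($matchArr[0]);
        if (!empty($matchArr) && $fullLen > 0) {
            // one escaped entry per byte; tags go into separate arrays so the
            // original offsets never shift
            $fullChars = array_map($esc, str_split($origFull));
            $open = array_fill(0, $fullLen, '');
            $close = array_fill(0, $fullLen, '');
            $colorKey = -1;
            foreach ($matchArr as $subInfoArr) { //each sub
                $colorKey++;
                $span = '<span style="background-color: '.$colors[$colorKey % count($colors)].';">';
                $sub = $subInfoArr[0];
                $subLen = strlen($sub);
                $subOff = $subInfoArr[1] - $fullOff;
                $endOff = ($subOff + $subLen) - 1;
                // skip empty or unmatched groups (offset -1) and groups captured
                // by a lookaround outside the full match
                if ($subLen == 0 || $subOff < 0 || $endOff >= $fullLen) {
                    continue;
                }
                // groups arrive outer-first, so appending keeps outer spans outside
                $open[$subOff] .= $span;
                $close[$endOff] .= '</span>';
            }
            $full = '';
            foreach ($fullChars as $k => $char) {
                $full .= $open[$k].$char.$close[$k];
            }
        }
        //escaped text between the previous match and this one
        $left = substr($hay, $lastOff, $fullOff - $lastOff);
        $lastOff = $fullOff + $fullLen;
        $h .= $esc($left).'<span style="background-color: '.$fullColor.';">'.$full.'</span>';
    }
    if ($lastOff < $hayLen) {
        $h .= $esc(substr($hay, $lastOff));
    }
    return $h;
}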

//substring colors
$colors[] = '#0066FF';
$colors[] = '#00FF66';
$colors[] = '#CC66FF';
$colors[] = '#FF0000';
$colors[] = '#FF9900';
$colors[] = '#99FFFF';
$colors[] = '#999999';
$colors[] = '#FF9966';
$colors[] = '#336699';
$colors[] = '#CC6666';
$colors[] = '#FF00CC';
$colors[] = '#CCFF99';
$colors[] = '#996633';
$colors[] = '#0099CC';
$colors[] = '#33CC33';

//full match color
$fullColor = '#FFFF00';

//regex
$pat = '~(hello) (how (are you (doi(ng t(oda)y, la) la) la this) is co)ol\!~';
$hay = 'hello how are you doing today, la la la this is cool! some more text';
echo highlightPregHay($pat,$hay,$colors,$fullColor);
?>
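
For reference, a short sketch of the array the function walks. With PREG_OFFSET_CAPTURE|PREG_SET_ORDER, preg_match_all() fills $out with one entry per full match, and each entry maps a group number to [captured text, byte offset], group 0 being the full match; a group that did not take part in the match comes back as ['', -1]. The subject and patterns below are made up for illustration.

```php
<?php
// Made-up subject and patterns, only to show the shape of $out.
$hay = 'do something';

// Nested groups: group 1 is the whole word, group 2 sits inside it.
preg_match_all('~(so(meth)ing)~', $hay, $out, PREG_OFFSET_CAPTURE | PREG_SET_ORDER);
foreach ($out as $set) {                 // one $set per full match
    foreach ($set as $group => $info) {  // $info = [text, byte offset]
        echo $group, ': "', $info[0], '" at offset ', $info[1], "\n";
    }
}
// prints:
// 0: "something" at offset 3
// 1: "something" at offset 3
// 2: "meth" at offset 5

// An optional group that did not participate is reported as ['', -1].
preg_match_all('~(x)?(ing)~', $hay, $opt, PREG_OFFSET_CAPTURE | PREG_SET_ORDER);
echo json_encode($opt[0][1]), "\n"; // prints ["",-1]
```

Offsets are byte offsets into the haystack, which is why the function subtracts the full match's offset to get positions inside the match.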

Posted: Thu Dec 20, 2007 2:15 am
by s.dot
That's pretty nifty looking ;) But a bit hefty!

If it were me, I'd turn it into a class with colors as class members, and other options as members.

Also, you're using a lot of variables which don't really need to be used. Just for example, in this piece of code:

Code:

$colorKey++;
$span = '<span style="background-color: '.$colors[$colorKey].';">';
$sub = $subInfoArr[0];
$subLen = strlen($sub);
This could be shortened to:

Code:

$span = '<span style="background-color: ' . $colors[++$colorKey] . ';">';
$subLen = strlen($subInfoArr[0]);
This could be done in lots of places throughout the function to make it tighter and more compact. It would also use a little less memory, since fewer variables are held. This might improve speed slightly, but I'm sure the weight of the speed is in the function calls.
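
Whichever increment form the shortened line uses has to agree with where $colorKey starts: $colorKey++ hands back the old value and then adds one, while ++$colorKey adds one first. With the function's starting value of -1, only the pre-increment form reads $colors[0] first. A tiny sketch with a made-up three-color list:

```php
<?php
$colors = ['#0066FF', '#00FF66', '#CC66FF'];

// Pre-increment: -1 becomes 0, then index 0 is read.
$colorKey = -1;
$pre = $colors[++$colorKey];

// Post-increment: index 0 is read, then the key becomes 1.
$colorKey2 = 0;
$post = $colors[$colorKey2++];

// Post-increment starting from -1 would look up index -1, which does not exist.
$colorKey3 = -1;
$idx = $colorKey3++;
$found = array_key_exists($idx, $colors);

echo $pre, ' ', $post, ' ', var_export($found, true), "\n"; // prints "#0066FF #0066FF false"
```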

Also, just curious, why the choice to highlight the background of the text instead of changing the text color?

Posted: Thu Dec 20, 2007 11:14 am
by vapoorize
s.dot wrote:
If it were me, I'd turn it into a class with colors as class members, and other options as members.
s.dot wrote:
I'm sure the weight of the speed is in the function calls.
Wouldn't a class, with creating the object, storing it, and calling its method, be way slower than calling a single function?
I can see the point if I'm re-using values that I could store as class members so I don't have to reassign them again and again, but isn't that what global declarations are for with regular functions? Take any single function and compare its speed to the same code as a class method; I believe calling the plain function will always be faster.
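
The function-versus-class question is easy to measure rather than guess. Below is a rough micro-benchmark sketch; plainWrap() and Wrapper are hypothetical stand-ins doing the same tiny amount of work, the object is created once and reused, and the timings depend on the PHP version and machine, so no figures are given here.

```php
<?php
// Hypothetical stand-ins: the same tiny job as a plain function and as a method.
function plainWrap($s, $color) {
    return '<span style="background-color: ' . $color . ';">' . $s . '</span>';
}

class Wrapper {
    private $color;

    public function __construct($color) {
        $this->color = $color;
    }

    public function wrap($s) {
        return '<span style="background-color: ' . $this->color . ';">' . $s . '</span>';
    }
}

$n = 100000; // iterations; large enough to smooth out timer noise

$t = microtime(true);
for ($i = 0; $i < $n; $i++) {
    $a = plainWrap('x', '#FFFF00');
}
$funcTime = microtime(true) - $t;

$t = microtime(true);
$w = new Wrapper('#FFFF00'); // created once, then reused for every call
for ($i = 0; $i < $n; $i++) {
    $b = $w->wrap('x');
}
$methodTime = microtime(true) - $t;

printf("function: %.4fs  method: %.4fs\n", $funcTime, $methodTime);
```

Since the object is built once, the per-call difference is only the method dispatch, which is the cost the question is about.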


Good idea on using fewer variables; there are a bunch that can be cleaned up.
s.dot wrote:
Also, just curious, why the choice to highlight the background of the text instead of changing the text color?
Well, I originally had it change the text color, but when only a single character was highlighted it wasn't easy to spot. Also, how could you highlight characters like tabs or spaces? Besides, isn't highlighting the background the norm? Look at Google.

Thanks for the feedback, but the main thing I was concerned about was how I put every single character of the full pattern match into an array. I do this so I can change/add things at certain offsets without losing count of the original offsets and order of the characters in the full pattern. The tricky part is that some substrings exist within other substrings, e.g. (so(meth)ing). Is there a better method to do this, other than using this array? Think about it...
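
One way to drop the per-character array, sketched here as an alternative rather than as the code from the first post: note where each opening and closing tag belongs as an offset within the full match, sort those offsets, then walk the match once with substr(), escaping each plain stretch between two insertion points. It assumes the groups are properly nested, as they are in the example pattern; highlightMatch() is a hypothetical name, and the text between matches would still be escaped and joined as in the original loop.

```php
<?php
// Hypothetical helper: highlight one match set from preg_match_all() with
// PREG_OFFSET_CAPTURE|PREG_SET_ORDER, without splitting into characters.
function highlightMatch(array $set, array $colors, $fullColor) {
    list($full, $fullOff) = $set[0];
    $len = strlen($full);
    $opens = [];  // offset within $full => opening tags, outer groups first
    $closes = []; // offset within $full => how many </span> to emit there
    foreach ($set as $group => $info) {
        list($text, $off) = $info;
        $start = $off - $fullOff;
        $end = $start + strlen($text); // exclusive end
        // skip named-group duplicates, the full match, unmatched or empty
        // groups, and lookaround captures outside the full match
        if (!is_int($group) || $group === 0 || $off < 0 || $text === ''
            || $start < 0 || $end > $len) {
            continue;
        }
        $color = $colors[($group - 1) % count($colors)];
        $opens[$start] = ($opens[$start] ?? '') . '<span style="background-color: ' . $color . ';">';
        $closes[$end] = ($closes[$end] ?? 0) + 1;
    }
    // every offset where something is inserted, plus both ends, in order
    $points = array_unique(array_merge(array_keys($opens), array_keys($closes), [0, $len]));
    sort($points);
    $html = '<span style="background-color: ' . $fullColor . ';">';
    $prev = 0;
    foreach ($points as $p) {
        $html .= htmlspecialchars(substr($full, $prev, $p - $prev), ENT_QUOTES);
        // close spans ending here before opening spans that start here
        if (isset($closes[$p])) {
            $html .= str_repeat('</span>', $closes[$p]);
        }
        if (isset($opens[$p])) {
            $html .= $opens[$p];
        }
        $prev = $p;
    }
    return $html . '</span>';
}

preg_match_all('~(so(meth)ing)~', 'do something', $out, PREG_OFFSET_CAPTURE | PREG_SET_ORDER);
echo highlightMatch($out[0], ['#0066FF', '#00FF66'], '#FFFF00'), "\n";
```

The work now scales with the number of groups rather than the length of the match, and htmlspecialchars() is called once per plain stretch instead of once per character.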

Posted: Thu Dec 20, 2007 11:01 pm
by s.dot
Having it in a class would make it more configurable, readable, and extensible. That would make it far more worthwhile than a function, even if it were a couple hundredths of a second slower.
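
A minimal sketch of the class layout being suggested, assuming PHP 5.4 or later. PregHighlighter and its method names are hypothetical; the group colors and the full-match color live in properties with defaults and can be changed through chainable setters. To keep it short, highlightFull() only wraps full matches; per-group highlighting, such as the logic in highlightPregHay(), would slot in where the matched text is escaped.

```php
<?php
// Hypothetical class: options are properties instead of per-call arguments.
class PregHighlighter {
    private $groupColors = ['#0066FF', '#00FF66', '#CC66FF', '#FF0000', '#FF9900'];
    private $fullColor = '#FFFF00';

    public function setGroupColors(array $colors) {
        if (!$colors) {
            throw new InvalidArgumentException('need at least one color');
        }
        $this->groupColors = array_values($colors);
        return $this; // allows chained configuration
    }

    public function setFullColor($color) {
        $this->fullColor = $color;
        return $this;
    }

    // Opening tag for capture group $group (1-based), cycling through the colors.
    public function groupTag($group) {
        $color = $this->groupColors[($group - 1) % count($this->groupColors)];
        return '<span style="background-color: ' . $color . ';">';
    }

    // Wraps each full-pattern match and escapes everything else.
    public function highlightFull($pat, $hay) {
        if (!preg_match_all($pat, $hay, $out, PREG_OFFSET_CAPTURE | PREG_SET_ORDER)) {
            return false; // no match or regex error, as in the original function
        }
        $html = '';
        $last = 0;
        foreach ($out as $set) {
            list($full, $off) = $set[0];
            $html .= htmlspecialchars(substr($hay, $last, $off - $last), ENT_QUOTES)
                . '<span style="background-color: ' . $this->fullColor . ';">'
                . htmlspecialchars($full, ENT_QUOTES) . '</span>';
            $last = $off + strlen($full);
        }
        return $html . htmlspecialchars(substr($hay, $last), ENT_QUOTES);
    }
}

$hl = (new PregHighlighter())->setFullColor('#FFFF99');
echo $hl->highlightFull('~l+~', 'hello <world>'), "\n";
```

Configuring once and reusing the object is where a class pays off: repeated calls no longer have to pass the color lists around.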

I'm not really sure how it works just by looking at it, but it seems it would be easier if you could get each smallest substring into an array, e.g. (so(met)hing) into (so) (met) (hing), if it doesn't already do that. Then you could just array_shift() and array_unshift() span tags on each substring and implode() the array.
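
A sketch of that segmenting idea. One adjustment is an assumption, not part of the suggestion: instead of nesting tags, each smallest piece gets a single span in the color of the innermost group covering it, so the output never nests and the pieces can simply be implode()d. segmentMatch() is a hypothetical name, it assumes properly nested groups, and the full-match span would still be wrapped around the result.

```php
<?php
// Hypothetical helper: cut one match set (PREG_OFFSET_CAPTURE|PREG_SET_ORDER)
// into its smallest pieces, e.g. "something" with (so(meth)ing) becomes
// "so", "meth", "ing", and give each piece one flat span.
function segmentMatch(array $set, array $colors) {
    list($full, $fullOff) = $set[0];
    $len = strlen($full);
    $ranges = []; // group number => [start, end) within $full
    $cuts = [0, $len];
    foreach ($set as $group => $info) {
        $start = $info[1] - $fullOff;
        $end = $start + strlen($info[0]);
        // skip named-group duplicates, the full match, unmatched/empty groups
        // and lookaround captures outside the full match
        if (!is_int($group) || $group === 0 || $info[1] < 0 || $start >= $end
            || $start < 0 || $end > $len) {
            continue;
        }
        $ranges[$group] = [$start, $end];
        $cuts[] = $start;
        $cuts[] = $end;
    }
    $cuts = array_unique($cuts);
    sort($cuts);
    $pieces = [];
    $n = count($cuts);
    for ($i = 0; $i + 1 < $n; $i++) {
        $a = $cuts[$i];
        $b = $cuts[$i + 1];
        $text = htmlspecialchars(substr($full, $a, $b - $a), ENT_QUOTES);
        // innermost covering group = highest-numbered group containing [a, b)
        $inner = 0;
        foreach ($ranges as $group => $r) {
            if ($r[0] <= $a && $b <= $r[1]) {
                $inner = $group;
            }
        }
        $pieces[] = $inner === 0
            ? $text
            : '<span style="background-color: ' . $colors[($inner - 1) % count($colors)] . ';">' . $text . '</span>';
    }
    return implode('', $pieces);
}

preg_match_all('~(so(meth)ing)~', 'do something', $out, PREG_OFFSET_CAPTURE | PREG_SET_ORDER);
echo segmentMatch($out[0], ['#0066FF', '#00FF66']), "\n";
```

The trade-off is that an outer group's color no longer shows behind its inner groups; only the innermost color is visible for each piece.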

As for the coloring, that's really your choice, as it is your function :-D I was just going along with the highlight_string() and highlight_file() functions, which color the characters instead of the background.