highlight substrings and full pattern regex matches

vapoorize
Forum Newbie
Posts: 22
Joined: Mon Dec 17, 2007 5:35 pm

Post by vapoorize »

Here is a function I made to highlight not only full pattern matches (yellow) but also all the substrings (multiple colors) from regex matches with preg_match_all(). The challenging part was figuring out a way to not only slip in the span tags, but also HTML-encode everything else. The code should explain itself; I used an array...

The problem is that it's not very quick when there is a high number of full pattern matches and substring matches. Do you have any ideas or suggestions to speed it up, perhaps another method that can combine the highlighting with the HTML-encoding of the data?
You can see the function in action here:
http://tinyurl.com/2ynp8z

Code:

<?php
function highlightPregHay($pat,$hay,$colors,$fullColor,$out = false) {
    //a passed-in $out must come from preg_match_all() with PREG_OFFSET_CAPTURE|PREG_SET_ORDER
    $hayLen = strlen($hay);
    $matches = true;
    $h = '';
    if (!$out) {
        $matches = preg_match_all($pat, $hay, $out, PREG_OFFSET_CAPTURE|PREG_SET_ORDER);
    }
    if ($matches) {
        $lastOff = 0;
        foreach ($out as $matchArr) { //each set of matches
            $origFull = $matchArr[0][0]; //full pat match
            $full = htmlentities($origFull);
            $fullOff = $matchArr[0][1];
            $fullLen = strlen($origFull);
            unset($matchArr[0]);
            if (!empty($matchArr)) {
                $fullChars = str_split($origFull);
                foreach ($fullChars as $k => $char) {
                    $fullChars[$k] = htmlentities($char);
                }
                $colorKey = -1;
                foreach ($matchArr as $subInfoArr) { //each sub
                    $colorKey++;
                    $span = '<span style="background-color: '.$colors[$colorKey].';">';
                    $sub = $subInfoArr[0];
                    $subLen = strlen($sub);
                    $subOff = $subInfoArr[1] - $fullOff;
                    if ($subInfoArr[1] < 0 || $subLen == 0) {
                        continue; //unmatched or empty group: nothing to wrap
                    }
                    $endOff = ($subOff + $subLen) - 1;
                    $fullChars[$subOff] = $span.$fullChars[$subOff];
                    $fullChars[$endOff] .= '</span>';
                }
                $full = implode('',$fullChars);
            }
            //highlight the full in the haystack like before here
            if ($fullOff == 0) {
                $left = '';
            } else {
                $left = substr($hay, $lastOff, $fullOff-$lastOff);
            }
            $lastOff = $fullOff + $fullLen;
            $h .= htmlentities($left).'<span style="background-color: '.$fullColor.';">'.$full.'</span>';
        }
        if ($lastOff < $hayLen) {
            $h .= htmlentities(substr($hay, $lastOff));
        }
        return $h;
    } else {
        return false;
    }
}

//substring colors
$colors[] = '#0066FF';
$colors[] = '#00FF66';
$colors[] = '#CC66FF';
$colors[] = '#FF0000';
$colors[] = '#FF9900';
$colors[] = '#99FFFF';
$colors[] = '#999999';
$colors[] = '#FF9966';
$colors[] = '#336699';
$colors[] = '#CC6666';
$colors[] = '#FF00CC';
$colors[] = '#CCFF99';
$colors[] = '#996633';
$colors[] = '#0099CC';
$colors[] = '#33CC33';

//full match color
$fullColor = '#FFFF00';

//regex
$pat = '~(hello) (how (are you (doi(ng t(oda)y, la) la) la this) is co)ol\!~';
$hay = 'hello how are you doing today, la la la this is cool! some more text';
echo highlightPregHay($pat,$hay,$colors,$fullColor);
?>
Last edited by vapoorize on Thu Dec 20, 2007 1:25 pm, edited 1 time in total.
s.dot
Tranquility In Moderation
Posts: 5001
Joined: Sun Feb 06, 2005 7:18 pm
Location: Indiana

Post by s.dot »

That's pretty nifty looking ;) But a bit hefty!

If it were me, I'd turn it into a class with colors as class members, and other options as members.

Also, you're using a lot of variables which don't really need to be used. Just for example, in this piece of code:

Code:

$colorKey++;
                    $span = '<span style="background-color: '.$colors[$colorKey].';">';
                    $sub = $subInfoArr[0];
                    $subLen = strlen($sub);
This could be shortened to:

Code:

// pre-increment, because $colorKey starts at -1
$span = '<span style="background-color: ' . $colors[++$colorKey] . ';">';
$subLen = strlen($subInfoArr[0]);
This could be done in lots of places throughout the function to make it tighter and more compact. It would also use a little less memory, since fewer variables are held. This might improve speed slightly, but I'm sure the weight of the speed is in the function calls.

Also, just curious, why the choice to highlight the background of the text instead of changing the text color?
vapoorize
Forum Newbie
Posts: 22
Joined: Mon Dec 17, 2007 5:35 pm

Post by vapoorize »

s.dot wrote: "If it were me, I'd turn it into a class with colors as class members, and other options as members."
s.dot wrote: "I'm sure the weight of the speed is in the function calls."
Wouldn't a class, with creating and storing the object, be way slower than calling a single function?
I can understand it from the viewpoint of re-using variables: I could store them in the class and not have to re-assign them again and again. But isn't that the point of global declaration with regular functions? Give me any single function and compare its speed with the same function inside a class; I believe calling the plain function will always be faster than calling the one in the class.
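
One way to check this claim rather than guess is a small timing loop. The sketch below is only an illustration: `addOne`, `Adder` and the iteration count are made-up names and values, and it measures bare call overhead, not the highlighter itself. Results will vary with PHP version and machine, so no numbers are claimed here.

```php
<?php
// Hypothetical micro-benchmark: plain function call vs. method call.
// All names here are illustrative and not part of the original code.
function addOne($n)
{
    return $n + 1;
}

class Adder
{
    public function addOne($n)
    {
        return $n + 1;
    }
}

$iterations = 200000;
$adder = new Adder();

// Time the plain function.
$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    addOne($i);
}
$functionSeconds = microtime(true) - $start;

// Time the same work as a method on an already created object.
$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    $adder->addOne($i);
}
$methodSeconds = microtime(true) - $start;

printf("function: %.4fs, method: %.4fs\n", $functionSeconds, $methodSeconds);
```

Whatever difference shows up is spread over 200,000 calls, whereas the highlighter is called once per page, so it is worth comparing that per-call figure with the time spent inside the function body.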


Good idea on using fewer variables; there are a bunch that can be cleaned up...
s.dot wrote: "Also, just curious, why the choice to highlight the background of the text instead of changing the text color?"
Well, I originally had it changing the text color, but if you were highlighting just a single character it wasn't so easy to spot the highlighting on it. Also, how could you highlight characters like tabs or spaces? Besides, isn't highlighting the background of text the norm in highlighting? Look at Google.

Thanks for the feedback, but the main thing I was concerned about was how I put every single character of the full pattern match into an array. I do this so I can change or add things at certain offsets and not lose count of the original offsets and order of the characters in the full pattern. The tricky part is that some substrings exist within other substrings, e.g. (so(meth)ing). Is there a better method to do this, other than using this array? Think about it..
Last edited by vapoorize on Mon Dec 24, 2007 12:25 am, edited 1 time in total.
s.dot
Tranquility In Moderation
Posts: 5001
Joined: Sun Feb 06, 2005 7:18 pm
Location: Indiana

Post by s.dot »

Having it in a class would make it more configurable, readable, and extensible. That would make it far more worthwhile than a function, even if it were a couple hundredths of a second slower.
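
As a rough idea of what "colors and options as members" could look like, here is a minimal sketch. The class name `HighlightStyle`, its properties and the three-color default palette are all hypothetical, chosen only for illustration; it builds the opening span tag and leaves the matching logic out.

```php
<?php
// Hypothetical configuration class; names and defaults are illustrative only.
class HighlightStyle
{
    // Colors for capture groups 1..n; reused cyclically if there are more groups.
    public $groupColors = ['#0066FF', '#00FF66', '#CC66FF'];

    // Color for the full pattern match (group 0).
    public $fullColor = '#FFFF00';

    // 'background-color' highlights behind the text, 'color' recolors the text itself.
    public $cssProperty = 'background-color';

    // Build the opening span tag for a given group number.
    public function openTag($groupNumber)
    {
        $color = $groupNumber === 0
            ? $this->fullColor
            : $this->groupColors[($groupNumber - 1) % count($this->groupColors)];

        return '<span style="' . $this->cssProperty . ': ' . $color . ';">';
    }
}
```

With something like this, switching between background highlighting and text coloring is one property change (`$style->cssProperty = 'color';`) instead of an edit inside the function.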

I'm really not sure how the inner workings go just by looking at it, but it seems it would be easier if you could get each smallest substring into an array, i.e. (so(met)hing) into (so) (met) (hing), if it doesn't already do that. Then you could just array_unshift() and array_push() span tags onto each substring and implode() the array.
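
The "smallest substrings" idea can be sketched without a per-character array: record where each span opens and closes, cut the haystack at those offsets only, and escape each piece in one call. This is a sketch under assumptions, not the original function. `highlightByBoundaries` is a made-up name, it assumes a non-empty `$colors` array and groups that are properly nested or disjoint (groups inside lookarounds may break that), and it passes explicit flags and a UTF-8 charset to htmlentities(), which the original does not.

```php
<?php
// Hypothetical helper for illustration; not the original poster's function.
function highlightByBoundaries($pat, $hay, array $colors, $fullColor)
{
    if (!preg_match_all($pat, $hay, $sets, PREG_OFFSET_CAPTURE | PREG_SET_ORDER)) {
        return false;
    }

    $opens = [];  // byte offset => opening tags, outermost group first
    $closes = []; // byte offset => closing tags

    foreach ($sets as $set) {
        foreach ($set as $group => $info) {
            if (!is_int($group)) {
                continue; // named groups also appear under their number
            }
            $text = $info[0];
            $off = $info[1];
            if ($off < 0 || $text === '') {
                continue; // unmatched or empty group: nothing to wrap
            }
            // Group 0 is the full match; other groups cycle through $colors.
            $color = $group === 0 ? $fullColor : $colors[($group - 1) % count($colors)];
            $end = $off + strlen($text);
            // Groups are numbered by opening parenthesis, so appending keeps outer tags first.
            $opens[$off] = ($opens[$off] ?? '') . '<span style="background-color: ' . $color . ';">';
            $closes[$end] = ($closes[$end] ?? '') . '</span>';
        }
    }

    // Every offset where a tag is inserted, plus the start and end of the haystack.
    $cuts = array_keys($opens + $closes);
    $cuts[] = 0;
    $cuts[] = strlen($hay);
    $cuts = array_unique($cuts);
    sort($cuts);

    $html = '';
    $count = count($cuts);
    for ($i = 0; $i < $count; $i++) {
        $pos = $cuts[$i];
        // Close spans ending here before opening spans that start here.
        $html .= ($closes[$pos] ?? '') . ($opens[$pos] ?? '');
        if ($i + 1 < $count) {
            $piece = substr($hay, $pos, $cuts[$i + 1] - $pos);
            $html .= htmlentities($piece, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
        }
    }

    return $html;
}
```

Because opening tags are appended in group order, two groups that start on the same character, as in `((a)b)`, still nest outer-first. The number of htmlentities() calls then grows with the number of group boundaries rather than with the number of characters, which may help with the speed concern, though that would need measuring.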

As for the coloring, that's really your choice, as it is your function :-D I was just going along with the highlight_string() and highlight_file() functions, which color the characters instead of the background.