
Search result highlighting with excerpts

Posted: Fri Jul 25, 2008 9:50 am
by JayBird
Say I have the following string...

Code: Select all

The Tarmac name is one of the most recognised brands in the UK, but few outside our industry realise the breadth of our activities.
 
From our beginnings in the last century, Tarmac has grown to an international operation, providing a wide range of building materials and construction solutions. We are now a market leader, employing 12,500 people in over 500 locations worldwide, with a turnover of £2.1 billion.
 
Of course, we are famous for laying tarmac! Which we do. In a way. But we actually build motorways - from scratch. We quarry on a grand scale. We manufacture and process a wide range of materials for use in all aspects of the construction industry.
A user can search for any term and have that term highlighted. That part is easy.

What if the user searched for "Tarmac" and I wanted the results displayed like this, a bit like Google...

Search results
The Tarmac name is one of the most recognised....
...in the last century, Tarmac has grown to an international operation, providing...
...we are famous for laying tarmac! Which we do. In a way....

Re: Search result highlighting with excerpts

Posted: Fri Jul 25, 2008 10:18 am
by jayshields
What have you tried? Doesn't seem much more difficult than highlighting the terms to me, although it might be one of those problems that becomes harder once you jump into it and start programming. If you're really stuck, I'll have a shot at it.

Re: Search result highlighting with excerpts

Posted: Fri Jul 25, 2008 10:30 am
by JayBird
Yes, it is exactly one of those things that sounded okay, but got really complicated.

This is what I have currently. It only returns one match.

Code: Select all

 
function callback($buffer, $search) {
 
    // remove any HTML from content
    $string = strip_tags($buffer);
    
    // get the index of the search string
    $search_index = strpos($string, $search);
 
    // define our start point and end point
    $start = $search_index - 20;
    $end = strlen($search) + 40;
    
    // highlight the search term and return brief page summary
    return preg_replace('|('.quotemeta($search).')|iU', '<strong>\\1</strong>', substr($string, $start, $end));
}
 
echo callback($string, "Tarmac");
 
Try running the string from the first post through that function.

Re: Search result highlighting with excerpts

Posted: Fri Jul 25, 2008 12:13 pm
by EverLearning
Try this revised code

Code: Select all

 
function callback($buffer, $search) {
 
    // remove any HTML from content
    $string = strip_tags($buffer);

    // start with an empty list so a search with no matches still returns an array
    $results = array();

    while(($search_index = stripos($string, $search)) !== false) {
 
        // define our start point and end point
        $start = ($search_index - 20) >= 0 ? ($search_index - 20) : 0;
        $end = strlen($search) + 40;
 
        // highlight the search term and return brief page summary
        $results[] = preg_replace('|('.quotemeta($search).')|iU', '<strong>\\1</strong>', substr($string, $start, $end));
 
        $string = substr($string, $search_index + $end);
    }
 
    return $results;
}
 
$string = "The Tarmac name is one of the most recognised brands in the UK, but few outside our industry realise the breadth of our activities.
 
From our beginnings in the last century, Tarmac has grown to an international operation, providing a wide range of building materials and construction solutions. We are now a market leader, employing 12,500 people in over 500 locations worldwide, with a turnover of £2.1 billion.
 
Of course, we are famous for laying tarmac! Which we do. In a way. But we actually build motorways - from scratch. We quarry on a grand scale. We manufacture and process a wide range of materials for use in all aspects of the construction industry.";
 
var_dump(callback($string, "Tarmac"));
 
and the result is

Code: Select all

array
  0 => string 'The <strong>Tarmac</strong> name is one of the most recognised ' (length=63)
  1 => string 'n the last century, <strong>Tarmac</strong> has grown to an int' (length=63)
  2 => string 'e famous for laying <strong>tarmac</strong>! Which we do. In a ' (length=63)
 
It needs some work on recognizing word boundaries, but it's a start. This whole thing could probably be done using some fancy regex, and I would like to see that :wink:
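For anyone curious about the "fancy regex" route, here is a minimal sketch, not a drop-in replacement. The function name `regex_excerpts()` and the `$context` parameter are made up for illustration and are not part of the code in this thread. It lets `preg_match_all()` grab up to `$context` characters on either side of the term in one pass, and uses `preg_quote()` with the delimiter passed in so the search term cannot break the pattern.

```php
<?php
// Hypothetical helper (not from the thread): collect excerpts with one regex.
function regex_excerpts($buffer, $search, $context = 20)
{
    $results = array();
    if ($search === '') {
        return $results;
    }

    // remove any HTML from content
    $string = strip_tags($buffer);

    // escape the term, including the "/" delimiter used below
    $term = preg_quote($search, '/');
    $n = (int) $context;

    // up to $n characters, the term, up to $n characters;
    // "i" = case-insensitive, "s" = let "." match newlines too
    $pattern = '/.{0,' . $n . '}' . $term . '.{0,' . $n . '}/is';
    preg_match_all($pattern, $string, $matches);

    foreach ($matches[0] as $excerpt) {
        // wrap every occurrence of the term inside the excerpt
        $results[] = preg_replace('/' . $term . '/i', '<strong>$0</strong>', $excerpt);
    }

    return $results;
}
```

Called as `regex_excerpts($string, "Tarmac")`, it returns an array of excerpts in the same shape as `callback()`. It still cuts words in the middle, and without the `u` modifier it counts bytes rather than characters.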

Re: Search result highlighting with excerpts

Posted: Fri Jul 25, 2008 12:27 pm
by JayBird
Holy crap EverLearning, that looks the shiznit!!

I will use that in my application for now unless someone else can come up with anything fancier!

EDIT: Is your snippet PHP5 only? I don't think PHP4 has stripos()

Re: Search result highlighting with excerpts

Posted: Fri Jul 25, 2008 12:37 pm
by EverLearning
Yes, I made it using PHP5, but you can use this (taken from the PHP manual)

Code: Select all

if (!function_exists("stripos")) {
  function stripos($str,$needle,$offset=0)
  {
      return strpos(strtolower($str),strtolower($needle),$offset);
  }
}

Re: Search result highlighting with excerpts

Posted: Fri Jul 25, 2008 12:43 pm
by JayBird
Thanks, I will give it a go on Monday when I'm back at work, or over the weekend if I'm feeling fruity 8)

Re: Search result highlighting with excerpts

Posted: Fri Jul 25, 2008 12:49 pm
by EverLearning
This version additionally splits the found strings on the space character, so you don't get words cut in the middle:

Code: Select all

function callback($buffer, $search) {
 
    // remove any HTML from content
    $string = strip_tags($buffer);

    // start with an empty list so a search with no matches still returns an array
    $results = array();

    while(($search_index = stripos($string, $search)) !== false) {
 
        // define our start point and end point
        $start = ($search_index - 20) >= 0 ? ($search_index - 20) : 0;
        $end = strlen($search) + 40;
 
        $found = substr($string, $start, $end);
        $found = substr($found, strpos($found, ' '), strrpos($found, ' '));
 
        // highlight the search term and return brief page summary
        $results[] = preg_replace('|('.quotemeta($search).')|iU', '<strong>\\1</strong>', $found);
 
        $string = substr($string, $search_index + $end);
    }
 
    return $results;
}
Result

Code: Select all

array
  0 => string ' <strong>Tarmac</strong> name is one of the most recognised ' (length=60)
  1 => string ' the last century, <strong>Tarmac</strong> has grown to an ' (length=59)
  2 => string ' famous for laying <strong>tarmac</strong>! Which we do. In a ' (length=62)
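One thing to watch in the word-splitting step: the third argument of `substr()` is a length, not an end position. Passing the `strrpos()` result straight in keeps `$begin - 1` characters past the last space, which doesn't show in the output above because the first space sits at position 1 or the section happens to end in a space. Below is a small sketch of the same trimming with the length worked out explicitly. `trim_to_words()` is a hypothetical helper name, not part of the posted code, and it does not check that the search term itself survives the trim.

```php
<?php
// Hypothetical helper: trim a section so it starts and ends on whole words.
function trim_to_words($section)
{
    $begin  = strpos($section, ' ');   // first space
    $finish = strrpos($section, ' ');  // last space

    // no space at all, or only a single space: nothing sensible to trim
    if ($begin === false || $finish <= $begin) {
        return $section;
    }

    // substr() wants a length, so convert the end position into one;
    // +1 / -1 leave out the two boundary spaces themselves
    return substr($section, $begin + 1, $finish - $begin - 1);
}
```

For example, `trim_to_words('n the last century, Tarmac has grown to an int')` gives `'the last century, Tarmac has grown to an'`.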
 

Re: Search result highlighting with excerpts

Posted: Fri Jul 25, 2008 1:37 pm
by JayBird
Ooooh, i like that option! Works great.

One little issue

Change the input string to this

Code: Select all

The Tarmac name is one of the most recognised brands in the UK, but few outside our industry realise the breadth of our activities.
 
From our beginnings in the last century, Tarmac has grown to an international operation, providing a wide range of building materials and construction solutions. We are now a market leader, employing 12,500 people in over 500 locations worldwide, with a turnover of £2.1 billion.
 
Of course, we are famous for laying tarmac! Which we do. In a way. But we actually build motorways - from scratch. We quarry on a grand scale. We manufacture and process a wide range of materials for use in all aspects of the construction industry.
 
Tarmac is in search of managers and leaders of the future. If you are ambitious, resourceful and have a positive approach to what you do, find out more and apply at: http://www.tarmac.co.uk/gradlife
It seems to do something funky with the URL at the end.

Any ideas?

Re: Search result highlighting with excerpts

Posted: Fri Jul 25, 2008 2:15 pm
by EverLearning
It was funky at the end because the last found string section had its first space at position 4 and its last space at position 8 (I used spaces to find word boundaries), so all you got was

Code: Select all

at: ht
or something like it. I fixed it so that if the last space is before the position of $search (which caused this bug), it will just use the whole string section

Code: Select all

function callback($buffer, $search) {
 
    // remove any HTML from content
    $string = strip_tags($buffer);

    // start with an empty list so a search with no matches still returns an array
    $results = array();

    while(($search_index = stripos($string, $search)) !== false) {
 
        // define our start point and end point
        $start = ($search_index - 20) >= 0 ? ($search_index - 20) : 0;
        $end = strlen($search) + 40;
 
        $found = substr($string, $start, $end);
 
        $begin = strpos($found, ' ');
        $finish = strrpos($found, ' ');
        // if the search term starts after the last space, keep the rest of the section
        $finish = (stripos($found, $search) > $finish) ? strlen($found) : $finish;
 
        $found = substr($found, $begin, $finish);
 
        // highlight the search term and return brief page summary
        $results[] = preg_replace('|('.quotemeta($search).')|iU', '<strong>\\1</strong>', $found);
 
        $string = substr($string, $search_index + $end);
    }
 
    return $results;
}
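Two rough edges remain in the function above. It shortens `$string` on every pass, so the text leading up to the next match can already be gone, and `quotemeta()` does not escape the `|` used as the pattern delimiter. Here is a sketch of one way around both, assuming it is acceptable to walk the string with the offset argument of `stripos()` instead. The name `excerpts_by_offset()` and the `$before` / `$after` parameters are invented for this example.

```php
<?php
// Hypothetical variant of callback(); names and defaults are illustrative only.
function excerpts_by_offset($buffer, $search, $before = 20, $after = 40)
{
    $results = array();
    if ($search === '') {
        return $results;
    }

    // remove any HTML from content
    $string = strip_tags($buffer);
    $offset = 0;

    // scan forward with an offset so the original string stays intact
    while (($pos = stripos($string, $search, $offset)) !== false) {
        $start  = max(0, $pos - $before);
        $length = ($pos - $start) + strlen($search) + $after;
        $found  = substr($string, $start, $length);

        // preg_quote() escapes regex characters and the "/" delimiter
        $results[] = preg_replace(
            '/' . preg_quote($search, '/') . '/i',
            '<strong>$0</strong>',
            $found
        );

        // resume after this excerpt so a term inside it is not reported twice
        $offset = $start + strlen($found);
    }

    return $results;
}
```

It can be called the same way, for example `var_dump(excerpts_by_offset($string, "Tarmac"));`. Word-boundary trimming is left out here to keep the sketch short.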

Re: Search result highlighting with excerpts

Posted: Fri Jul 25, 2008 2:27 pm
by JayBird
Sweeeeeeet, working really well

I'm guessing it would be really complicated to expand this to multi-word searches?

Re: Search result highlighting with excerpts

Posted: Fri Jul 25, 2008 5:51 pm
by EverLearning
If by multi-word you mean a search like "sql query", this function will find the places where these two words are side by side. I tested it on Maugrim's tutorial :D here on the forum

Code: Select all

$content = file_get_contents('http://forums.devnetwork.net/viewtopic.php?f=28&t=48499');
 
var_dump(callback($content, "SQL query"));


But if you want multi-word search where words don't have to be adjacent to one another, you're better off with solutions like Zend_Search_Lucene and similar.
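If only the highlighting needs to cope with non-adjacent words, and not the searching or ranking, a rough sketch like the one below may be enough. It splits the query on whitespace and highlights each word wherever it occurs. `highlight_words()` is a hypothetical name, and this is no substitute for what a real search engine does.

```php
<?php
// Hypothetical helper: highlight each word of a multi-word query independently.
function highlight_words($text, $query)
{
    // split the query into words, dropping empty pieces
    $words = preg_split('/\s+/', trim($query), -1, PREG_SPLIT_NO_EMPTY);
    if (!$words) {
        return $text;
    }

    // escape every word, including the "/" delimiter
    $quoted = array();
    foreach ($words as $word) {
        $quoted[] = preg_quote($word, '/');
    }

    // one alternation pattern: word1|word2|...
    $pattern = '/' . implode('|', $quoted) . '/i';

    return preg_replace($pattern, '<strong>$0</strong>', $text);
}
```

For instance, `highlight_words("Run the SQL, then check the query log", "sql query")` wraps both "SQL" and "query" even though they are not next to each other.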

Re: Search result highlighting with excerpts

Posted: Sat Jul 26, 2008 6:07 pm
by RobertGonzalez
When I get to work on Monday I will search through some code I wrote that does this exact thing with multi-word search highlighting.

Re: Search result highlighting with excerpts

Posted: Sun Jul 27, 2008 2:10 pm
by jayshields
Nice work EverLearning, I will probably find use for that function some time in the future!

Re: Search result highlighting with excerpts

Posted: Sun Jul 27, 2008 4:43 pm
by EverLearning
I just played a little with his original code ;)