PHP Developers Network
http://forums.devnetwork.net/

Excerpt Function
http://forums.devnetwork.net/viewtopic.php?f=50&t=121729

Author:  McInfo [ Tue Sep 28, 2010 12:16 pm ]
Post subject:  Excerpt Function

This function is a response to topic .
Syntax:
<?php
/**
 * Returns an excerpt from the beginning of a string. An attempt is made to
 * return whole words only. However, if the beginning of the string up to the
 * maximum length consists of all word characters, the string is truncated to
 * the maximum length.
 *
 * @param string $string
 *     The input string. Multi-line strings are supported.
 *
 * @param integer $maxLength
 *     The maximum length of the excerpt, not including the suffix. The actual
 *     length of the returned string may be shorter (to avoid cutting a word) or
 *     longer (because of the suffix).
 *
 * @param integer $minLength
 *     The minimum length of the excerpt. Word boundaries occurring before this
 *     character position are ignored. Defaults to 0.
 *
 * @param string $suffix
 *     What to append to the excerpt if the input string is longer than the
 *     maximum length or if forced. By default, an ellipsis ("...").
 *
 * @param boolean $forceSuffix
 *     If true, the suffix is appended even if the input string is shorter than
 *     the maximum length. False by default.
 *
 * @return string
 *     An excerpt of the input string, or null if pattern matching failed.
 *     Exceptions are thrown if the function arguments are incongruent.
 */

function excerpt ($string, $maxLength, $minLength = 0, $suffix = '...', $forceSuffix = false) {
    if ($maxLength < 0) {
        throw new Exception('Required: $maxLength >= 0');
    }
    if ($minLength < 0) {
        throw new Exception('Required: $minLength >= 0');
    }
    if ($maxLength < $minLength) {
        throw new Exception('Required: $minLength <= $maxLength');
    }
    $strlen = strlen($string);
    if ($strlen <= $maxLength) {
        return $string . ($forceSuffix ? $suffix : '');
    }
    $pattern = sprintf('/\A(.{%1$u,%2$u}(?!\w)|.{0,%2$u})/s', $minLength, $maxLength);
    if (!preg_match($pattern, $string, $matches)) {
        return null; // Pattern matching failed.
    }
    // The input is longer than $maxLength at this point, so the suffix always applies.
    return $matches[1] . $suffix;
}

// Usage example
$text = 'The quick brown fox jumped over the lazy dog.';
try {
    var_dump(excerpt($text, 20, 10)); // string(22) "The quick brown fox..."
} catch (Exception $e) {
    echo $e->getMessage();
}

I hope I explained everything well enough in the comments. Suggestions for improvements are welcome. Would adding offset and prefix parameters be excessive? Should I trigger warnings and return null instead of throwing exceptions? Should I consider using mb_strlen()?

Edit: Amended third exception message, added "=".
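
On the mb_strlen() question: strlen() and the dot in the pattern both count bytes, so multibyte UTF-8 text can be measured as too long or cut in the middle of a character. Below is a minimal sketch of a character-aware variant, assuming UTF-8 input and the mbstring extension. The name excerptMb and the explicit [\p{L}\p{N}_] class (standing in for \w) are choices made for this illustration and are not part of the original function.

```php
<?php
// Hypothetical multibyte-aware variant of excerpt(); the name excerptMb is an
// assumption for this sketch. Requires mbstring and assumes UTF-8 input.
function excerptMb($string, $maxLength, $minLength = 0, $suffix = '...', $forceSuffix = false) {
    if ($minLength < 0 || $maxLength < $minLength) {
        throw new InvalidArgumentException('Required: 0 <= $minLength <= $maxLength');
    }
    // Count characters instead of bytes.
    if (mb_strlen($string, 'UTF-8') <= $maxLength) {
        return $string . ($forceSuffix ? $suffix : '');
    }
    // The u modifier makes the dot consume whole UTF-8 characters.
    // [\p{L}\p{N}_] spells out "letter, digit or underscore" in Unicode terms.
    $pattern = sprintf('/\A(.{%1$u,%2$u}(?![\p{L}\p{N}_])|.{0,%2$u})/su', $minLength, $maxLength);
    if (!preg_match($pattern, $string, $matches)) {
        return null; // For example, $string is not valid UTF-8.
    }
    return $matches[1] . $suffix;
}

echo excerptMb('Émile aime les pâtisseries françaises.', 20, 10), "\n"; // Émile aime les...
echo excerptMb('naïve café', 10), "\n"; // naïve café (10 characters, 12 bytes)
```

With the byte-based original, 'naïve café' would count as 12 bytes and be truncated even though it is only 10 characters long.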

Author:  Jonah Bron [ Tue Sep 28, 2010 12:45 pm ]
Post subject:  Re: Excerpt Function

8O
That's pretty slick. That regex statement is a tough one... I didn't even think of what would happen if the max length is bigger than the size of the input string.

Overall, I think throwing exceptions is better than a more "graceful" failure because it forces the developer to debug.
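
To make the trade-off concrete, here is a small sketch of the two styles side by side. The helper names checkLengthsStrict and checkLengthsLenient are hypothetical and exist only for this comparison; the original function uses plain Exception.

```php
<?php
// Hypothetical helpers for comparison; neither is part of the original function.

// Exception style: a bad argument stops execution unless the caller catches it.
function checkLengthsStrict($maxLength, $minLength) {
    if ($minLength < 0 || $maxLength < $minLength) {
        throw new InvalidArgumentException('Required: 0 <= $minLength <= $maxLength');
    }
    return true;
}

// Warning style: execution continues, and the caller must remember to test for null.
function checkLengthsLenient($maxLength, $minLength) {
    if ($minLength < 0 || $maxLength < $minLength) {
        trigger_error('Required: 0 <= $minLength <= $maxLength', E_USER_WARNING);
        return null;
    }
    return true;
}

try {
    checkLengthsStrict(5, 10);
} catch (InvalidArgumentException $e) {
    echo 'Caught: ', $e->getMessage(), "\n";
}
```

With the lenient version, a caller who never checks the return value carries on with bad arguments, which is the "graceful" failure that hides bugs.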

Author:  McInfo [ Tue Sep 28, 2010 1:41 pm ]
Post subject:  Re: Excerpt Function

After the sprintf() substitution has taken place (here with $minLength = 10 and $maxLength = 20), the regular expression is a little less intimidating. Here is an explanation:
Syntax:
'/\A(.{10,20}(?!\w)|.{0,20})/s' regex string
'                             ' string bounds
 /                          /   regex bounds
                             s  dotall modifier (so dot matches newline)
  \A                            start of subject
    (                      )    subpattern
                   |            "or" branch in subpattern
     .{10,20}(?!\w)             min 10, max 20 of any character not followed by a word character
     .                          any character (including newline because of s modifier)
      {10,20}                   quantifier, minimum 10, maximum 20
             (?!  )             negative lookahead assertion
                \w              any word character
                    .{0,20}     min 0, max 20 of any character
                    .           any character (including newline because of s modifier)
                     {0,20}     quantifier, minimum 0, maximum 20
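
The breakdown above can be checked by running the substituted pattern against a few inputs. This is only a demonstration sketch; leadingMatch is a hypothetical helper name, and json_encode is used just to make the newline visible in the output.

```php
<?php
// The pattern after sprintf() substitution with $minLength = 10, $maxLength = 20.
// leadingMatch() is a hypothetical helper used only for this demonstration.
function leadingMatch($input) {
    $pattern = '/\A(.{10,20}(?!\w)|.{0,20})/s';
    return preg_match($pattern, $input, $matches) ? $matches[1] : null;
}

// First branch: at 20 characters the next character is "j" (a word character),
// so the quantifier backs up to 19 characters, where the next character is a space.
echo json_encode(leadingMatch('The quick brown fox jumped over the lazy dog.')), "\n";
// "The quick brown fox"

// Second branch: no position between 10 and 20 is followed by a non-word
// character, so the string is simply cut at 20 characters.
echo json_encode(leadingMatch('Supercalifragilisticexpialidocious')), "\n";
// "Supercalifragilistic"

// The s modifier lets the dot cross the newline.
echo json_encode(leadingMatch("Line one\nline two continues here")), "\n";
// "Line one\nline two"
```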

Author:  John Cartwright [ Tue Sep 28, 2010 2:32 pm ]
Post subject:  Re: Excerpt Function

I've always used this regex, written by feyd way back when (currently set to 60 chars).

Syntax:
#^\s*(.{60,}?)\s+.*$#s


This grabs at least the first 60 characters and then continues until it reaches whitespace, so no word is chopped.
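
The post does not show how the pattern is applied. One plausible usage, sketched here as an assumption, is preg_replace() with the first capture group as the replacement. The function name excerptForward and the $minLength parameter are hypothetical.

```php
<?php
// Hypothetical wrapper around feyd's pattern; the use of preg_replace() is an
// assumption, since the original post shows only the regex itself.
function excerptForward($text, $minLength = 60) {
    // Lazily take at least $minLength characters, then stop at the first whitespace.
    $pattern = sprintf('#^\s*(.{%u,}?)\s+.*$#s', $minLength);
    // If the pattern does not match (text too short, or no whitespace after
    // $minLength characters), preg_replace() returns the text unchanged.
    return preg_replace($pattern, '$1', $text);
}

$text = 'The quick brown fox jumped over the lazy dog.';
echo excerptForward($text, 20), "\n"; // The quick brown fox jumped
```

Note that the result here is 26 characters long: the given number acts as a minimum, and the excerpt grows until the current word ends.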

Author:  Jonah Bron [ Tue Sep 28, 2010 3:07 pm ]
Post subject:  Re: Excerpt Function

The advantage of McInfo's solution is that it goes backward instead of forward, so you can be sure the excerpt will be no longer than the given number. That regex is pretty neat though. Perhaps it could be modified to go backward?
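
One possible way to make feyd's pattern go backward, offered as an untested-in-production sketch rather than a definitive answer: replace the lazy lower bound with a greedy upper bound and a lookahead that requires whitespace or the end of the string. The function name excerptBackward and the $maxLength parameter are hypothetical.

```php
<?php
// Hypothetical backward-looking variant of feyd's pattern.
function excerptBackward($text, $maxLength = 60) {
    // Greedily take up to $maxLength characters, then back up until the next
    // character is whitespace or the end of the string: (?!\S).
    $pattern = sprintf('#^\s*(.{1,%u})(?!\S).*$#s', $maxLength);
    // If the first word alone is longer than $maxLength, nothing matches and
    // preg_replace() returns the text unchanged.
    return preg_replace($pattern, '$1', $text);
}

$text = 'The quick brown fox jumped over the lazy dog.';
echo excerptBackward($text, 20), "\n"; // The quick brown fox
```

Unlike McInfo's function, this sketch has no fallback branch, so a first word longer than the limit leaves the input untouched instead of truncating it.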

All times are UTC - 5 hours