Excerpt Function

McInfo
DevNet Resident
Posts: 1532
Joined: Wed Apr 01, 2009 1:31 pm

Excerpt Function

Post by McInfo »

This function is a response to topic 121700.

Code: Select all

<?php
/**
 * Returns an excerpt from the beginning of a string. An attempt is made to
 * return whole words only. However, if the beginning of the string up to the
 * maximum length consists of all word characters, the string is truncated to
 * the maximum length.
 *
 * @param string $string
 *     The input string. Multi-line strings are supported.
 *
 * @param integer $maxLength
 *     The maximum length of the excerpt, not including the suffix. The actual
 *     length of the returned string may be shorter (to avoid cutting a word) or
 *     longer (because of the suffix).
 *
 * @param integer $minLength
 *     The minimum length of the excerpt. Word boundaries occurring before this
 *     character position are ignored. Defaults to 0.
 *
 * @param string $suffix
 *     What to append to the excerpt if the input string is longer than the
 *     maximum length or if forced. By default, an ellipsis ("...").
 *
 * @param boolean $forceSuffix
 *     If true, the suffix is appended even if the input string is shorter than
 *     the maximum length. False by default.
 *
 * @return string|null
 *     An excerpt of the input string, or null if pattern matching failed.
 *
 * @throws Exception
 *     If the function arguments are incongruent.
 */
function excerpt ($string, $maxLength, $minLength = 0, $suffix = '...', $forceSuffix = false) {
    if ($maxLength < 0) {
        throw new Exception('Required: $maxLength >= 0');
    }
    if ($minLength < 0) {
        throw new Exception('Required: $minLength >= 0');
    }
    if ($maxLength < $minLength) {
        throw new Exception('Required: $minLength <= $maxLength');
    }
    $strlen = strlen($string);
    if ($strlen <= $maxLength) {
        return $string . ($forceSuffix ? $suffix : '');
    }
    $pattern = sprintf('/\A(.{%1$u,%2$u}(?!\w)|.{0,%2$u})/s', $minLength, $maxLength);
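    if (preg_match($pattern, $string, $matches) !== 1) {
        // The pattern failed to compile or match; return null as documented.
        return null;
    }
    // The string is longer than $maxLength at this point, so the suffix always applies.
    return $matches[1] . $suffix;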
}

// Usage example
$text = 'The quick brown fox jumped over the lazy dog.';
try {
    var_dump(excerpt($text, 20, 10)); // string(22) "The quick brown fox..."
} catch (Exception $e) {
    echo $e->getMessage();
}
I hope I explained everything well enough in the comments. Suggestions for improvements are welcome. Would adding offset and prefix parameters be excessive? Should I trigger warnings and return null instead of throwing exceptions? Should I consider using mb_strlen()?

Edit: Amended third exception message, added "=".
Last edited by McInfo on Tue Sep 28, 2010 1:52 pm, edited 1 time in total.
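On the mb_strlen() question in the post above: strlen() and a pattern without the u modifier both count bytes, so a multibyte character can be cut in half at the maximum length. The following is only a minimal sketch of a character-based variant, assuming UTF-8 input and that the mbstring extension is loaded. The name mbExcerpt() is hypothetical, and $forceSuffix is left out for brevity.

```php
<?php
// Hypothetical multibyte-aware variant of excerpt(); not the original author's code.
// Assumes UTF-8 input and the mbstring extension.
function mbExcerpt(string $string, int $maxLength, int $minLength = 0, string $suffix = '...'): ?string
{
    if ($minLength < 0 || $maxLength < $minLength) {
        throw new InvalidArgumentException('Required: 0 <= $minLength <= $maxLength');
    }
    // Count characters instead of bytes.
    if (mb_strlen($string, 'UTF-8') <= $maxLength) {
        return $string;
    }
    // The u modifier makes "." consume one UTF-8 character instead of one byte.
    $pattern = sprintf('/\A(.{%1$d,%2$d}(?!\w)|.{0,%2$d})/su', $minLength, $maxLength);
    if (preg_match($pattern, $string, $matches) !== 1) {
        return null; // e.g. invalid UTF-8 or another PCRE error
    }
    return $matches[1] . $suffix;
}

var_dump(mbExcerpt('Über große Straßen', 10)); // string "Über große..."
```

With the byte-based version, a limit of 10 would fall in the middle of the two-byte "ß" in this input.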
Jonah Bron
DevNet Master
Posts: 2764
Joined: Thu Mar 15, 2007 6:28 pm
Location: Redding, California

Re: Excerpt Function

Post by Jonah Bron »

8O
That's pretty slick. That regex statement is a tough one... I didn't even think of what would happen if the max length is bigger than the size of the input string.

Overall, I think throwing exceptions is better than a more "graceful" failure because it forces the developer to debug.
McInfo
DevNet Resident
Posts: 1532
Joined: Wed Apr 01, 2009 1:31 pm

Re: Excerpt Function

Post by McInfo »

After the sprintf() substitution has taken place, the regular expression is a little less intimidating. Here is an explanation:

Code: Select all

'/\A(.{10,20}(?!\w)|.{0,20})/s' regex string
'                             ' string bounds
 /                          /   regex bounds
                             s  dotall modifier (so dot matches newline)
  \A                            start of subject
    (                      )    subpattern
                   |            "or" branch in subpattern
     .{10,20}(?!\w)             min 10, max 20 of any character not followed by a word character
     .                          any character (including newline because of s modifier)
      {10,20}                   quantifier, minimum 10, maximum 20
             (?!  )             negative lookahead assertion
                \w              any word character
                    .{0,20}     min 0, max 20 of any character
                    .           any character (including newline because of s modifier)
                     {0,20}     quantifier, minimum 0, maximum 20
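To see the two branches of the pattern explained above in action, here is a short sketch that runs it against three inputs. The sample strings and variable names are only illustrative.

```php
<?php
$pattern = '/\A(.{10,20}(?!\w)|.{0,20})/s';

// First branch: starts at 20 characters and backtracks to the nearest
// position that is not followed by a word character.
preg_match($pattern, 'The quick brown fox jumped over the lazy dog.', $m);
$first = $m[1];  // "The quick brown fox" (19 characters)

// Second branch: every position from 10 to 20 is followed by a word
// character, so the string is cut at exactly 20 characters.
preg_match($pattern, 'Supercalifragilisticexpialidocious', $m);
$second = $m[1]; // "Supercalifragilistic"

// The boundary after "Hi" lies before position 10, so it is ignored and
// the second branch applies again.
preg_match($pattern, 'Hi, supercalifragilistic', $m);
$third = $m[1];  // "Hi, supercalifragili"

var_dump($first, $second, $third);
```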
John Cartwright
Site Admin
Posts: 11470
Joined: Tue Dec 23, 2003 2:10 am
Location: Toronto

Re: Excerpt Function

Post by John Cartwright »

I've always used this regex, written by feyd way back when (currently set to 60 chars).

#^\s*(.{60,}?)\s+.*$#s

.. which will grab the first 60 chars, and then continue until it has found whitespace (meaning no chopped words).
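The post does not show how the regex is applied. One plausible way, offered here only as an assumption, is preg_replace() with the first capture group as the replacement. The wrapper name forwardExcerpt() and the $length parameter (in place of the hard-coded 60) are hypothetical.

```php
<?php
// Hypothetical wrapper around the regex quoted above.
function forwardExcerpt(string $text, int $length): string
{
    // Lazily take at least $length characters, then extend until whitespace follows.
    $pattern = sprintf('#^\s*(.{%d,}?)\s+.*$#s', $length);
    // preg_replace() returns null on error; fall back to the input in that case.
    return preg_replace($pattern, '$1', $text) ?? $text;
}

$text = 'The quick brown fox jumped over the lazy dog.';
var_dump(forwardExcerpt($text, 20));         // string(26) "The quick brown fox jumped"
var_dump(forwardExcerpt('Short text', 20));  // string(10) "Short text" (no match, unchanged)
```

Note that the result for a length of 20 is 26 characters long: the match runs forward to the end of the current word, so the output can exceed the requested length.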
Jonah Bron
DevNet Master
Posts: 2764
Joined: Thu Mar 15, 2007 6:28 pm
Location: Redding, California

Re: Excerpt Function

Post by Jonah Bron »

The advantage of McInfo's solution is it goes backward instead of forward, so you can be sure the excerpt will be no longer than the given number. That regex is pretty neat though. Perhaps it could be modified to go backward?
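One possible way to make that regex go backward, sketched here as an untested-in-production assumption rather than anything posted in this thread, is to swap the lazy open-ended quantifier for a greedy bounded one and require whitespace or the end of the string right after the capture. The name backwardExcerpt() is hypothetical.

```php
<?php
// Hypothetical backward-searching variant of the regex quoted earlier.
function backwardExcerpt(string $text, int $length): string
{
    // Greedily take up to $length characters, then backtrack until the
    // capture is followed by whitespace or the end of the string.
    $pattern = sprintf('#^\s*(.{0,%d})(?=\s|\z).*$#s', $length);
    return preg_replace($pattern, '$1', $text) ?? $text;
}

$text = 'The quick brown fox jumped over the lazy dog.';
var_dump(backwardExcerpt($text, 20)); // string(19) "The quick brown fox"
```

Unlike McInfo's pattern, this sketch has no fallback branch: if the first word alone is longer than the limit, nothing matches and the input comes back unchanged.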