Shorten text with RegExp

visionmaster
Forum Contributor
Posts: 139
Joined: Wed Jul 14, 2004 4:06 am

Post by visionmaster »

Hello everyone,

I would like to shorten a text and did the following:

Code: Select all

$this->werbetext  = substr($sesArrFound[$i]['werbetext'], 0, 360);
$this->werbetext .= " ...";

Works just fine, but if I have a long text like "I am a very, very,very long long long text."
(this example of course doesn't have 360 characters), it gets split in the middle of a word like this: "I am a very, very,very long long long te..."

Words shouldn't be cut. In the above example I would like the following result: "I am a very, very,very long long long ..."

wordwrap() is not the right function for my problem, or am I overlooking something? I guess a RegExp would be the right thing, but how?

Appreciate your help!
Chris Corbyn
Breakbeat Nuttzer
Posts: 13098
Joined: Wed Mar 24, 2004 7:57 am
Location: Melbourne, Australia

Post by Chris Corbyn »

Moved to regex
Chris Corbyn
Breakbeat Nuttzer
Posts: 13098
Joined: Wed Mar 24, 2004 7:57 am
Location: Melbourne, Australia

Post by Chris Corbyn »

Interesting (untested, this looks too easy to work):

Code: Select all

$short_string = preg_replace('/^(.{0,360}?\b)/s', "$1...", $long_string);
EDIT | Oh no... don't use.... please don't. That was bad... very bad :P
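
A minimal sketch of why the lazy version above goes wrong, assuming a 20-character limit and a sample string picked only for illustration. With `{0,20}?` the engine tries the shortest match first, and position 0 already counts as a word boundary when the text starts with a letter, so group 1 comes back empty. A greedy quantifier takes as many characters as allowed and then backs off to the nearest boundary.

```php
<?php
// Sample input, used only for illustration.
$text = 'I am a very, very,very long long long text.';

// Lazy: shortest match wins. \b already matches at offset 0,
// so the captured group is the empty string.
preg_match('/^(.{0,20}?\b)/s', $text, $lazy);

// Greedy: grabs up to 20 characters, then backtracks until \b matches.
preg_match('/^(.{0,20}\b)/s', $text, $greedy);

var_dump($lazy[1]);   // string(0) ""
var_dump($greedy[1]); // string(18) "I am a very, very,"
```

`preg_match()` is used here instead of `preg_replace()` only to make the captured group easy to inspect.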
Chris Corbyn
Breakbeat Nuttzer
Posts: 13098
Joined: Wed Mar 24, 2004 7:57 am
Location: Melbourne, Australia

Post by Chris Corbyn »

Hmm I was feeling nice so I did the regex and put it in a wrapper function for you ;)

Code: Select all

<?php

function short_text($text, $length=360) {
	if (strlen($text) > $length) {
		$pattern = '/^(.{0,'.$length.'}\\b).*$/s';
		$text = preg_replace($pattern, "$1...", $text);
	}
	return $text;
}

//Example 1 (affects long strings)
echo '<p><u>Example 1</u></p>';
$long_string = <<<EOD
Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Sed scelerisque odio at odio. Nam massa ipsum, egestas id, viverra vitae, porta nec, velit. Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Praesent est dui, congue nec, pretium ut, scelerisque a, neque. Nunc imperdiet urna sit amet est. Ut imperdiet eleifend lorem. Integer malesuada, elit id pellentesque blandit, dolor metus faucibus sem, ac dignissim arcu elit id tellus. Cras ornare magna vitae ante. Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos hymenaeos. Vivamus massa.

Donec fermentum. Curabitur purus. Nunc non purus. Pellentesque ultrices. Sed venenatis odio nec leo. Nunc lacinia. In ultricies arcu eu justo. Aenean diam tellus, dapibus vitae, congue at, sodales et, lacus. Cum sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Pellentesque arcu. Curabitur ut urna non libero iaculis dictum. Donec sodales magna at ipsum. Ut euismod sapien et nibh. Aliquam posuere. Sed id lorem.

Curabitur a mi. Integer consequat ipsum a mauris. Suspendisse potenti. In semper arcu eget mauris. Praesent interdum pellentesque magna. Proin fringilla purus at dui. In sed tellus non urna semper aliquam. Pellentesque luctus. Ut ullamcorper auctor sem. Quisque malesuada neque ut orci. Donec gravida malesuada orci. Donec erat diam, dictum vel, cursus vitae, sagittis et, nulla. Suspendisse potenti. In adipiscing. Phasellus facilisis, turpis sed imperdiet fringilla, erat ligula consequat dolor, eu volutpat elit felis a lectus. In pretium. Nulla facilisi. Nullam lectus. Duis fringilla sapien sit amet augue.

Phasellus purus magna, congue ut, convallis at, fringilla quis, metus. Nulla facilisi. Sed consectetuer volutpat tellus. Donec nunc erat, sagittis eget, consequat pulvinar, lobortis sed, eros. Proin a orci id libero tristique auctor. Donec lacinia magna in neque. Nulla lacus velit, congue ut, feugiat non, fringilla eu, nisl. Aliquam felis. Quisque vitae sem. Fusce nulla arcu, mattis eu, congue congue, facilisis sit amet, urna. Phasellus a diam in arcu vestibulum molestie.

Aliquam enim nunc, dictum vitae, posuere id, mattis vel, pede. Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas. Nunc et arcu a mauris tincidunt pellentesque. Etiam ligula erat, scelerisque et, condimentum non, tempor nec, felis. Phasellus fringilla vehicula lectus. Nulla viverra dolor nec urna. Praesent non odio. Cras neque quam, molestie eget, tempor vel, dapibus eu, elit. Quisque luctus varius ligula. Nullam sodales viverra quam. Aliquam mattis, neque non elementum consectetuer, lectus sem blandit ante, egestas aliquam augue diam eget turpis. Duis vitae felis. Proin egestas ante eu nulla. Nam vehicula.

Mauris in tellus non massa bibendum congue. Curabitur et dolor at nulla elementum cursus. Etiam lorem odio, tincidunt faucibus, egestas nec, fermentum sed, ipsum. Praesent porta. Sed eu mauris consectetuer felis tincidunt cursus. Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas. Mauris vitae enim a odio tempor fermentum. Vestibulum vel risus non velit consectetuer commodo. Morbi velit. Sed auctor diam varius purus. Nunc vel wisi at sapien pellentesque scelerisque. Nullam purus nisl, placerat vel, hendrerit sit amet, dapibus vitae, magna. Nam pellentesque ligula ut ante. Phasellus.
EOD;

$short_string = short_text($long_string); //Cut at last word boundary before 360th char
echo nl2br($short_string);

//Example 2 (Leaves short strings alone)
echo '<p><u>Example 2</u></p>';
$short_string = short_text('foo bar');
echo $short_string;

//Example 3 (Custom length)
echo '<p><u>Example 3</u></p>';
$short_string = short_text($long_string, 50); //Cut at last word boundary before 50th char
echo nl2br($short_string);

?>
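
For comparison, the same idea can probably be done without a regex. The sketch below is an assumption-laden alternative, not a replacement for the function above: `short_text_plain` is a hypothetical name, it only treats a plain space as a word separator, it counts bytes (like `strlen()`), and it uses PHP 7 type declarations.

```php
<?php
// Hypothetical helper: cut at the last space at or before $length bytes.
function short_text_plain(string $text, int $length = 360): string {
    if (strlen($text) <= $length) {
        return $text; // short enough, leave untouched
    }
    // Take one extra byte so a space sitting exactly at $length is found.
    $cut = substr($text, 0, $length + 1);
    $pos = strrpos($cut, ' ');
    if ($pos === false) {
        // No space at all: fall back to a hard cut.
        return substr($text, 0, $length) . '...';
    }
    return substr($cut, 0, $pos) . ' ...';
}

echo short_text_plain('I am a very, very,very long long long text.', 40);
// I am a very, very,very long long long ...
```

Because it works on bytes, multibyte text (UTF-8 umlauts, for example) could be cut in the middle of a character by the fallback branch.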
timvw
DevNet Master
Posts: 4897
Joined: Mon Jan 19, 2004 11:11 pm
Location: Leuven, Belgium

Post by timvw »

Although it's a bit off-topic in the regex forum... This is how I retrieve only a snippet from a text in a MySQL column...

Code: Select all

SELECT SUBSTRING_INDEX(content,' ',20) as description
FROM messages
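
If the text is already in PHP, roughly the same word-count limit can be sketched like this. `first_words` is a hypothetical name, and the sketch assumes a single space as the delimiter, as in the query above.

```php
<?php
// Hypothetical helper: keep everything before the $count-th space,
// similar in spirit to SUBSTRING_INDEX(content, ' ', $count).
function first_words(string $text, int $count = 20): string {
    $words = explode(' ', $text);
    return implode(' ', array_slice($words, 0, $count));
}

echo first_words('one two three four five', 3); // one two three
```

Note that this limits the number of words, not the number of characters, so the resulting length varies with word length.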
visionmaster
Forum Contributor
Posts: 139
Joined: Wed Jul 14, 2004 4:06 am

Post by visionmaster »

timvw wrote: Although it's a bit off-topic in the regex forum... This is how I retrieve only a snippet from a text in a MySQL column...

Code: Select all

SELECT SUBSTRING_INDEX(content,' ',20) as description
FROM messages
That's a cool MySQL function if you want to return a substring from a string based on the occurrences of a delimiter, ' ' in your example. I actually wanted to display a maximum of 360 characters without chopping characters off words. @d11wtq's solution works just fine for me.

@timvw, nevertheless you reminded me to always look through MySQL's manual to check out its functions. I tend to "forget" that and use PHP instead. And built-in MySQL functions are of course much faster...
visionmaster
Forum Contributor
Posts: 139
Joined: Wed Jul 14, 2004 4:06 am

Post by visionmaster »

d11wtq wrote:Hmm I was feeling nice so I did the regex and put it in a wrapper function for you ;)
Hi. Lucky that you were in a good mood... ;-)

The RegEx is really simple and very effective, your wrapper function is more than perfect! Thanks a lot. By the way, the spoono.com tutorials are excellent, very short and to the point.
Chris Corbyn
Breakbeat Nuttzer
Posts: 13098
Joined: Wed Mar 24, 2004 7:57 am
Location: Melbourne, Australia

Post by Chris Corbyn »

visionmaster wrote: Hi. Lucky that you were in a good mood... ;-)

The RegEx is really simple and very effective, your wrapper function is more than perfect! Thanks a lot.
No problem... I enjoy the regex ones ;)
visionmaster wrote: By the way, the spoono.com tutorials are excellent, very short and to the point.
Great little tasters indeed... I should point out that they have nothing to do with me...
John Cartwright
Site Admin
Posts: 11470
Joined: Tue Dec 23, 2003 2:10 am
Location: Toronto

Post by John Cartwright »

I use

Code: Select all

preg_match('#^\s*(.{60,}?)\s+.*$#s', ...
:P
visionmaster
Forum Contributor
Posts: 139
Joined: Wed Jul 14, 2004 4:06 am

Post by visionmaster »

Jcart wrote:I use

Code: Select all

preg_match('#^\s*(.{60,}?)\s+.*$#s', ...
:P
Oops, that one's not that easy at first glance. Here's how Regexbuddy explains it:

^\s*(.{60,}?)\s+.*$
Assert position at the start of the string «^»
Match a single character that is a "whitespace character" (spaces, tabs, line breaks, etc.) «\s*»
    Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Match the regular expression below and capture its match into backreference number 1 «(.{60,}?)»
    Match any single character that is not a line break character «.{60,}?»
        Between 60 and unlimited times, as few times as possible, expanding as needed (lazy) «{60,}?»
Match a single character that is a "whitespace character" (spaces, tabs, line breaks, etc.) «\s+»
    Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match any single character that is not a line break character «.*»
    Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Assert position at the end of the string (or before the line break at the end of the string, if any) «$»

Created with RegexBuddy
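
A small sketch of how that pattern might be used, since the call above is elided. `snippet_min` is a hypothetical wrapper and the minimum length is made a parameter purely for illustration. Two things are worth noting: the pattern captures *at least* 60 characters and then stops at the next whitespace, so it is a minimum rather than a maximum; and with the `s` modifier the dot also matches line breaks, which the explanation above (generated for the bare pattern) does not reflect.

```php
<?php
// Hypothetical wrapper around the pattern quoted above.
function snippet_min(string $text, int $min = 60): string {
    $pattern = '#^\s*(.{' . $min . ',}?)\s+.*$#s';
    if (preg_match($pattern, $text, $m)) {
        return $m[1] . '...'; // at least $min characters, cut at whitespace
    }
    return $text; // too short, or no whitespace after $min characters
}

echo snippet_min('aaaa bbbb cccc dddd', 6); // aaaa bbbb...
```

In the usage line the result is nine characters plus the dots, longer than the requested six, because the cut waits for the first whitespace after the minimum.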