
Shorten text with RegExp

Posted: Sat Jun 18, 2005 4:02 am
by visionmaster
Hello everyone,

I would like to shorten a text and did the following:

Code:

// substr() cuts at exactly 360 characters, even in the middle of a word
$this->werbetext  = substr($sesArrFound[$i]['werbetext'], 0, 360);
$this->werbetext .= " ...";

Works just fine, but if I have a long text like "I am a very, very,very long long long text."
(this example of course doesn't have 360 characters), it gets split in the middle of a word, like this: "I am a very, very,very long long long te..."

Words shouldn't be cut in half. In the above example I would like the following result: "I am a very, very,very long long long ..."

wordwrap() is not the right function for my problem, or am I overlooking something? I guess a regex would be the right tool, but how?

Appreciate your help!
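For comparison, wordwrap() can actually do this without a regex if it is only used to find the cut point rather than to reformat the text. A minimal sketch, assuming byte-based lengths (strlen() and wordwrap() count bytes, not UTF-8 characters); the function name short_text_wordwrap() and the "\0" marker are illustrative choices, not anything from the thread:

```php
<?php
// Sketch: cut $text at the last space that fits within $width bytes,
// letting wordwrap() pick the break. A single word longer than $width is
// never split. short_text_wordwrap() and the "\0" marker are illustrative.
function short_text_wordwrap(string $text, int $width = 360): string
{
    if (strlen($text) <= $width) {
        return $text; // short enough: leave it alone
    }
    // Mark every wrap point with "\0"; with $cut = false words are never split.
    $wrapped = wordwrap($text, $width, "\0", false);
    $pos = strpos($wrapped, "\0");
    if ($pos === false) {
        return $text; // no space to break at (one very long word)
    }
    // Keep only the first "line" and add the ellipsis, as in the question.
    return substr($wrapped, 0, $pos) . ' ...';
}

echo short_text_wordwrap('I am a very, very,very long long long text.', 40), "\n";
// I am a very, very,very long long long ...
```

wordwrap() only breaks at plain spaces, so unlike a \b-based regex it never cuts between a word and a following comma.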

Posted: Sat Jun 18, 2005 7:15 am
by Chris Corbyn
Moved to regex

Posted: Sat Jun 18, 2005 7:18 am
by Chris Corbyn
Interesting (untested - this looks too easy to work)...:

Code:

$short_string = preg_replace('/^(.{0,360}?\b)/s', "$1...", $long_string);
EDIT | Oh no... don't use it... please don't. That was bad... very bad :P
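Why that first attempt fails can be shown in a few lines. A minimal sketch, reusing the example sentence from the question: the lazy {0,360}? is tried with zero characters first, and \b already matches at the very start of a string that begins with a word character, so the capture group is empty. The pattern also has nothing that consumes the rest of the text, so preg_replace() never removes the tail; the later version fixes both by making the quantifier greedy and adding .*$.

```php
<?php
// The withdrawn pattern, run through preg_match() to inspect what it captures.
$long_string = 'I am a very, very,very long long long text.';
preg_match('/^(.{0,360}?\b)/s', $long_string, $m);

// The lazy quantifier stops at zero characters because \b already holds
// at offset 0 (start of string, followed by the word character "I").
var_dump($m[1]); // string(0) ""
```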

Posted: Sat Jun 18, 2005 7:50 am
by Chris Corbyn
Hmm I was feeling nice so I did the regex and put it in a wrapper function for you ;)

Code:

<?php

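// Cut $text at the last word boundary within the first $length characters
// (bytes) and append "...". Shorter strings are returned unchanged.
function short_text($text, $length=360) {
	if (strlen($text) > $length) {
		// Greedy .{0,N} grabs as much as allowed, then backtracks to a \b;
		// .*$ (with /s) swallows the rest so it gets replaced as well.
		$pattern = '/^(.{0,'.(int)$length.'}\\b).*$/s';
		$text = preg_replace($pattern, '$1...', $text);
	}
	return $text;
}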

//Example 1 (affects long strings)
echo '<p><u>Example 1</u></p>';
$long_string = <<<EOD
Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Sed scelerisque odio at odio. Nam massa ipsum, egestas id, viverra vitae, porta nec, velit. Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Praesent est dui, congue nec, pretium ut, scelerisque a, neque. Nunc imperdiet urna sit amet est. Ut imperdiet eleifend lorem. Integer malesuada, elit id pellentesque blandit, dolor metus faucibus sem, ac dignissim arcu elit id tellus. Cras ornare magna vitae ante. Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos hymenaeos. Vivamus massa.

Donec fermentum. Curabitur purus. Nunc non purus. Pellentesque ultrices. Sed venenatis odio nec leo. Nunc lacinia. In ultricies arcu eu justo. Aenean diam tellus, dapibus vitae, congue at, sodales et, lacus. Cum sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Pellentesque arcu. Curabitur ut urna non libero iaculis dictum. Donec sodales magna at ipsum. Ut euismod sapien et nibh. Aliquam posuere. Sed id lorem.

Curabitur a mi. Integer consequat ipsum a mauris. Suspendisse potenti. In semper arcu eget mauris. Praesent interdum pellentesque magna. Proin fringilla purus at dui. In sed tellus non urna semper aliquam. Pellentesque luctus. Ut ullamcorper auctor sem. Quisque malesuada neque ut orci. Donec gravida malesuada orci. Donec erat diam, dictum vel, cursus vitae, sagittis et, nulla. Suspendisse potenti. In adipiscing. Phasellus facilisis, turpis sed imperdiet fringilla, erat ligula consequat dolor, eu volutpat elit felis a lectus. In pretium. Nulla facilisi. Nullam lectus. Duis fringilla sapien sit amet augue.

Phasellus purus magna, congue ut, convallis at, fringilla quis, metus. Nulla facilisi. Sed consectetuer volutpat tellus. Donec nunc erat, sagittis eget, consequat pulvinar, lobortis sed, eros. Proin a orci id libero tristique auctor. Donec lacinia magna in neque. Nulla lacus velit, congue ut, feugiat non, fringilla eu, nisl. Aliquam felis. Quisque vitae sem. Fusce nulla arcu, mattis eu, congue congue, facilisis sit amet, urna. Phasellus a diam in arcu vestibulum molestie.

Aliquam enim nunc, dictum vitae, posuere id, mattis vel, pede. Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas. Nunc et arcu a mauris tincidunt pellentesque. Etiam ligula erat, scelerisque et, condimentum non, tempor nec, felis. Phasellus fringilla vehicula lectus. Nulla viverra dolor nec urna. Praesent non odio. Cras neque quam, molestie eget, tempor vel, dapibus eu, elit. Quisque luctus varius ligula. Nullam sodales viverra quam. Aliquam mattis, neque non elementum consectetuer, lectus sem blandit ante, egestas aliquam augue diam eget turpis. Duis vitae felis. Proin egestas ante eu nulla. Nam vehicula.

Mauris in tellus non massa bibendum congue. Curabitur et dolor at nulla elementum cursus. Etiam lorem odio, tincidunt faucibus, egestas nec, fermentum sed, ipsum. Praesent porta. Sed eu mauris consectetuer felis tincidunt cursus. Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas. Mauris vitae enim a odio tempor fermentum. Vestibulum vel risus non velit consectetuer commodo. Morbi velit. Sed auctor diam varius purus. Nunc vel wisi at sapien pellentesque scelerisque. Nullam purus nisl, placerat vel, hendrerit sit amet, dapibus vitae, magna. Nam pellentesque ligula ut ante. Phasellus.
EOD;

$short_string = short_text($long_string); //Cut at last word boundary before 360th char
echo nl2br($short_string);

//Example 2 (Leaves short strings alone)
echo '<p><u>Example 2</u></p>';
$short_string = short_text('foo bar');
echo $short_string;

//Example 3 (Custom length)
echo '<p><u>Example 3</u></p>';
$short_string = short_text($long_string, 50); //Cut at last word boundary before 50th char
echo nl2br($short_string);

?>
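To check short_text() against the sentence from the original question, here is a small self-contained run with a 40-character limit (chosen only so the short example actually gets cut). The function is repeated so the snippet runs on its own, with an (int) cast as a small precaution because $length is pasted straight into the pattern.

```php
<?php
// Copy of short_text() from the post above; (int) keeps $length numeric
// because it is concatenated directly into the regex.
function short_text($text, $length = 360) {
    if (strlen($text) > $length) {
        $pattern = '/^(.{0,' . (int)$length . '}\b).*$/s';
        $text = preg_replace($pattern, '$1...', $text);
    }
    return $text;
}

// Greedy .{0,40} first grabs "...long long long te", then backtracks to the
// word boundary just before "text".
echo short_text('I am a very, very,very long long long text.', 40), "\n";
// I am a very, very,very long long long ...
```

Note that the three dots are appended after the cut, so a shortened string can be up to $length + 3 bytes long; pass $length - 3 if the limit is strict.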

Posted: Sat Jun 18, 2005 9:45 am
by timvw
Although it's a bit off-topic in the regex forum... this is how I retrieve just a snippet of the text in a MySQL column:

Code:

SELECT SUBSTRING_INDEX(content,' ',20) as description
FROM messages
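If the text is already in PHP, a rough counterpart of SUBSTRING_INDEX(content, ' ', 20) is to split on single spaces and keep the first 20 pieces. A sketch only, assuming a non-negative count; first_words() is a hypothetical helper name:

```php
<?php
// Rough PHP counterpart of MySQL's SUBSTRING_INDEX(content, ' ', 20):
// everything before the 20th space. first_words() is a hypothetical name.
function first_words(string $content, int $count = 20): string
{
    // Split on single spaces, keep the first $count pieces, glue them back.
    return implode(' ', array_slice(explode(' ', $content), 0, $count));
}

echo first_words('one two three four five', 3), "\n"; // one two three
```

Like the SQL version, it counts single spaces, so double spaces or line breaks are not treated as word separators, and it limits words, not characters.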

Posted: Sat Jun 18, 2005 10:47 am
by visionmaster
timvw wrote: Although it's a bit off-topic in the regex forum... this is how I retrieve just a snippet of the text in a MySQL column:

Code:

SELECT SUBSTRING_INDEX(content,' ',20) as description
FROM messages
That's a cool MySQL function if you want to return a substring of a string based on occurrences of a delimiter (' ' in your example). I actually wanted to display a maximum of 360 characters without chopping off the ends of words. @d11wtq's solution is just fine for me.

@timvw, nevertheless you reminded me to always look through MySQL's manual to check out its functions. I tend to "forget" that and use PHP instead. And built-in MySQL functions are often faster, too, since less data has to be sent back to PHP...

Posted: Sat Jun 18, 2005 10:53 am
by visionmaster
d11wtq wrote: Hmm I was feeling nice so I did the regex and put it in a wrapper function for you ;)
Hi. Lucky that you were in a good mood... ;-)

The regex is really simple and very effective; your wrapper function is more than perfect! Thanks a lot. By the way, the spoono.com tutorials are excellent: very short and straight to the point.

Posted: Sat Jun 18, 2005 12:23 pm
by Chris Corbyn
visionmaster wrote: Hi. Lucky that you were in a good mood... ;-)

The regex is really simple and very effective; your wrapper function is more than perfect! Thanks a lot.
No problem... I enjoy the regex ones ;)
visionmaster wrote: By the way, the spoono.com tutorials are excellent: very short and straight to the point.
Great little tasters indeed... I should point out that they have nothing to do with me...

Posted: Sat Jun 18, 2005 2:47 pm
by John Cartwright
I use

Code:

preg_match('#^\s*(.{60,}?)\s+.*$#s', ...
:P
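Jcart's call is cut off after the pattern, so what he does with the match isn't shown. Here is a hypothetical wrapper (cut_after() and the appended "..." are assumptions, chosen to mirror short_text()). Note that this pattern works the other way round from short_text(): it keeps at least 60 characters and cuts at the first whitespace after that, so 60 is a minimum, not a maximum.

```php
<?php
// Hypothetical wrapper around Jcart's pattern. The lazy .{N,}? takes at least
// $min characters, then stops at the first run of whitespace; leading
// whitespace is skipped by ^\s*.
function cut_after(string $text, int $min = 60): string
{
    $pattern = '#^\s*(.{' . $min . ',}?)\s+.*$#s';
    if (preg_match($pattern, $text, $m)) {
        return $m[1] . '...';
    }
    // No match: text too short, or no whitespace after the first $min characters.
    return $text;
}

echo cut_after(str_repeat('a', 58) . ' bb cc'), "\n"; // 58 "a"s, then " bb..."
```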

Posted: Sat Jun 18, 2005 5:05 pm
by visionmaster
Jcart wrote: I use

Code:

preg_match('#^\s*(.{60,}?)\s+.*$#s', ...
:P
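Oops, that one's not so easy to read at first glance. Here's how RegexBuddy explains it (note: it was given the pattern without the s modifier, so in Jcart's version . also matches line breaks):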

^\s*(.{60,}?)\s+.*$
Assert position at the start of the string «^»
Match a single character that is a "whitespace character" (spaces, tabs, line breaks, etc.) «\s*»
   Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Match the regular expression below and capture its match into backreference number 1 «(.{60,}?)»
   Match any single character that is not a line break character «.{60,}?»
      Between 60 and unlimited times, as few times as possible, expanding as needed (lazy) «{60,}?»
Match a single character that is a "whitespace character" (spaces, tabs, line breaks, etc.) «\s+»
   Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match any single character that is not a line break character «.*»
   Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Assert position at the end of the string (or before the line break at the end of the string, if any) «$»

Created with RegexBuddy