Break a text chunk into measured pieces...

User avatar
is_blank
Forum Commoner
Posts: 36
Joined: Sat Jun 25, 2005 6:05 pm
Location: Tennessee, USA

Post by is_blank »

I'm wrestling with this now, and not quite sure how to loop it: I've got a paragraph of text that will change -- might be one sentence once, five sentences next. Whatever it is, I need to break it into chunks no greater than 288 characters. I'd like to break it at sentences, though, instead of just mechanically filling up the 288 characters.

Let's say for the sake of argument that (in this case) the current paragraph is four sentences long, and breaks down like this:
Sentence 1 = 85 chars
Sentence 2 = 179 chars
Sentence 3 = 109 chars
Sentence 4 = 96 chars

Let's call the 288-max-char chunks $field1, $field2, etc. I can run it through something like

$sentences = explode('.', $paragraph);

and get an array...but after that, I'm stymied. I'm guessing I need to say "If $paragraph < 288, then $field1 = $paragraph. OTHERWISE, put $sentences[0] and $sentences[1] in $field1. If $field1 > 288, then put $sentences[1] in $field2, along with $sentences[2]. If $field2 > 288, then put $sentences[2] into $field3...etc."

That's got to be some kind of crazy recursive kind of looping thing that's a little over my head right now. I don't know what syntax to start with...and I really don't know how to keep track of which $sentences[] I'm working with as it goes along.

I know this has to be a pain in the neck...anyone feel like enlightening me?
:(
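The "keep adding sentences until the next one would overflow" idea described above doesn't need recursion; a single loop over the sentences is enough. A minimal sketch, assuming a naive sentence boundary (whitespace after '.', '!' or '?') and counting length with strlen(), i.e. in bytes. The name chunk_sentences() is made up for illustration, and an array stands in for $field1, $field2, etc.

```php
<?php
// Hypothetical helper: pack whole sentences into chunks of at most $max_len bytes.
function chunk_sentences($paragraph, $max_len = 288)
{
    // Naive boundary: whitespace that follows '.', '!' or '?'.
    $sentences = preg_split('/(?<=[.!?])\s+/', trim($paragraph), -1, PREG_SPLIT_NO_EMPTY);

    $chunks  = [];
    $current = '';
    foreach ($sentences as $sentence) {
        $candidate = ($current === '') ? $sentence : $current . ' ' . $sentence;
        if (strlen($candidate) <= $max_len) {
            $current = $candidate;      // still fits, keep filling this chunk
        } else {
            if ($current !== '') {
                $chunks[] = $current;   // close the chunk that is full
            }
            $current = $sentence;       // an over-long sentence ends up as its own chunk
        }
    }
    if ($current !== '') {
        $chunks[] = $current;
    }
    return $chunks;
}
```

With the four sentence lengths from the example (85, 179, 109 and 96 characters), this gives two chunks: sentences 1 and 2 together (265 characters including the joining space), then sentences 3 and 4 (206 characters).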
User avatar
s.dot
Tranquility In Moderation
Posts: 5001
Joined: Sun Feb 06, 2005 7:18 pm
Location: Indiana

Post by s.dot »

you definitely don't want to explode at the "." because sentences can contain periods (initials, ellipses, etc.)

I don't know your purpose for doing this but what I would do is just use a simple substr($paragraph, 0, 100); substr($paragraph, 100, 100); etc. (the third argument to substr() is a length, not an end position).

Post by is_blank »

Ah, true about the explode(): that would tear up other punctuation, decimal numbers, etc., I suppose.

substr() would certainly do the job, but I wanted whatever went into $field1, $field2, etc. to stand on its own, specifically by being a complete sentence, or group of sentences. I certainly don't want it to cut off in the middle of a word or anything, as substr() would do... I'll keep poking at it.
User avatar
Chris Corbyn
Breakbeat Nuttzer
Posts: 13098
Joined: Wed Mar 24, 2004 7:57 am
Location: Melbourne, Australia

Post by Chris Corbyn »

Maybe I'm biased but I think for something like this, regex is the way to go. Say for example, you split at MAX 288 chars but don't cut a word in half (I wrote this regex recently) (that's doable). Now we'll advance on that and say, try to split at a dot "." followed by some whitespace, followed by an uppercase letter or a number (that's the flaky bit). It's tricky because the sentence might look like it's ended to a non-human even if it hasn't.
The equation of time was written by Carruthers Et. Al a long time ago.
That regex might plain confuse people so if you need help... ;)
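For illustration, the boundary rule described here (a dot, then whitespace, then an uppercase letter or a digit) could be written with preg_split() and lookarounds. This is only a sketch of that rule, not the regex referred to in the post; split_sentences() is a made-up name, and the second sentence of the sample text is invented for the demo.

```php
<?php
// Split on whitespace that sits between a period and an uppercase letter or digit.
function split_sentences($text)
{
    return preg_split('/(?<=\.)\s+(?=[A-Z0-9])/', $text, -1, PREG_SPLIT_NO_EMPTY);
}

$parts = split_sentences(
    'The equation of time was written by Carruthers Et. Al a long time ago. It still holds.'
);
// $parts has three elements; the split after "Et." is the false positive.
```

The sample comes back in three pieces rather than two, because "Et. Al" looks exactly like a sentence break to this rule. That is the flaky bit in action.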

Post by is_blank »

That regex might plain confuse people...
Mmmhmm. I'm sure it would me. :D

Here's what I think I'm going to go with-- I've decided I don't really much care about the sentences that come in over 288 characters (the text I'm working with will usually fall under, anyway), so something simple like this will do the trick, I think:

Code: Select all

function trim_paragraph($string, $max_len) {
  $string_limit = substr($string, 0, $max_len);
  $last_sentence = strrpos($string, '. ');
  $final_para = substr($string, 0, $last_sentence+1);
  return $final_para;
}


Just out of curiosity, is there a way I could write that function without trickling through three variables like that? Or is that it?
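On the three-variables question: the calls can be nested so only one temporary is left. One thing to watch is that, as posted, strrpos() searches $string rather than $string_limit, so $max_len never takes effect. A possible compact version, offered as a sketch; the name trim_paragraph_limited() and the hard-cut fallback when no '. ' is found are assumptions, not part of the original function.

```php
<?php
// Hypothetical variant: keep whole sentences from the start of $string, at most $max_len bytes.
function trim_paragraph_limited($string, $max_len)
{
    if (strlen($string) <= $max_len) {
        return $string; // already short enough
    }
    // Search one character past the limit so a period at the very end of the
    // window, followed by a space, still counts as a sentence break.
    $cut = strrpos(substr($string, 0, $max_len + 1), '. ');

    // No sentence break in range: fall back to a plain cut (assumption).
    return ($cut === false) ? substr($string, 0, $max_len) : substr($string, 0, $cut + 1);
}
```

The `=== false` check matters because strrpos() returns false when nothing is found, and false + 1 would otherwise silently become 1.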

Post by Chris Corbyn »

That looks pretty good actually if you aren't worried about splitting in the wrong place, providing there's a dot.

Just for tasters here's a modified version of the function I made to handle text shortening without cutting words in half.

Code: Select all

function short_text($text, $length=288) {
	if (strlen($text) > $length) {
		$pattern = '/^(.{0,'.$length.'}\\.)(\\s+[A-Z0-9])?.*$/s';
		$text = preg_replace($pattern, "$1", $text);
	}
	return $text;
}


I haven't had a chance to test it but it will attempt to only cut where the period is followed by space then an uppercase letter or number. To be honest, your function looks great anyway :D

EDIT | Changed pattern a bit - wasn't going to work for final sentence before ;)

Post by Chris Corbyn »

Umm... you know what. These won't split it into chunks. They're just gonna take the first chunk (mine especially).

Lemme think the best way to get chunks out of this. preg_match_all() may do quite nicely :)
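One way the preg_match_all() idea might look, as an untested-in-production sketch rather than the code that was being planned here: each match greedily takes as much text as fits and must end on a period that is followed by whitespace or the end of the string. regex_chunks() is a made-up name.

```php
<?php
// Hypothetical sketch: collect sentence-aligned chunks of at most $max_len bytes.
function regex_chunks($text, $max_len = 288)
{
    // One non-space character, up to ($max_len - 2) more characters, then the
    // closing period, so a match is never longer than $max_len.
    $inner   = max(0, $max_len - 2);
    $pattern = '/(\S.{0,' . $inner . '}\.)(?=\s|$)/s';

    preg_match_all($pattern, $text, $matches);
    return $matches[1];
}
```

A caveat with this approach: a single sentence longer than $max_len cannot match from its start, so the regex engine moves forward and the beginning of that sentence is dropped without warning.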

Post by is_blank »

That's fine, though. I'm just going to throw the rest away. Executive decision. :twisted: Of course, academically, it's an interesting problem, I guess... :wink:

Post by Chris Corbyn »

Oh yeah, forgot to mention lol (was busy at work), my modified code is pants :P

Don't use it, it stinks and doesn't work when I tested it ;) I didn't try to re-write it (that's a lie, I spent a minute or two) since your code is good anyway.

Post by is_blank »

Hm. Here's a kink I'm stuck on, though...I tweaked it a little to give me some more flexibility:

Code: Select all

function trim_desc($string, $offset, $max_len) {
	$string_limit = substr($string, $offset, $max_len);
	$last_sentence = strpos($string, '. ');
	$final_para = substr($string, $offset, $last_sentence+1);
	return $final_para;
}
And now I'm weeding my way through a paragraph of, say, 742 characters. I want to keep each of the chunks this time, so I'm doing something like this:

Code: Select all

if (strlen($fact) < 300) {
	$fact1 = $fact; // no need to mess with it anymore
} else {
	$fact1 = trim_desc($fact, 0, 300); // run trim_desc on the first 300 chars
}

$fact2_offset = strlen($fact1); // find out where $fact1 left off in $fact (set before the if so both branches can use it)
if (strlen($fact) - $fact2_offset > 300) { // if there's still more than 300 chars left, we'll have to chop it again
	$fact2 = trim_desc($fact, $fact2_offset, 300); // peel off the next couple of sentences, up to 300 chars
} else {
	$fact2 = substr($fact, $fact2_offset); // the rest fits as-is
}
// continue similarly for $fact3, etc.
I'm sure the above could be done much more elegantly, like looping around somehow and filing the chunks into $fact1, $fact2 ... $factx until $fact was all used up, but I need something that works as soon as possible, and I don't anticipate more than $fact4, based on the content I'm working with.
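The looping version hinted at here could look roughly like the following. It is a sketch under a few assumptions: the chunks go into an array instead of $fact1, $fact2, etc., only '. ' counts as a sentence end, and a window with no sentence end is kept whole. split_fact() is a made-up name.

```php
<?php
// Hypothetical loop: walk through $fact by offset and collect sentence-aligned chunks.
function split_fact($fact, $max_len = 300)
{
    $facts  = [];
    $offset = 0;
    $total  = strlen($fact);

    while ($offset < $total) {
        $window = substr($fact, $offset, $max_len);
        if ($total - $offset > $max_len) {
            // More text remains than fits: back up to the last sentence end.
            $cut = strrpos($window, '. ');
            if ($cut !== false) {
                $window = substr($window, 0, $cut + 1); // keep the period
            }
        }
        $offset += strlen($window);     // advance past what was consumed
        $piece = ltrim($window);        // drop the space left over from the previous cut
        if ($piece !== '') {
            $facts[] = $piece;
        }
    }
    return $facts;
}
```

$facts[0] then plays the role of $fact1, $facts[1] of $fact2, and so on, however many chunks the text needs.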

ANYWAY, my problem is that my function is too simplistic: I'm faced with a situation (that I'm sure will occur more than once) where a sentence ends with a word in quotes. I'm working in unicode, so it looks something like this:
This is a sentence that ends with a special &#x201c;word.&#x201d; And then the text continues.
When that runs through trim_desc(), it obviously doesn't find a '. ' to mark the end of a sentence. In my test, for some reason, it cut the string after the first &#x20, at 152 characters. (I'm afraid my offset might have something to do with that, but I'll worry about that later.)

Is there a way I could modify my trim_desc() function to catch either a period-space or a period-&#x201d;?300) { //if there's still more than 300 chars left, we'll have to chop it again
$fact2_offset = strlen($fact1); //find out where $fact1 left off in $fact
$fact2 = trim_desc($fact, $fact2_offset, 300); //Peel off the next couple sentences, up to 300 chars.
} else {
$fact2 = substr($fact,$fact2_offset);
}
// continue similarly for $fact3, etc.


I'm sure the above could be done much more elegantly, like looping around somehow and filing the tle to give me some more flexibility:

Code: Select all

function trim_desc($string, $offset, $max_len) {
	$string_limit = substr($string, $offset, $max_len);
	$last_sentence = strpos($string, '. ');
	$final_para = substr($string, $offset, $last_sentence+1);
	return $final_para;
}
And now I'm weeding my way through a paragraph of, say, 742 characters. I want to keep each of the chunks this time, so I'm doing something like this:

Code: Select all

if(strlen($fact) < 300) {
	$fact1 = $fact; //no need to mess with it anymore
} else {
	$fact1 = trim_desc($fact, 0, 300); //run trim_desc on the first 300 chars
	}

if(strlen($fact)-strlen($fact1) > 300) { //if there's still more than 300 chars left, we'll have to chop it again
	$fact2_offset = strlen($fact1); //find out where $fact1 left off in $fact
	$fact2 = trim_desc($fact, $fact2_offset, 300); //Peel off the next couple sentences, up to 300 chars.
	} else {
	$fact2 = substr($fact,$fact2_offset);
	}
// continue similarly for $fact3, etc.
I'm sure the above could be done much more elegantly, like looping around somehow and filing the chunks into $fact1, $fact2 ... $factx until $fact was all used up, but I need something that works as soon as possible, and I don't anticipate more than $fact4, based on the content I'm working with.

ANYWAY, my problem is that my function is too simplistic: I'm faced with a situation (that I'm sure will occur more than once) where a sentence ends with a word in quotes. I'm working in unicode, so it looks something like this:
This is a sentence that ends with a special &#x201c;word.&#x201d; And then the text continues.
When that runsset, $max_len) {
$string_limit = substr($string, $offset, $max_len);
$last_sentence = strpos($string, '. ');
$final_para = substr($string, $offset, $last_sentence+1);
return $final_para;
}


And now I'm weeding my way through a paragraph of, say, 742 characters. I want to keep each of the chunks this time, so I'm doing something like this:

Code: Select all

if(strlen($fact) < 300) {
	$fact1 = $fact; //no need to mess with it anymore
} else {
	$fact1 = trim_desc($fact, 0, 300); //run trim_desc on the first 300 chars
	}

if(strlen($fact)-strlen($fact1) > 300) { //if there's still more than 300 chars left, we'll have to chop it again
	$fact2_offset = strlen($fact1); //find out where $fact1 left off in $fact
	$fact2 = trim_desc($fact, $fact2_offset, 300); //Peel off the next couple sentences, up to 300 chars.
	} else {
	$fact2 = substr($fact,$fact2_offset);
	}
// continue similarly for $fact3, etc.
I'm sure the above could be done much more elegantly, like looping around somehow and filing the chunks into $fact1, $fact2 ... $factx until $fact was all used up, but I need something that works as soon as possible, and I don't anticipate more than $fact4, based on the content I'm working with.

ANYWAY, my problem is that my function is too simplistic: I'm faced with a situation (that I'm sure will occur more than once) where a sentence ends with a word in quotes. I'm working in unicode, so it looks something like this:
This is a sentence that ends with a special &#x201c;word.&#x201d; And then the text continues.
When that runs through trim_desc(), it obviously doesn't find a '. ' to mark the end of a sentence. In my test, for some reason, it cut the string after the first &#x20, at 152 characters. (I'm afraid my offset might have something to do with that, but I'll worry about that later.)

Is there a way I could modify mittle to give me some more flexibility:

Code: Select all

function trim_desc($string, $offset, $max_len) {
	$string_limit = substr($string, $offset, $max_len);
	$last_sentence = strpos($string, '. ');
	$final_para = substr($string, $offset, $last_sentence+1);
	return $final_para;
}
And now I'm weeding my way through a paragraph of, say, 742 characters. I want to keep each of the chunks this time, so I'm doing something like this:

Code: Select all

if(strlen($fact) < 300) {
	$fact1 = $fact; //no need to mess with it anymore
} else {
	$fact1 = trim_desc($fact, 0, 300); //run trim_desc on the first 300 chars
	}

if(strlen($fact)-strlen($fact1) > 300) { //if there's still more than 300 chars left, we'll have to chop it again
	$fact2_offset = strlen($fact1); //find out where $fact1 left off in $fact
	$fact2 = trim_desc($fact, $fact2_offset, 300); //Peel off the next couple sentences, up to 300 chars.
	} else {
	$fact2 = substr($fact,$fact2_offset);
	}
// continue similarly for $fact3, etc.
I'm sure the above could be done much more elegantly, like looping around somehow and filing the chunks into $fact1, $fact2 ... $factx until $fact was all used up, but I need something that works as soon as possible, and I don't anticipate more than $fact4, based on the content I'm working with.

ANYWAY, my problem is that my function is too simplistic: I'm faced with a situation (that I'm sure will occur more than once) where a sentence ends with a word in quotes. I'm working in unicode, so it looks something like this:
This is a sentence that ends with a special &#x201c;word.&#x2c8f9de0b6]
if(strlen($fact) < 300) {
$fact1 = $fact; //no need to mess with it anymore
} else {
$fact1 = trim_desc($fact, 0, 300); //run trim_desc on the first 300 chars
}

if(strlen($fact)-strlen($fact1) > 300) { //if there's still more than 300 chars left, we'll have to chop it again
$fact2_offset = strlen($fact1); //find out where $fact1 left off in $fact
$fact2 = trim_desc($fact, $fact2_offset, 300); //Peel off the next couple sentences, up to 300 chars.
} else {
$fact2 = substr($fact,$fact2_offset);
}
// continue similarly for $fact3, etc.


I'm sure the above could be done much more elegantly, like looping around somehow and filing the chunks into $fact1, $fact2 ... $factx until $fact was all used up, but I need something that works as soon as possible, and I don't anticipate more than $fact4, based on the conten this:

Code: Select all

if(strlen($fact) < 300) {
	$fact1 = $fact; //no need to mess with it anymore
} else {
	$fact1 = trim_desc($fact, 0, 300); //run trim_desc on the first 300 chars
	}

if(strlen($fact)-strlen($fact1) > 300) { //if there's still more than 300 chars left, we'll have to chop it again
	$fact2_offset = strlen($fact1); //find out where $fact1 left off in $fact
	$fact2 = trim_desc($fact, $fact2_offset, 300); //Peel off the next couple sentences, up to 300 chars.
	} else {
	$fact2 = substr($fact,$fact2_offset);
	}
// continue similarly for $fact3, etc.
I'm sure the above could be done much more elegantly, like looping around somehow and filing the chunks into $fact1, $fact2 ... $factx until $fact was all used up, but I need something that works as soon as possible, and I don't anticipate more than $fact4, based on the content I'm working with.

ANYWAY, my problem is that my function is too simplistic: I'm faced with a situation (that I'm sure will occur more than once) where a sentence ends with a word in quotes. I'm working in unicode, so it looks something like this:
This is a sentence that ends with a special &#x201c;word.&#x201d; And then the text continues.
When that runs through trim_desc(), it obviously doesn't find a '. ' to mark the end of a sentence. In my test, for some reason, it cut the string after the first &#x20, at 152 characters. (I'm afraid my offset might have something to do with that, but I'll worry about that later.)

Is there a way I could modify my trim_desc() function to catch either a period-space or a period-&#x201d;?

Whew! :(im_desc($string, $offset, $max_len) {
$string_limit = substr($string, $offset, $max_len);
$last_sentence = strpos($string, '. ');
$final_para = substr($string, $offset, $last_sentence+1);
return $final_para;
}


And now I'm weeding my way through a paragraph of, say, 742 characters. I want to keep each of the chunks this time, so I'm doing something like this:

Code: Select all

if(strlen($fact) < 300) {
	$fact1 = $fact; //no need to mess with it anymore
} else {
	$fact1 = trim_desc($fact, 0, 300); //run trim_desc on the first 300 chars
	}

if(strlen($fact)-strlen($fact1) > 300) { //if there's still more than 300 chars left, we'll have to chop it again
	$fact2_offset = strlen($fact1); //find out where $fact1 left off in $fact
	$fact2 = trim_desc($fact, $fact2_offset, 300); //Peel off the next couple sentences, up to 300 chars.
	} else {
	$fact2 = substr($fact,$fact2_offset);
	}
// continue similarly for $fact3, etc.
I'm sure the above could be done much more elegantly, like looping around somehow and filing the chunks into $fact1, $fact2 ... $factx until $fact was all used up, but I need something that works as soon as possible, and I don't anticipate more than $fact4, based on the content I'm working with.

ANYWAY, my problem is that my function is too simplistic: I'm faced with a situation (that I'm sure will occur more than once) where a sentence ends with a word in quotes. I'm working in unicode, so it looks something like this:
This is a sentence that ends with a special &#x201c;word.&#x201d; And then the text continues.
When that runs through trim_desc(), it obviously doesn't find a '. ' to mark the end of a sentence. In my test, for some reason, it cut the string after the first &#x20, at 152 characters. (I'm afraid my offset might have something to do with that, but I'll worry about that later.)

Is there a way I could modifring, $offset, $max_len) {
$string_limit = substr($string, $offset, $max_len);
$last_sentence = strpos($string, '. ');
$final_para = substr($string, $offset, $last_sentence+1);
return $final_para;
}


And now I'm weeding my way through a paragraph of, say, 742 characters. I want to keep each of the chunks this time, so I'm doing something like this:

Code: Select all

if(strlen($fact) < 300) {
	$fact1 = $fact; //no need to mess with it anymore
} else {
	$fact1 = trim_desc($fact, 0, 300); //run trim_desc on the first 300 chars
	}

if(strlen($fact)-strlen($fact1) > 300) { //if there's still more than 300 chars left, we'll have to chop it again
	$fact2_offset = strlen($fact1); //find out where $fact1 left off in $fact
	$fact2 = trim_desc($fact, $fact2_offset, 300); //Peel off the next couple sentences, up to 300 chars.
	} else {
	$fact2 = substr($fact,$fact2_offset);
	}
// continue similarly for $fact3, etc.
I'm sure the above could be done much more elegantly, like looping around somehow and filing the chunks into $fact1, $fact2 ... $factx until $fact was all used up, but I need something that works as soon as possible, and I don't anticipate more than $fact4, based on the content I'm working with.

ANYWAY, my problem is that my function is too simplistic: I'm faced with a situation (that I'm sure will occur more than once) where a sentence ends with a word in quotes. I'm working in unicode, so it looks something like this:
This is a sentence that ends with a special &#x201c;word.&#x201d; And then the text continues.
When that runs through trim_desc(), it obviously doesn't find a '. ' to mark the end of a sentence. In my test, for some reason, it cut the string after the first &#x20, at 152 characI'm stuck on, though...I tweaked it a little to give me some more flexibility:

Code: Select all

function trim_desc($string, $offset, $max_len) {
	$string_limit = substr($string, $offset, $max_len);
	$last_sentence = strpos($string, '. ');
	$final_para = substr($string, $offset, $last_sentence+1);
	return $final_para;
}
And now I'm weeding my way through a paragraph of, say, 742 characters. I want to keep each of the chunks this time, so I'm doing something like this:

Code: Select all

if(strlen($fact) < 300) {
	$fact1 = $fact; //no need to mess with it anymore
} else {
	$fact1 = trim_desc($fact, 0, 300); //run trim_desc on the first 300 chars
	}

if(strlen($fact)-strlen($fact1) > 300) { //if there's still more than 300 chars left, we'll have to chop it again
	$fact2_offset = strlen($fact1); //find out where $fact1 left off in $fact
	$fact2 = trim_desc($fact, $fact2_offset, 300); //Peel off the next couple sentences, up to 300 chars.
	} else {
	$fact2 = substr($fact,$fact2_offset);
	}
// continue similarly for $fact3, etc.
I'm sure the above could be done much more elegantly, like looping around somehow and filing the chunks into $fact1, $fact2 ... $factx until $fact was all used up, but I need something that works as soon as possible, and I don't anticipate more than $fact4, based on the content I'm working with.

ANYWAY, my problem is that my function is too simplistic: I'm faced with a situation (that I'm sure will occur more than once) where a sentence ends with a word in quotes. I'm working in unicode, so it looks something like this:
This is a sentence that ends with a special &#x201c;word.&#ugh...I tweaked it a little to give me some more flexibility:

Code: Select all

function trim_desc($string, $offset, $max_len) {
	$string_limit = substr($string, $offset, $max_len);
	$last_sentence = strpos($string, '. ');
	$final_para = substr($string, $offset, $last_sentence+1);
	return $final_para;
}
And now I'm weeding my way through a paragraph of, say, 742 characters. I want to keep each of the chunks this time, so I'm doing something like this:

Code: Select all

if(strlen($fact) < 300) {
	$fact1 = $fact; //no need to mess with it anymore
} else {
	$fact1 = trim_desc($fact, 0, 300); //run trim_desc on the first 300 chars
	}

if(strlen($fact)-strlen($fact1) > 300) { //if there's still more than 300 chars left, we'll have to chop it again
	$fact2_offset = strlen($fact1); //find out where $fact1 left off in $fact
	$fact2 = trim_desc($fact, $fact2_offset, 300); //Peel off the next couple sentences, up to 300 chars.
	} else {
	$fact2 = substr($fact,$fact2_offset);
	}
// continue similarly for $fact3, etc.
I'm sure the above could be done much more elegantly, like looping around somehow and filing the chunks into $fact1, $fact2 ... $factx until $fact was all used up, but I need something that works as soon as possible, and I don't anticipate more than $fact4, based on the content I'm working with.

ANYWAY, my problem is that my function is too simplistic: I'm faced with a situation (that I'm sure will occur more than once) where a sentence ends with a word in quotes. I'm working in unicode, so it looks something like this:
When that runs through trim_desc(), it obviously doesn't find a '. ' to mark the end of a sentence. In my test, for some reason, it cut the string after the first &#x20, at 152 characters. (I'm afraid my offset might have something to do wg, '. ');
$final_para = substr($string, $offset, $last_sentence+1);
return $final_para;
}


And now I'm weeding my way through a paragraph of, say, 742 characters. I want to keep each of the chunks this time, so I'm doing something like this:

Code: Select all

if(strlen($fact) < 300) {
	$fact1 = $fact; //no need to mess with it anymore
} else {
	$fact1 = trim_desc($fact, 0, 300); //run trim_desc on the first 300 chars
	}

if(strlen($fact)-strlen($fact1) > 300) { //if there's still more than 300 chars left, we'll have to chop it again
	$fact2_offset = strlen($fact1); //find out where $fact1 left off in $fact
	$fact2 = trim_desc($fact, $fact2_offset, 300); //Peel off the next couple sentences, up to 300 chars.
	} else {
	$fact2 = substr($fact,$fact2_offset);
	}
// continue similarly for $fact3, etc.
I'm sure the above could be done much more elegantly, like looping around somehow and filing the chunks into $fact1, $fact2 ... $factx until $fact was all used up, but I need something that works as soon as possible, and I don't anticipate more than $fact4, based on the content I'm working with.

ANYWAY, my problem is that my function is too simplistic: I'm faced with a situation (that I'm sure will occur more than once) where a sentence ends with a word in quotes. I'm working in unicode, so it looks something like this:
When that runs through trim_desc(), it obviously doesn't find a '. ' to mark the end of a sentence. In my test, for some reason, it cut the string after the first &#x20, at 152 characters. (I'm afraid my;
$last_sentence = strpos($string, '. ');
$final_para = substr($string, $offset, $last_sentence+1);
return $final_para;
}


And now I'm weeding my way through a paragraph of, say, 742 characters. I want to keep each of the chunks this time, so I'm doing something like this:

Code: Select all

if(strlen($fact) < 300) {
	$fact1 = $fact; //no need to mess with it anymore
} else {
	$fact1 = trim_desc($fact, 0, 300); //run trim_desc on the first 300 chars
	}

if(strlen($fact)-strlen($fact1) > 300) { //if there's still more than 300 chars left, we'll have to chop it again
	$fact2_offset = strlen($fact1); //find out where $fact1 left off in $fact
	$fact2 = trim_desc($fact, $fact2_offset, 300); //Peel off the next couple sentences, up to 300 chars.
	} else {
	$fact2 = substr($fact,$fact2_offset);
	}
// continue similarly for $fact3, etc.
I'm sure the above could be done much more elegantly, like looping around somehow and filing the chunks into $fact1, $fact2 ... $factx until $fact was all used up, but I need something that works as soon as possible, and I don't anticipate more than $fact4, based on the content I'm working with.

ANYWAY, my problem is that my function is tosome more flexibility:

Code: Select all

function trim_desc($string, $offset, $max_len) {
	$string_limit = substr($string, $offset, $max_len);
	$last_sentence = strpos($string, '. ');
	$final_para = substr($string, $offset, $last_sentence+1);
	return $final_para;
}
And now I'm weeding my way through a paragraph of, say, 742 characters. I want to keep each of the chunks this time, so I'm doing something like this:

Code: Select all

if(strlen($fact) < 300) {
	$fact1 = $fact; //no need to mess with it anymore
} else {
	$fact1 = trim_desc($fact, 0, 300); //run trim_desc on the first 300 chars
	}

if(strlen($fact)-strlen($fact1) > 300) { //if there's still more than 300 chars left, we'll have to chop it again
	$fact2_offset = strlen($fact1); //find out where $fact1 left off in $fact
	$fact2 = trim_desc($fact, $fact2_offset, 300); //Peel off the next couple sentences, up to 300 chars.
	} else {
	$fact2 = substr($fact,$fact2_offset);
	}
// continue similarly for $fact3, etc.
I'm sure the above could be done much more elegantly, like looping around somehow and filing the chunks into $fact1, $fact2 ... $factx until $fact was all used up, but I need something that works as soon as possible, and I don't anticipate more than $fact4, based on the content I'm working with.

ANYWAY, my problem is that my function is too simplistic: I'm faced with a situation (that I'm sure will occur more than once) where a sentence ends with a word in quotes. I'm working in unicode, so it looks something like this:
When that runs through trim_desc(), it obviously doesn't find a '. ' to mark the end of a sentence. In my test, for some reason, it cut the string after the first &#x20, at 152 characters. (I'm afraid my offset might have something to do with that, but I'll worry about that later.)

Is there a way I could modify my trim_desc() function to catch either a period-space or a period-&#x201d;?

Whew! :(
$final_para = substr($string, $offset, $last_sentence+1);
return $final_para;
}


And now I'm weeding my way through a paragraph of, say, 742 characters. I want to keep each of the chunks this time, so I'm doing something like this:

Code: Select all

if(strlen($fact) < 300) {
	$fact1 = $fact; //no need to mess with it anymore
} else {
	$fact1 = trim_desc($fact, 0, 300); //run trim_desc on the first 300 chars
	}

if(strlen($fact)-strlen($fact1) > 300) { //if there's still more than 300 chars left, we'll have to chop it again
	$fact2_offset = strlen($fact1); //find out where $fact1 left off in $fact
	$fact2 = trim_desc($fact, $fact2_offset, 300); //Peel off the next couple sentences, up to 300 chars.
	} else {
	$fact2 = substr($fact,$fact2_offset);
	}
// continue similarly for $fact3, etc.
I'm sure the above could be done much more elegantly, like looping around somehow and filing the chunks into $fact1, $fact2 ... $factx until $fact was all used up, but I need something that works as soon as possible, and I don't anticipate more than $fact4, based on the content I'm working with.

ANYWAY, my problem is that my function is too simplistic: I'm faced with a situation (that I'm sure will occur more than once) where a sentence ends with a word in quotes. I'm working in unicode, so it looks something like this:
When that runs through trim_desc(), it obviop]
function trim_desc($string, $offset, $max_len) {
$string_limit = substr($string, $offset, $max_len);
$last_sentence = strpos($string, '. ');
$final_para = substr($string, $offset, $last_sentence+1);
return $final_para;
}


And now I'm weeding my way through a paragraph of, say, 742 characters. I want to keep each of the chunks this time, so I'm doing something like this:

Code: Select all

if(strlen($fact) < 300) {
	$fact1 = $fact; //no need to mess with it anymore
} else {
	$fact1 = trim_desc($fact, 0, 300); //run trim_desc on the first 300 chars
	}

if(strlen($fact)-strlen($fact1) > 300) { //if there's still more than 300 chars left, we'll have to chop it again
	$fact2_offset = strlen($fact1); //find out where $fact1 left off in $fact
	$fact2 = trim_desc($fact, $fact2_offset, 300); //Peel off the next couple sentences, up to 300 chars.
	} else {
	$fact2 = substr($fact,$fact2_offset);
	}
// continue similarly for $fact3, etc.
I'm sure the above could be done much more elegantly, like looping around somehow and filing the chunks into $fact1, $fact2 ... $factx until $fact was all used up, but I need something that works as soon as possible, and I don't anticipate more than $fact4, based on the content I'm working with.

ANYWAY, my problem is that my function is too simplistic: I'm faced with a situation (that I'm sure will occur more than once) where a sentence ends with a word in quotes. I'm working in unicode, so it looks something like this:
When that runs through trim_desc(), it obviously doesn't find a '. ' to mark the end of a sentence. In my test, for some reason, it cut the string after the first &#x20, at 152 characters. (I'm afraid my offset might have something to do with that,I'm stuck on, though...I tweaked it a little to give me some more flexibility:

Code: Select all

function trim_desc($string, $offset, $max_len) {
	$string_limit = substr($string, $offset, $max_len);
	$last_sentence = strpos($string, '. ');
	$final_para = substr($string, $offset, $last_sentence+1);
	return $final_para;
}
And now I'm weeding my way through a paragraph of, say, 742 characters. I want to keep each of the chunks this time, so I'm doing something like this:

Code: Select all

if(strlen($fact) < 300) {
	$fact1 = $fact; //no need to mess with it anymore
} else {
	$fact1 = trim_desc($fact, 0, 300); //run trim_desc on the first 300 chars
	}

if(strlen($fact)-strlen($fact1) > 300) { //if there's still more than 300 chars left, we'll have to chop it again
	$fact2_offset = strlen($fact1); //find out where $fact1 left off in $fact
	$fact2 = trim_desc($fact, $fact2_offset, 300); //Peel off the next couple sentences, up to 300 chars.
	} else {
	$fact2 = substr($fact,$fact2_offset);
	}
// continue similarly for $fact3, etc.
I'm sure the above could be done much more elegantly, like looping around somehow and filing the chunks into $fact1, $fact2 ... $factx until $fact was all used up, but I need something that works as soon as possible, and I don't anticipate more than $fact4, based on the co.I tweaked it a little to give me some more flexibility:

Code: Select all

function trim_desc($string, $offset, $max_len) {
	$string_limit = substr($string, $offset, $max_len);
	$last_sentence = strpos($string, '. ');
	$final_para = substr($string, $offset, $last_sentence+1);
	return $final_para;
}
And now I'm weeding my way through a paragraph of, say, 742 characters. I want to keep each of the chunks this time, so I'm doing something like this:

Code: Select all

if(strlen($fact) < 300) {
	$fact1 = $fact; //no need to mess with it anymore
} else {
	$fact1 = trim_desc($fact, 0, 300); //run trim_desc on the first 300 chars
	}

if(strlen($fact)-strlen($fact1) > 300) { //if there's still more than 300 chars left, we'll have to chop it again
	$fact2_offset = strlen($fact1); //find out where $fact1 left off in $fact
	$fact2 = trim_desc($fact, $fact2_offset, 300); //Peel off the next couple sentences, up to 300 chars.
	} else {
	$fact2 = substr($fact,$fact2_offset);
	}
// continue similarly for $fact3, etc.
I'm sure the above could be done much more elegantly, like looping around somehow and filing the chunks into $fact1, $fact2 ... $factx until $fact was all used up, but I need something that works as soon as possible, and I don't anticipate more than $fact4, based on the content I'm working with.

ANYWAY, my problem is that my function is too simplistic: I'm faced with a situation (that I'm sure will occur more than once) where a sentence ends with a word in quotes. I'm working in unicode, so it looks something like this:
When that runs through trim_desc(), it obviously doesn't find a '. ' to mark the end of a sentence. In my test, for some reason, it cut the string after the first &#x20, at 152 characters. (I'm afraid my offset might have something to do with that, but I'll worry about that later.)

Is there a way I could modify my trim_desc() function to catch either a period-space or a period-&#x201d;?

Whew! :(
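
For what it's worth, here is a rough sketch of one way to catch both endings and loop through the chunks in one go. It is only a sketch under assumptions: the function names last_sentence_end() and chunk_by_sentence() are made up for illustration, the closing quote is assumed to appear as the literal text &#x201d; in the string, lengths are counted in bytes with strlen(), and an abbreviation like "Mr." will still be mistaken for a sentence end.

```php
<?php
// Hypothetical helper (not from the original post): byte offset just past the
// last sentence end that fits within $max_len, or 0 if none fits.
function last_sentence_end($text, $max_len) {
	// A period, optionally followed by the closing-quote entity, and then
	// whitespace or the end of the text.
	$pattern = '/\.(?:&#x201d;)?(?=\s|$)/';
	$best = 0;
	if (preg_match_all($pattern, $text, $m, PREG_OFFSET_CAPTURE)) {
		foreach ($m[0] as $match) {
			$end = $match[1] + strlen($match[0]); // offset + matched length
			if ($end > $max_len) {
				break; // this sentence end and all later ones are too far out
			}
			$best = $end;
		}
	}
	return $best;
}

// Hypothetical wrapper: keeps peeling chunks off the front of $text until
// what is left fits in a single chunk.
function chunk_by_sentence($text, $max_len) {
	$text = trim($text);
	if ($max_len < 1) {
		return array($text); // nothing sensible to do with a zero limit
	}
	$chunks = array();
	while (strlen($text) > $max_len) {
		$cut = last_sentence_end($text, $max_len);
		if ($cut === 0) {
			$cut = $max_len; // assumption: fall back to a hard cut
		}
		$chunks[] = substr($text, 0, $cut);
		$text = ltrim(substr($text, $cut));
	}
	if ($text !== '') {
		$chunks[] = $text;
	}
	return $chunks;
}

$fact = 'This is a sentence that ends with a special &#x201c;word.&#x201d; And then the text continues.';
print_r(chunk_by_sentence($fact, 300));
```

The match runs against the whole remaining text rather than a pre-cut 300-character window, so a period sitting right at the limit is not split from the quote entity that follows it. The hard-cut fallback can still land in the middle of an entity if a single sentence is longer than the limit.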
Post Reply