
preg_replace to filter out "short words" [SOLVED]

Posted: Tue Nov 29, 2005 7:51 pm
by tomfra
A piece of cake for anyone who knows RegEx better than I do, I guess, and that's just about anyone...

$string = "This and that";

I need to get rid of "and" based on the word length so that:

$FilteredString = "This that";

And I would like to do it with preg_replace; I'm sure it's possible. I could do it without preg_replace by exploding the string and then working with the resulting array, but that may be unnecessarily complicated.
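For comparison, the explode-based route could look roughly like this. It is only an illustration: filterShortWordsByExplode is a hypothetical name, and the sketch assumes words are separated by single spaces and that "length" means byte length (strlen), which is fine for plain ASCII text.

```php
<?php
// Hypothetical helper: drop every space-separated "word" whose length is
// at most $maxLen, using explode()/array_filter()/implode() instead of a regex.
function filterShortWordsByExplode(string $string, int $maxLen): string
{
    // Split on single spaces (assumption: words are separated by exactly one space).
    $words = explode(' ', $string);

    // Keep only words longer than $maxLen bytes.
    $kept = array_filter($words, function (string $word) use ($maxLen): bool {
        return strlen($word) > $maxLen;
    });

    // Re-join the surviving words with single spaces.
    return implode(' ', $kept);
}

echo filterShortWordsByExplode("This and that", 3), "\n"; // This that
```

A side effect of this design: a double space produces an empty "word" of length 0, which array_filter drops, so runs of spaces collapse to one.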

Any ideas are welcome!

Tomas

Posted: Tue Nov 29, 2005 7:56 pm
by Nathaniel
So anything shorter than x is deleted?

hmm...

Code:

preg_replace('/[a-z]{1,x}/i', '', $string);
should do it. (untested)

Be sure to replace x with 3, or whatever you need for the maximum length.

Edit: this code won't delete the extra space... hold on and I'll edit with one that does.
Edit 2: Actually, I'm not sure how to do that. You could put \s inside the brackets after a-z, but then for a word one or more characters shorter than x, it would delete the spaces both before and after the word...
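A quick check (a sketch, with x set to 3) suggests the leftover space is not the only issue with the pattern above: without word boundaries it also matches 1-3 letter chunks inside longer words, so those words get eaten too.

```php
<?php
// The pattern from the post with x replaced by 3, and no \b anchors.
$string = "This and that";
$result = preg_replace('/[a-z]{1,3}/i', '', $string);

// "This" is consumed as "Thi" + "s" and "that" as "tha" + "t",
// so every letter is removed and only the two spaces survive.
var_dump($result); // string(2) "  "
```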

Posted: Tue Nov 29, 2005 10:59 pm
by shoebappa

Code:

preg_replace('/\b[a-z]{1,x}\b\s?/i', '', $string);
\b matches a word boundary, and \s? catches one extra space after the word (if there is one, i.e. if it's not the last word). Replace the x with the maximum length, say 3...

Tested in the Regex Coach, but there could be cases I didn't consider. Because an apostrophe is not a word character, \b matches on either side of it, so the pattern removes any part before or after an ' that is x characters or fewer. If someone searched for O'Maley it would remove the O, and Amy's would lose both Amy and s, leaving just the '.
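The behaviour described above can be checked with a small sketch. removeShortWords is a hypothetical wrapper around the pattern from this post, with x fixed at 3 for the example.

```php
<?php
// Hypothetical wrapper around the \b...\b pattern, with x fixed at 3.
function removeShortWords(string $string): string
{
    // \b[a-z]{1,3}\b matches a whole word of 1-3 letters (i = any case);
    // \s? also removes one following whitespace character, if present.
    return preg_replace('/\b[a-z]{1,3}\b\s?/i', '', $string);
}

echo removeShortWords("This and that"), "\n"; // This that

// The apostrophe is not a word character, so it creates word boundaries:
echo removeShortWords("O'Maley"), "\n";       // 'Maley
echo removeShortWords("Amy's"), "\n";         // '
```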

Posted: Wed Nov 30, 2005 5:49 am
by tomfra
shoebappa,

Your RegEx works, but it has no effect on numbers.

For example, if $string = '24 hours', it is left as is even when the minimum "word" length is set to 3.

Tomas

Posted: Wed Nov 30, 2005 7:33 am
by AGISB
It won't filter out capital letter words either.

So use this

Code:

preg_replace('/\b[A-Za-z0-9]{1,x}\b\s?/i', '', $string);

Posted: Wed Nov 30, 2005 7:45 am
by shoebappa
The i modifier already makes the match case-insensitive, so A-Z isn't needed; you could add the 0-9 if you want numbers removed too.
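Rather than editing x into the pattern by hand, the maximum length could be filled in from a variable. The sketch below uses a hypothetical helper, removeShortTokens, builds the pattern with sprintf(), includes 0-9 so short numbers are removed too, and relies on the i modifier instead of listing A-Z. It assumes $x is at least 1, since a quantifier like {1,0} is invalid and preg_replace would fail.

```php
<?php
// Hypothetical helper: remove whole "words" of 1..$x letters or digits.
// Assumes $x >= 1; {1,0} is not a valid quantifier.
function removeShortTokens(string $string, int $x): string
{
    // [a-z0-9] with the i modifier also covers upper-case letters,
    // so A-Z does not need to be listed separately.
    $pattern = sprintf('/\b[a-z0-9]{1,%d}\b\s?/i', $x);
    return preg_replace($pattern, '', $string);
}

echo removeShortTokens("24 hours", 3), "\n";      // hours
echo removeShortTokens("This and that", 3), "\n"; // This that
```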

Posted: Wed Nov 30, 2005 1:23 pm
by tomfra
Great, that works! Now if I could have one more wish ;)

How should the RegEx be modified if I want only letters in the filtered string, i.e. remove any numbers before doing the other filtering?


$x = 3;
$string = "2004 world cup";

should return $FilteredString = "world", even though "2004" is longer than $x.

And I almost forgot: Thanks!

Tomas

Posted: Wed Nov 30, 2005 6:08 pm
by shoebappa

Code:

preg_replace('/\b[0-9]+\b\s?|\b[a-z]{1,3}\b\s?/i', '', $string);
That should do it, but it won't catch something like 4th.
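Running the combined pattern on the example string shows one leftover detail, sketched here with x hard-coded to 3 as in the post: because "cup" is the last word, there is no space after it for \s? to consume, so the space before it stays behind. Wrapping the call in trim() is one simple way to tidy that up.

```php
<?php
// Combined pattern from the post: the first alternative removes any
// all-digit token, the second removes 1-3 letter words (x hard-coded to 3).
$pattern = '/\b[0-9]+\b\s?|\b[a-z]{1,3}\b\s?/i';

$result = preg_replace($pattern, '', "2004 world cup");
var_dump($result);       // string(6) "world "

// No whitespace follows the last word, so the space before "cup" remains.
var_dump(trim($result)); // string(5) "world"

// A mixed token like "4th" matches neither alternative and is kept.
var_dump(preg_replace($pattern, '', "4th of July")); // string(8) "4th July"
```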

Posted: Wed Nov 30, 2005 6:11 pm
by Burrito
Moved to regex I did.

d11wtq | You did :P

Posted: Fri Dec 02, 2005 6:50 am
by tomfra
Thanks again folks!

Tomas