preg_replace to filter out "short words" [SOLVED]
Posted: Tue Nov 29, 2005 7:51 pm
by tomfra
A piece of cake for those who know RegEx better than I do, I guess, and that's just about anyone...
$string = "This and that";
I need to get rid of "and" based on the word length so that:
$FilteredString = "This that";
And I would like to do it with preg_replace. I am sure it's possible. I can do it without preg_replace by exploding the string and then playing with the resulting array, but that may be unnecessarily complicated.
Any ideas are welcome!
Tomas
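For comparison, the explode() route Tomas mentions can be fairly short. This is a minimal sketch, not code from the thread: filterShortWordsExplode() is a hypothetical helper name, it assumes words are separated by single spaces, and strlen() counts bytes, so punctuation and multi-byte characters count toward a word's length.

```php
<?php
// Hypothetical helper (not from the thread): drop every space-separated
// word whose length is at most $maxLen, using explode()/array_filter()/
// implode() instead of a regular expression.
function filterShortWordsExplode(string $string, int $maxLen): string
{
    // Split on single spaces; consecutive spaces produce empty strings,
    // which have length 0 and are therefore dropped as well.
    $words = explode(' ', $string);

    // Keep only words longer than $maxLen (strlen() counts bytes).
    $kept = array_filter($words, function (string $word) use ($maxLen): bool {
        return strlen($word) > $maxLen;
    });

    // implode() ignores the keys that array_filter() preserves.
    return implode(' ', $kept);
}

echo filterShortWordsExplode("This and that", 3), "\n"; // prints "This that"
```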
Posted: Tue Nov 29, 2005 7:56 pm
by Nathaniel
So anything shorter than x is deleted?
hmm...
Code:
preg_replace('/[a-z]{1,x}/i', '', $string);
should do it. (untested)
Be sure to replace x with 3 or whatever you need for the max. length.
Edit: this code won't delete the extra space... hold on and I'll edit with one that does.
Edit 2: I'm not sure how to do that, actually. You could put \s after the a-z, but then for a word shorter than x, it'd delete the spaces both before and after the word...
Posted: Tue Nov 29, 2005 10:59 pm
by shoebappa
Code:
preg_replace('/\b[a-z]{1,x}\b\s?/i', '', $string);
\b matches a word boundary, and \s? catches one extra space after the word, if there is one (i.e. if it's not the last word). Replace the x with the max number, say 3...
Tested in The Regex Coach, but there could be cases I didn't consider. For example, it might remove the part before or after an apostrophe if that part is x characters or fewer: if they searched for O'Maley it'd remove the O, and Amy's would lose both Amy and s, leaving just the '.
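To use the pattern with a length stored in a variable, the number has to be concatenated into the pattern string, since the x is only a placeholder. A minimal sketch, assuming $x is an integer; filterShortWords() is a hypothetical name, not something from the thread. The last two calls reproduce the apostrophe caveat described above.

```php
<?php
// Hypothetical helper wrapping the \b...\b\s? pattern. The "x" placeholder
// is filled in by concatenating the integer length into the pattern string.
function filterShortWords(string $string, int $x): string
{
    // \b          word boundary before the word
    // [a-z]{1,x}  one to x letters (case-insensitive because of /i)
    // \b          word boundary after the word
    // \s?         optionally swallow one following whitespace character
    $pattern = '/\b[a-z]{1,' . $x . '}\b\s?/i';
    return preg_replace($pattern, '', $string);
}

echo filterShortWords("This and that", 3), "\n"; // This that
echo filterShortWords("O'Maley", 3), "\n";       // 'Maley
echo filterShortWords("Amy's", 3), "\n";         // '
```

Concatenating an int keeps the quantifier well-formed; if the length could come from user input, casting it with (int) first avoids injecting arbitrary text into the pattern.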
Posted: Wed Nov 30, 2005 5:49 am
by tomfra
shoebappa,
Your RegEx works, but it has no effect on numbers.
E.g. if $string = '24 hours', it will be left as is even if the minimum "word" length is set to 3.
Tomas
Posted: Wed Nov 30, 2005 7:33 am
by AGISB
tomfra wrote:
shoebappa,
Your RegEx works, but it has no effect on numbers.
E.g. if $string = '24 hours'; , it will be left as is even if the minimum "word" length is set to 3.
Tomas
It won't filter out capital letter words either.
So use this
Code:
preg_replace('/\b[A-Za-z0-9]{1,x}\b\s?/i', '', $string);
Posted: Wed Nov 30, 2005 7:45 am
by shoebappa
The i modifier ignores case, so A-Z is redundant; you could add the 0-9 if you wanted.
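A quick check of that suggestion, as a sketch with $x hard-coded to 3: because of /i, adding 0-9 to the class is enough to catch short numbers and upper-case words alike.

```php
<?php
// With the /i modifier, [a-z] already matches upper-case letters, so only
// the digit range needs to be added to the character class.
$x = 3;
$pattern = '/\b[a-z0-9]{1,' . $x . '}\b\s?/i';

echo preg_replace($pattern, '', '24 hours'), "\n";         // hours
echo preg_replace($pattern, '', 'THE BIG DOG RUNS'), "\n"; // RUNS
echo preg_replace($pattern, '', 'This AND that'), "\n";    // This that
```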
Posted: Wed Nov 30, 2005 1:23 pm
by tomfra
Great, that works! Now if I could have one more wish...
How should the RegEx be modified if I wanted only letters in the filtered string? I.e. remove any numbers before doing the other filtering.
$x = 3;
$string = "2004 world cup";
should return $FilteredString = "world"; even though "2004" is longer than $x.
And I almost forgot: Thanks!
Tomas
Posted: Wed Nov 30, 2005 6:08 pm
by shoebappa
Code:
preg_replace('/\b[0-9]+\b\s?|\b[a-z]{1,3}\b\s?/i', '', $string);
Should do it, but it wouldn't catch something like 4th.
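A sketch of that pattern wrapped in a hypothetical helper, filterNumbersAndShortWords(), with the length passed in as an integer instead of the hard-coded 3. The trim() call is an addition not in the original post: without it, removing the last word ("cup") leaves a trailing space, so the result would be "world " rather than "world". The second call shows the 4th limitation.

```php
<?php
// Hypothetical helper built on the two-alternative pattern:
//   \b[0-9]+\b\s?        any all-digit token, whatever its length
//   \b[a-z]{1,x}\b\s?    any letter-only word of 1..x letters (/i)
// trim() removes the space left behind when the last word is dropped.
function filterNumbersAndShortWords(string $string, int $x): string
{
    $pattern = '/\b[0-9]+\b\s?|\b[a-z]{1,' . $x . '}\b\s?/i';
    return trim(preg_replace($pattern, '', $string));
}

echo filterNumbersAndShortWords("2004 world cup", 3), "\n";    // world
// "4th" has no word boundary between the 4 and the t, so neither
// alternative matches it and it survives.
echo filterNumbersAndShortWords("the 4th world cup", 3), "\n"; // 4th world
```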
Posted: Wed Nov 30, 2005 6:11 pm
by Burrito
Moved to regex I did.
d11wtq | You did 
Posted: Fri Dec 02, 2005 6:50 am
by tomfra
Thanks again folks!
Tomas