preg_replace to filter out "short words" [SOLVED]

Any questions involving matching text strings to patterns - the pattern is called a "regular expression."

tomfra
Forum Contributor
Posts: 126
Joined: Wed Jun 23, 2004 12:56 pm
Location: Prague, Czech Republic

Post by tomfra »

A piece of cake for anyone who knows RegEx better than I do, I guess, and that's just about everyone...

$string = "This and that";

I need to get rid of "and" based on the word length so that:

$FilteredString = "This that";

And I would like to do it with preg_replace. I am sure it's possible. I could do it without preg_replace by exploding the string and then working with the resulting array, but that may be unnecessarily complicated.

Any ideas are welcome!

Tomas
Last edited by tomfra on Fri Dec 02, 2005 6:50 am, edited 1 time in total.
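For comparison, the explode-based route mentioned above might look like this. It is a minimal sketch, not code from the thread: the helper name filterShortWords() is hypothetical, it splits on single spaces only, and it uses strlen(), which counts bytes rather than characters (mb_strlen() from the mbstring extension would be needed for accented words).

```php
<?php
// Hypothetical helper (not from the thread): keep only words longer than $x.
function filterShortWords(string $string, int $x): string
{
    // Split on single spaces, as with explode(); runs of spaces produce
    // empty strings, which the length filter below discards anyway.
    $words = explode(' ', $string);

    // strlen() counts bytes; use mb_strlen() for multibyte text.
    $kept = array_filter($words, function ($word) use ($x) {
        return strlen($word) > $x;
    });

    // implode() ignores the gaps array_filter() leaves in the keys.
    return implode(' ', $kept);
}

$string = "This and that";
$FilteredString = filterShortWords($string, 3);
echo $FilteredString, "\n"; // prints "This that"
```

It works, but it is more code than a single preg_replace call, which is the trade-off the question is about.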
Nathaniel
Forum Contributor
Posts: 396
Joined: Wed Aug 31, 2005 5:58 pm
Location: Arkansas, USA

Post by Nathaniel »

So anything shorter than x is deleted?

hmm...

Code:

preg_replace('/[a-z]{1,x}/i', '', $string);
Should do it (untested).

Be sure to replace x with 3 or whatever you need for the max. length.

Edit: this code won't delete the extra space... hold on and I'll edit with one that does.
Edit 2: I'm not sure how to do that, actually. You could put \s after a-z, but then if you had a word one or more characters less than x, it'd delete the space before and after the word...
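A quick check of the untested pattern, with 3 substituted for x and the sample string from the question, suggests the missing spaces are not the only problem: with no word boundaries, [a-z]{1,3} matches any run of up to three letters, so the longer words are removed in chunks too.

```php
<?php
// The pattern from the post above, with x replaced by 3.
$string = "This and that";
$result = preg_replace('/[a-z]{1,3}/i', '', $string);

// "Thi" + "s", "and", "tha" + "t" are all matched, leaving only the spaces.
var_dump($result); // string(2) "  "
```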
shoebappa
Forum Contributor
Posts: 158
Joined: Mon Jul 11, 2005 9:14 pm
Location: Norfolk, VA

Post by shoebappa »

Code:

preg_replace('/\b[a-z]{1,x}\b\s?/i', '', $string);
\b matches a word boundary, and \s? catches one trailing space if there is one (there won't be after the last word). Replace the x with the max length, say 3...
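Rather than editing the x by hand, the limit can be spliced into the pattern from a variable. A minimal sketch follows; removeShortWords() is a hypothetical name, and it assumes $x is at least 1, since {1,0} is not a valid quantifier.

```php
<?php
// Hypothetical helper: remove words of 1 to $x letters (assumes $x >= 1).
function removeShortWords(string $string, int $x): string
{
    // \b...\b limits the match to whole words; \s? also eats one
    // following whitespace character so no double spaces are left.
    $pattern = '/\b[a-z]{1,' . $x . '}\b\s?/i';

    return preg_replace($pattern, '', $string);
}

var_dump(removeShortWords("This and that", 3));          // string(9) "This that"

// When the last word is removed, nothing consumes the space before it,
// so a trailing space remains; trim() cleans that up.
var_dump(removeShortWords("This and that is", 3));       // string(10) "This that "
var_dump(trim(removeShortWords("This and that is", 3))); // string(9) "This that"
```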

Tested in Regex Coach, but there could be cases I didn't consider. Since an apostrophe counts as a word boundary, it may remove the part before or after a ' if that part is x characters or fewer... If someone searched for O'Maley it'd remove the O, and Amy's would lose both Amy and s, leaving the '.
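One possible way around the apostrophe problem, a swapped-in technique rather than anything suggested in the thread, is to drop \b and instead require whitespace (or the start/end of the string) on both sides with the lookarounds (?<!\S) and (?!\S). Words such as O'Maley and Amy's are then treated as single tokens. The trade-off is that a short word touching punctuation, like "and," in the middle of a sentence, is no longer removed either.

```php
<?php
$x = 3;

// shoebappa's pattern: \b also matches next to an apostrophe.
$wordBoundary = '/\b[a-z]{1,' . $x . '}\b\s?/i';

// Alternative: the word must be preceded and followed by whitespace
// or the string edge, so "O'Maley" is one token, not "O" + "Maley".
$spaceBoundary = '/(?<!\S)[a-z]{1,' . $x . '}(?!\S)\s?/i';

$string = "Amy's dog and O'Maley";

var_dump(preg_replace($wordBoundary, '', $string));  // string(7) "''Maley"
var_dump(preg_replace($spaceBoundary, '', $string)); // string(13) "Amy's O'Maley"
```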
tomfra
Forum Contributor
Posts: 126
Joined: Wed Jun 23, 2004 12:56 pm
Location: Prague, Czech Republic

Post by tomfra »

shoebappa,

Your RegEx works, but it has no effect on numbers.

E.g. if $string = '24 hours'; , it will be left as is even if the minimum "word" length is set to 3.

Tomas
AGISB
Forum Contributor
Posts: 422
Joined: Fri Jul 09, 2004 1:23 am

Post by AGISB »

tomfra wrote:
shoebappa,

Your RegEx works, but it has no effect on numbers.

E.g. if $string = '24 hours'; , it will be left as is even if the minimum "word" length is set to 3.

Tomas

It won't filter out capital letter words either.

So use this

Code:

preg_replace('/\b[A-Za-z0-9]{1,x}\b\s?/i', '', $string);
shoebappa
Forum Contributor
Posts: 158
Joined: Mon Jul 11, 2005 9:14 pm
Location: Norfolk, VA

Post by shoebappa »

The i modifier makes the match case-insensitive, so A-Z isn't needed; you could add the 0-9 if you wanted.
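A short check of the point above, assuming x = 3: with the i modifier, [a-z0-9] matches upper-case letters as well, so the extra A-Z is redundant, and adding 0-9 handles the '24 hours' case.

```php
<?php
$x = 3;

// Letters (either case, thanks to /i) and digits, 1 to $x characters.
$pattern = '/\b[a-z0-9]{1,' . $x . '}\b\s?/i';

var_dump(preg_replace($pattern, '', '24 hours'));      // string(5) "hours"
var_dump(preg_replace($pattern, '', 'This AND that')); // string(9) "This that"
```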
tomfra
Forum Contributor
Posts: 126
Joined: Wed Jun 23, 2004 12:56 pm
Location: Prague, Czech Republic

Post by tomfra »

Great, that works! Now if I could have one more wish ;)

How should the RegEx be modified if I wanted only letters in the filtered string? I.e. remove any numbers before doing the other filtering.


$x = 3;
$string = "2004 world cup";

should return $FilteredString = "world"; even though "2004" is longer than $x.

And I almost forgot: Thanks!

Tomas
shoebappa
Forum Contributor
Posts: 158
Joined: Mon Jul 11, 2005 9:14 pm
Location: Norfolk, VA

Post by shoebappa »

Code:

preg_replace('/\b[0-9]+\b\s?|\b[a-z]{1,3}\b\s?/i', '', $string);
Should do it, but it won't catch something like 4th.
Last edited by shoebappa on Wed Nov 30, 2005 6:25 pm, edited 1 time in total.
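Applied to the example from the question with the limit taken from $x, the combined pattern leaves a trailing space when the last word is the one removed, so wrapping it in trim() gives exactly "world". This is a sketch built on the pattern from the post; the trim() step and the "4th" test string are additions.

```php
<?php
$x = 3;

// First alternative: whole numbers of any length.
// Second alternative: letter-only words of 1 to $x characters.
$pattern = '/\b[0-9]+\b\s?|\b[a-z]{1,' . $x . '}\b\s?/i';

$string = "2004 world cup";
$result = preg_replace($pattern, '', $string);

// "cup" is the last word, so the space before it is left behind.
var_dump($result);         // string(6) "world "
$FilteredString = trim($result);
var_dump($FilteredString); // string(5) "world"

// "4th" matches neither alternative (no word boundary between 4 and t),
// so it is left untouched.
var_dump(trim(preg_replace($pattern, '', "the 4th of July"))); // string(8) "4th July"
```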
Burrito
Spockulator
Posts: 4715
Joined: Wed Feb 04, 2004 8:15 pm
Location: Eden, Utah

Post by Burrito »

Moved to regex I did.

d11wtq | You did :P
tomfra
Forum Contributor
Posts: 126
Joined: Wed Jun 23, 2004 12:56 pm
Location: Prague, Czech Republic

Post by tomfra »

Thanks again folks!

Tomas