Strip numbers but only if they're not independent

Posted: Tue Mar 23, 2010 8:03 am
by lwc
By "independent" I mean numbers that follow a special character such as a dash (e.g. something-1234), a dot (e.g. 1.2.3.4) or a colon (e.g. 11:15), or numbers that simply stand alone (e.g. something 1234), but not something1234.

I have a function that cleans a search term before searching for it (so you won't search for commas, question marks, etc.). But how do I make it remove numbers only if they're not independent? I believe Google does something like that.

 
function cleanhouse($str) {
    $str = preg_replace('/[\d\\\?\=\+\&\`\~\!\@\#\$\%\^\*\(\)\; \,\.\/\_]/', ' ',$str); // Leaving just words, but missing independent numbers
    $str = trim(preg_replace('/\s\s+/', ' ', $str)); // Cleaning space leftovers from above - would it be faster to run the trim first?
    return $str;
}
 
echo cleanhouse("bla, what's up?"); // bla what's up (good) - BTW, I kept the single quote to avoid the bad search term "what s up"
echo cleanhouse('bla?!'); // bla (good)
echo cleanhouse('bla 1'); // bla (good)
echo cleanhouse('bla1'); // bla (bad)
echo cleanhouse('bla-1'); // bla (bad)
 
Thanks!

Re: Strip numbers but only if they're not independent

Posted: Tue Mar 23, 2010 11:58 am
by Christopher
You can probably just use preg_match_all('/[0-9]+/', $str, $matches) to find all the sequences of digits in the string.
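One way to flesh out that suggestion: capture each digit run together with its byte offset (the PREG_OFFSET_CAPTURE flag), then decide per run whether to keep it by looking at the character just before it. This is a minimal sketch, assuming "non-independent" means "immediately preceded by an ASCII letter"; the helper name strip_attached_digits() is hypothetical. It uses `[0-9]+` rather than `[0-9]*`, since `*` also matches the empty string at every position.

```php
<?php
// Hypothetical helper: remove digit runs glued directly to a letter
// (e.g. "test1234" -> "test") and keep every other digit run.
function strip_attached_digits(string $str): string
{
    // Each match is [text, byte offset] thanks to PREG_OFFSET_CAPTURE.
    preg_match_all('/[0-9]+/', $str, $matches, PREG_OFFSET_CAPTURE);

    // Walk the matches from the end so earlier offsets stay valid after cutting.
    foreach (array_reverse($matches[0]) as [$digits, $offset]) {
        $prev = $offset > 0 ? $str[$offset - 1] : '';
        // Assumption: only a letter right before the run makes it "attached".
        if ($prev !== '' && ctype_alpha($prev)) {
            $str = substr_replace($str, '', $offset, strlen($digits));
        }
    }
    return $str;
}

echo strip_attached_digits('test1234'), "\n";                     // test
echo strip_attached_digits('test 1234'), "\n";                    // test 1234
echo strip_attached_digits('something-1234 11:15 1.2.3.4'), "\n"; // something-1234 11:15 1.2.3.4
```

Walking the matches in reverse is what keeps the captured offsets valid while the string shrinks. Under the default C locale, ctype_alpha() only recognises ASCII letters, so accented letters would need a Unicode-aware check.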

Re: Strip numbers but only if they're not independent

Posted: Tue Mar 23, 2010 11:59 am
by lwc
And what do I do with it?

Meanwhile, I've toyed around with this code:

 
    $str = preg_replace('/((?<!\s)\d+|\\\|\?|\=|\+|\&|\`|\~|\!|\@|\#|\$|\%|\^|\*|\(|\)|\;|\,|\.|\/|\_)/', ' ',$str);
 
The problem is that it turns "test 1234" into "test 1" (and doesn't deal with "test 1"). It also cuts up "1.2.3.4" and "12:53", and it doesn't deal with dashes.
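The "test 1" result comes from the way the lookbehind is applied: `(?<!\s)` is checked at every position where a match could start. It blocks a match at the "1" (preceded by a space) but allows one at the "2" (preceded by a digit), so `\d+` then takes "234". A quick sketch to see this, together with one possible tweak (also forbidding a preceding digit, so a match cannot start mid-number). The second pattern is only an illustration and does not address dots, colons or dashes.

```php
<?php
$subject = 'test 1234';

// Original lookbehind: only "not preceded by whitespace".
preg_match_all('/(?<!\s)\d+/', $subject, $m1);
// $m1[0] holds just "234": the match started after the "1".
print_r($m1[0]);

// Variant: also refuse to start right after another digit.
preg_match_all('/(?<![\s\d])\d+/', $subject, $m2);
// $m2[0] is empty: "1234" is left intact.
print_r($m2[0]);
```

A negative lookbehind also succeeds at the very start of the string (there is nothing behind it to match), so the variant would still strip a number like "1234" when it is the first thing in the string.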

Re: Strip numbers but only if they're not independent

Posted: Tue Mar 23, 2010 9:29 pm
by ridgerunner
The wording of your request is a bit confusing, but from your example output I gather that you want to remove whole numbers that are not attached to a word. Try this modified version of your script:

<?php 
function cleanhouse($str) {
    // Convert most punctuation (except '"{}<>:[]-|) to spaces
    $re = '/[\\\\?=+&`~!@#$%^*();,.\/_]+|(?<=^|\s)\d+(?=\s|$)/';
    $str = preg_replace($re, ' ', $str);
    // Consolidate multiple consecutive whitespace chars to a single space
    return trim(preg_replace('/\s{2,}/S', ' ', $str));
}
echo cleanhouse("bla, what's up?"). "\n"; // bla what's up (good)
echo cleanhouse('bla?!')          . "\n"; // bla (good)
echo cleanhouse('bla 1')          . "\n"; // bla (good)
echo cleanhouse('bla1')           . "\n"; // bla1 (good)
echo cleanhouse('bla-1')          . "\n"; // bla-1 (good)
echo cleanhouse('  bla 10 20 30') . "\n"; // bla (good)
echo cleanhouse('  10 bla 20  ')  . "\n"; // bla (good)
?>
Note: Please stop escaping all the metacharacters inside the character class. It hurts my eyes!
:)
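The point about escaping is easy to check: inside a character class most punctuation loses its special meaning, so only a few characters (such as `\`, `]`, a leading `^`, a `-` between two characters, and here the `/` delimiter) need a backslash. This sketch compares a fully escaped class, in the style of the original post, with a minimally escaped one on the same input; the sample string is arbitrary.

```php
<?php
// Fully escaped class, in the style of the original post (minus \d and space).
$escaped = '/[\?\=\+\&\`\~\!\@\#\$\%\^\*\(\)\;\,\.\/\_]/';
// Same set of characters with only the "/" delimiter escaped.
$minimal = '/[?=+&`~!@#$%^*();,.\/_]/';

// Double-quoted, so "$" is escaped to stop variable interpolation.
$sample = "a?b=c+d&e`f~g!h@i#j\$k%l^m*n(o)p;q,r.s/t_u";

$a = preg_replace($escaped, ' ', $sample);
$b = preg_replace($minimal, ' ', $sample);

var_dump($a === $b); // bool(true)
echo $b, "\n";       // a b c d e f g h i j k l m n o p q r s t u
```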

Re: Strip numbers but only if they're not independent

Posted: Fri Mar 26, 2010 9:26 am
by lwc
But you used escaping too.

Anyway, read the subject. I want the exact opposite: to strip only non-independent numbers (e.g. strip 1234 from test1234).
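Taking that clarification at face value, a single regex can do it with a positive lookbehind that only lets a digit run match when it is glued to a letter: `(?<=\pL)\d+`. This is a minimal sketch, assuming "non-independent" means "directly preceded by a letter" and that the input is valid UTF-8 (hence the u modifier). cleanhouse_attached() is a hypothetical name; the punctuation step reuses ridgerunner's class without his standalone-number alternation.

```php
<?php
// Hypothetical variant of cleanhouse(): strip only digits glued to a letter.
function cleanhouse_attached(string $str): string
{
    // 1. Drop digit runs directly preceded by a letter ("test1234" -> "test").
    //    \pL = any Unicode letter; the u modifier treats the input as UTF-8.
    $str = preg_replace('/(?<=\pL)\d+/u', '', $str);
    // 2. Turn the same punctuation as ridgerunner's class into spaces.
    $str = preg_replace('/[\\\\?=+&`~!@#$%^*();,.\/_]+/u', ' ', $str);
    // 3. Collapse runs of whitespace and trim the ends.
    return trim(preg_replace('/\s{2,}/u', ' ', $str));
}

echo cleanhouse_attached('test1234'), "\n";             // test
echo cleanhouse_attached('bla 1'), "\n";                // bla 1
echo cleanhouse_attached('bla-1'), "\n";                // bla-1
echo cleanhouse_attached('11:15 something-1234'), "\n"; // 11:15 something-1234
```

Stripping attached digits before the punctuation step matters: done the other way round, "bla?1" would become "bla 1" first and its "1" would then count as independent. Because `.` is still in the punctuation class, "1.2.3.4" comes out as "1 2 3 4"; remove `.` from the class if the dotted form must survive.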