Primitive filters


Post by alex.barylski »

Because filtering is important to get right, I'd like a quick review of the regexes, etc., in the hope that any errors get spotted. I may also have missed a simple filter, in which case please recommend it.

NOTE: These are meant to be primitives, nothing really fancy, although I have considered using HTML_Purifier instead of strip_tags. The convention is Filter_X, with X being the characters to filter or remove.

I'm not sure I could consider encoding or escaping a logical part of this collection of static classes. Something higher-level like a Filter_Email is not really necessary, as I use a validator which parses the email address according to the RFC standards and MUST match, so filtering here would be redundant.

What I am interested in, though, is maybe filtering numerics and not just digits. For instance, is the number a hex value, in which case a leading 0x might be allowed? Currency filters would not make sense, as those values depend on locale as well, which is not part of the end goal for this.

Here are my four trivial filters so far:

Code:

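  // Despite its name, this strips every character except digits, '.', '+' and '-'.
  class Filter_Alpha implements Filter_Interface{
    public static function filterMe($value)
    {
      return preg_replace('/[^0-9\.\+\-]/', '', $value);
    }
  }
 
  // Strips HTML tags, optionally keeping the tag names listed in $safe_tags, e.g. array('b', 'i').
  class Filter_Html implements Filter_Interface{
    public static function filterMe($value, $safe_tags = null)
    {
      if(is_array($safe_tags)){
        // Build the "<b><i>" allow-list string that strip_tags() expects.
        // A closure (PHP 5.3+) replaces create_function(), which was removed in PHP 8.
        $safe_tags = array_map(function ($element) {
          return '<' . strtolower($element) . '>';
        }, $safe_tags);
        $safe_tags = implode('', $safe_tags);
      }
      else{
        $safe_tags = '';
      }
      
      return strip_tags($value, $safe_tags);
    }
  }
 
  // Removes every digit from the value.
  class Filter_Digit implements Filter_Interface{
    public static function filterMe($value)
    {
      return preg_replace('/\d/', '', $value);
    }
  }
 
  // Removes leading and trailing whitespace only; inner whitespace is kept.
  class Filter_Space implements Filter_Interface{
    public static function filterMe($value)
    {
      return trim($value);
    }
  }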
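
The hex idea mentioned above (allowing a leading 0x) could be sketched roughly as follows. This is only an illustration: filter_hex is a hypothetical helper name, not one of the Filter_* classes, and normalizing the prefix to lowercase 0x is an assumption.

```php
<?php
// Hypothetical helper (not part of the original Filter_* classes):
// keeps an optional leading 0x prefix plus hexadecimal digits, strips the rest.
function filter_hex($value)
{
    // Remember whether the input starts with a 0x / 0X prefix.
    $hasPrefix = (bool) preg_match('/^\s*0x/i', $value);
    if ($hasPrefix) {
        // Drop the prefix first so its "x" is not stripped as a non-hex character.
        $value = preg_replace('/^\s*0x/i', '', $value);
    }

    // Remove every run of characters that are not hexadecimal digits.
    $digits = preg_replace('/[^0-9a-f]+/i', '', $value);

    // Assumption: the prefix is normalized to lowercase "0x".
    return ($hasPrefix ? '0x' : '') . $digits;
}

echo filter_hex('0xFF-a1'), "\n";  // 0xFFa1
echo filter_hex('ff zz 10'), "\n"; // ff10
```

Whether an empty result or a bare "0x" should count as valid is left to the validator, in keeping with the filter/validator split described above.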

Re: Primitive filters

Post by prometheuzz »

About your current regex: within a character class, the "normal" regex metacharacters have no special meaning. Only ^ and - might need escaping (along with [ and ] themselves and the backslash, of course). I say "might" because it depends on where they occur: ^ negates the class only when it is the first character; anywhere else it matches a literal '^'. Likewise, - matches a literal '-' when it is placed at the start or at the end of the character class; anywhere in between, it acts as the range metacharacter.

Code:

[^ac]   // matches any character except 'a' and 'c'
[a^c]   // matches 'a', '^' or 'c'
[\^ac]  // matches '^', 'a' or 'c'
 
[a-c]   // matches 'a', 'b' or 'c'
[-ac]   // matches '-', 'a' or 'c'
[ac-]   // matches 'a', 'c' or '-'
 
[.+*-]  // matches '.', '+', '*' or '-'
That said, your regex should look like this:

Code:

'/[^0-9.+-]/'
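
A quick way to check the escaping rules is to run both spellings of the class against the same input. The sample string below is made up for illustration.

```php
<?php
// Made-up sample input, chosen to contain every character under discussion.
$input = 'a+1.5-b^2';

// Original spelling with redundant escapes vs. the cleaned-up class.
$escaped   = preg_replace('/[^0-9\.\+\-]/', '', $input);
$unescaped = preg_replace('/[^0-9.+-]/', '', $input);

var_dump($escaped === $unescaped); // bool(true)
echo $unescaped, "\n";             // +1.5-2

// Position decides whether ^ and - are literal inside a class.
var_dump(preg_match('/[a^c]/', '^')); // int(1): ^ is literal when not first
var_dump(preg_match('/[ac-]/', '-')); // int(1): - is literal at the end
var_dump(preg_match('/[a-c]/', '-')); // int(0): - is a range operator here
```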
But to get to the "real" topic: you could match non-ASCII digits (Hebrew, Chinese, etc.) by using their Unicode values in a character class:

Code:

[\x??-\x??]
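
As a sketch of what such a class might look like in PHP: the range below uses the Arabic-Indic digits (U+0660 to U+0669) purely as an assumed example, and it relies on the u modifier so that PCRE treats the pattern and subject as UTF-8. The \p{Nd} property is shown as an alternative that does not need explicit ranges.

```php
<?php
// Assumption: Arabic-Indic digits (U+0660..U+0669) stand in for "non-ASCII digits".
// The "\u{...}" string escapes need PHP 7 or later.
$input = "tel: 12 \u{0663}\u{0664}";

// With the u modifier, \x{0660} names a code point rather than a single byte.
$byRange = preg_replace('/[^0-9\x{0660}-\x{0669}]+/u', '', $input);

// \p{Nd} is the Unicode property for decimal digits in any script;
// \P{Nd} is its negation, so this strips everything that is not a digit.
$byProperty = preg_replace('/\P{Nd}+/u', '', $input);

// Both approaches keep the ASCII and the Arabic-Indic digits.
var_dump($byRange === $byProperty); // bool(true)
```

The property-based form covers every script at once, while the explicit range keeps control over exactly which digits are accepted.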
I am not too sure what kind of numerical values you want to match/filter, but it sounds rather tricky and error prone, IMO. What I mean is that numerical values are frequently used as "plain strings". Take telephone numbers or serial numbers for example, although they can be, or are, made from digits, they don't hold a "real" numerical value.

Re: Primitive filters

Post by GeertDD »

Note that adding a + quantifier to that character class can speed up the regex a tad: each replacement then consumes a whole run of consecutive unwanted characters instead of one character at a time.

Code:

/[^0-9.+-]+/
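
The effect can be seen by asking preg_replace for its replacement count (the fifth argument). The input string here is an arbitrary example; the output is identical either way, only the number of replacements differs.

```php
<?php
// Arbitrary example input: two runs of four unwanted characters each.
$input = 'abc 12.5 def';

// Without the quantifier, every unwanted character is replaced separately.
$single = preg_replace('/[^0-9.+-]/', '', $input, -1, $countSingle);

// With +, each run of unwanted characters is replaced in one go.
$greedy = preg_replace('/[^0-9.+-]+/', '', $input, -1, $countGreedy);

var_dump($single === $greedy); // bool(true), both are "12.5"
echo $countSingle, "\n";       // 8
echo $countGreedy, "\n";       // 2
```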

Re: Primitive filters

Post by alex.barylski »

prometheuzz wrote: But to get to the "real" topic, you could match non-ascii digits (Hebrew, Chinese etc digits) by using their Unicode values in a character class

Interesting...didn't even think of that, thanks. :)

prometheuzz wrote: I am not too sure what kind of numerical values you want to match/filter, but it sounds rather tricky and error prone, IMO. What I mean is that numerical values are frequently used as "plain strings". Take telephone numbers or serial numbers for example, although they can be, or are, made from digits, they don't hold a "real" numerical value.

I've dropped the idea of supporting advanced filters such as phone, etc.

Now I just want to strip non-alpha and non-digit characters but in a Unicode friendly way.

Does preg_replace respect the localized charset when I use generic matchers (for lack of a better word) such as \d?
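
For the "strip non-alpha and non-digit characters" goal, one possible approach (a sketch, assuming the input is valid UTF-8) is to avoid shorthand classes altogether and name the Unicode properties explicitly: \p{L} for letters and \p{Nd} for decimal digits, combined with the u modifier. The function name filter_alnum_unicode is hypothetical.

```php
<?php
// Hypothetical helper: keeps Unicode letters and decimal digits, strips the rest.
// Assumes $value is valid UTF-8; preg_replace returns null on invalid input.
function filter_alnum_unicode($value)
{
    // \p{L} = any letter, \p{Nd} = any decimal digit; u = treat as UTF-8.
    return preg_replace('/[^\p{L}\p{Nd}]+/u', '', $value);
}

// "\u{...}" escapes (PHP 7+) spell out: Héllo, wörld <Arabic-Indic 3> 42!
$input = "H\u{00E9}llo, w\u{00F6}rld \u{0663} 42!";

// Punctuation and spaces are removed; accented letters and both digit styles stay.
echo filter_alnum_unicode($input), "\n";
```

Spelling out the properties keeps the behavior independent of how \d and \w are interpreted under a given locale or PCRE configuration.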