
Smartest filter/validating PHP code

Posted: Fri Mar 23, 2007 3:51 am
by alex.barylski
Looked at Zend and although neat, overkill for my own purposes :)

I need more of a validation solution than a filtering anyways :)

Meaning I will test for validity and refuse to continue if the input is bad. The exception, I guess, is validating a TEXTAREA, in which case it's prolly a lot faster/easier to just strip HTML or convert it to entities :|

Didn't think of that one until now :P

Anyways, smartest, leanest filtering solution...why not just store an array of regex in a hash array and pull that regex into a preg_* and be done with it???
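A minimal sketch of that regex-in-a-hash idea, for reference. The field names, the patterns and the `validate_input()` helper are all made up for illustration; `\z` is used instead of `$` so a trailing newline doesn't sneak through.

```php
<?php
// Hypothetical rule table: field name => regex (patterns are illustrative only).
$rules = array(
    'username' => '/^[a-z0-9_]{3,16}\z/i',
    'zip'      => '/^\d{5}\z/',
    'phone'    => '/^\+?[0-9 ()-]{7,20}\z/',
);

/**
 * Return the names of the fields that are missing or fail their regex.
 */
function validate_input(array $input, array $rules)
{
    $failed = array();
    foreach ($rules as $field => $pattern) {
        $value = isset($input[$field]) ? $input[$field] : null;
        // Anything that is not a string (missing field, array input) is rejected.
        if (!is_string($value) || preg_match($pattern, $value) !== 1) {
            $failed[] = $field;
        }
    }
    return $failed;
}

$failed = validate_input(
    array('username' => 'alex_b', 'zip' => '9021O', 'phone' => '+1 (555) 123-4567'),
    $rules
);
print_r($failed); // only "zip" fails: it ends in the letter O, not a zero
```

The whole "framework" is one loop, which is the appeal; the cost is that anything a regex can't express (cross-field checks, database lookups) has to live somewhere else.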

Any other techniques or solutions (other than Zend or Regex) which you feel are worth mentioning?

Posted: Fri Mar 23, 2007 3:53 am
by Oren
Hmm... write your own? :P

Posted: Fri Mar 23, 2007 3:55 am
by alex.barylski
That's kinda the point, but I wanted to steal ideas from others, learn from others' mistakes, etc ;)

Posted: Fri Mar 23, 2007 3:56 am
by Oren
Have you googled it yet?

Posted: Fri Mar 23, 2007 3:58 am
by Kieran Huggins
I use regex on the client side first - works like a charm for helping users input valid content.

Two steps on the server: validate per field type (email, phone, etc...) then escape unwanted characters. A data class / active record seems ideal for this to me.
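A rough sketch of those two server-side steps, under a couple of assumptions: the helper names `is_valid_field()` and `escape_html()` are hypothetical, the phone pattern is only illustrative, and "escape" is taken here to mean escaping for HTML output (a database would need its own escaping or bound parameters).

```php
<?php
// Step 1: validate per field type. Returns true when the value is acceptable.
function is_valid_field($type, $value)
{
    if (!is_string($value)) {
        return false;
    }
    switch ($type) {
        case 'email':
            return filter_var($value, FILTER_VALIDATE_EMAIL) !== false;
        case 'phone':
            // Illustrative pattern only; real phone formats vary by country.
            return preg_match('/^\+?[0-9 ()-]{7,20}\z/', $value) === 1;
        default:
            return false; // unknown types are rejected rather than waved through
    }
}

// Step 2: escape for the context the value is going into (HTML output here).
function escape_html($value)
{
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

var_dump(is_valid_field('email', 'kieran@example.com')); // bool(true)
var_dump(is_valid_field('email', 'not-an-email'));       // bool(false)
echo escape_html('<b>"hi"</b>'), "\n";                   // &lt;b&gt;&quot;hi&quot;&lt;/b&gt;
```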

The O'Reilly book "Building Scalable Web Sites" has an excellent section on input filtering/validation. PM me for an excerpt if you're interested.

Posted: Fri Mar 23, 2007 6:39 am
by Maugrim_The_Reaper
alex.barylski wrote:Any other techniques or solutions (other than Zend or Regex) which you feel are worth mentioning?
The Zend Framework is under fire for its solution - the problem is that it's not a great solution, and they need to implement one in the near future. I get the distinct feeling myself and several others are less than impressed with a 1.0 release which has such poor input filter/validation control. The problem is that you *MUST* use $_POST in the Zend Framework. I was originally under the impression you could use Zend_Controller_Request_Http but this has been shot down as an unintentional user interface (it's only intended for use internally in the framework). This means the ZF effectively has no reliable userland Request object which is news to me and pretty damn worrying since it means any future userland Request object would require some obvious refactoring work at a time when the API must be stable.

Moaning about the ZF aside...;)

JCart opened a thread recently on the same subject which would be worth reading, not sure if he has public code or not but it has a few pointers for an OO validator chain. The gist of the thread was to have an interface which lends itself to configuration - i.e. you don't have to lump in lines of validation code for every single controller action (probably the most boring and essential part of controller writing) if all it takes is a configuration file, a Factory, and some automagic.
alex.barylski wrote:Anyways, smartest, leanest filtering solution...why not just store an array of regex in a hash array and pull that regex into a preg_* and be done with it???
An OO solution would be as simple - you'd just be replacing the procedural regex passes with a filter chain composed of individual rules, one of which can be a generic Regex rule accepting any Regex string you want. One of the powers of config based chaining is the config file would hold the Regex to be used at runtime.
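To make the chain idea concrete, here is a minimal sketch. Every name in it (`Rule`, `RegexRule`, `LengthRule`, `RuleChain`) is hypothetical and not taken from the ZF or any other library; it only shows a generic Regex rule sitting alongside other rules in one chain.

```php
<?php
// All class and interface names below are hypothetical.
interface Rule
{
    /** @return bool true when $value passes this rule */
    public function isValid($value);
}

// Generic rule: accepts any regex string, e.g. one read from a config file.
class RegexRule implements Rule
{
    private $pattern;

    public function __construct($pattern)
    {
        $this->pattern = $pattern;
    }

    public function isValid($value)
    {
        return is_string($value) && preg_match($this->pattern, $value) === 1;
    }
}

class LengthRule implements Rule
{
    private $min;
    private $max;

    public function __construct($min, $max)
    {
        $this->min = $min;
        $this->max = $max;
    }

    public function isValid($value)
    {
        if (!is_string($value)) {
            return false;
        }
        $length = strlen($value);
        return $length >= $this->min && $length <= $this->max;
    }
}

// The chain is itself a Rule, so chains can be nested inside other chains.
class RuleChain implements Rule
{
    private $rules = array();

    public function add(Rule $rule)
    {
        $this->rules[] = $rule;
        return $this; // allow fluent chaining
    }

    public function isValid($value)
    {
        foreach ($this->rules as $rule) {
            if (!$rule->isValid($value)) {
                return false; // stop at the first failing rule
            }
        }
        return true;
    }
}

$chain = new RuleChain();
$chain->add(new LengthRule(3, 16))
      ->add(new RegexRule('/^[a-z0-9_]+\z/i'));

var_dump($chain->isValid('maugrim'));   // bool(true)
var_dump($chain->isValid('no spaces')); // bool(false): the regex rejects the space
```

With config-based chaining, the pattern passed to `RegexRule` would simply come out of the config file at runtime instead of being hard-coded.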

Posted: Fri Mar 23, 2007 1:32 pm
by John Cartwright
Indeed, I have been on this quest for a versatile validation system. Right now I've mocked up a pretty solid validator using YAML configuration, which a factory turns into validation objects. Everything is automated, and all validation can be done in just a couple lines of code in the controller.

Code:

// initialize validator and pass request data
$validation = new Northern_Validator($this->getInvokeArgs());
$validation->loadRuleset('index/send');

if ($validation->isValid()) {
   // do stuff
}
and /index/send.yaml would look something like

Code:

method: post
fields:
  foo:
    required: true
    validator:
      string:
         min: 3
         min_error: foo is not long enough (3 char minimum)
         max: 5
         max_error: foo too long (5 char maximum)
  bar:
    required: false
    validator:
      string:
         min: 3
         min_error: bar is not long enough (3 char minimum)
This is all closely modeled on the symfony framework. More advanced implementations still need to be worked out, although this, in my opinion, seems to be the easiest way to process validations. It really is an interesting feat.
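For anyone curious how a ruleset like this could be turned into validator objects, here is a rough sketch. It is not the Northern_Validator code: `StringValidator`, `make_validator()` and `validate_request()` are hypothetical names, and a plain PHP array stands in for the parsed YAML so the example runs without a YAML parser.

```php
<?php
// Hypothetical names throughout. The array follows the shape of a parsed
// index/send ruleset; a real setup would load it from the YAML file.
$ruleset = array(
    'fields' => array(
        'foo' => array(
            'required'  => true,
            'validator' => array('string' => array(
                'min' => 3, 'min_error' => 'foo is not long enough (3 char minimum)',
                'max' => 5, 'max_error' => 'foo too long (5 char maximum)',
            )),
        ),
        'bar' => array(
            'required'  => false,
            'validator' => array('string' => array(
                'min' => 3, 'min_error' => 'bar is not long enough (3 char minimum)',
            )),
        ),
    ),
);

class StringValidator
{
    private $options;

    public function __construct(array $options)
    {
        $this->options = $options;
    }

    /** Return a list of error messages; empty when the value is fine. */
    public function errors($value)
    {
        $errors = array();
        $length = strlen($value);
        if (isset($this->options['min']) && $length < $this->options['min']) {
            $errors[] = isset($this->options['min_error']) ? $this->options['min_error'] : 'too short';
        }
        if (isset($this->options['max']) && $length > $this->options['max']) {
            $errors[] = isset($this->options['max_error']) ? $this->options['max_error'] : 'too long';
        }
        return $errors;
    }
}

// The factory: maps a validator type from the config to an object.
function make_validator($type, array $options)
{
    switch ($type) {
        case 'string':
            return new StringValidator($options);
    }
    throw new InvalidArgumentException("Unknown validator type: $type");
}

// Walk the ruleset and collect errors per field.
function validate_request(array $request, array $ruleset)
{
    $errors = array();
    foreach ($ruleset['fields'] as $name => $spec) {
        $present = isset($request[$name]) && $request[$name] !== '';
        if (!$present) {
            if (!empty($spec['required'])) {
                $errors[$name][] = "$name is required";
            }
            continue; // optional and absent: nothing to validate
        }
        foreach ($spec['validator'] as $type => $options) {
            foreach (make_validator($type, $options)->errors($request[$name]) as $message) {
                $errors[$name][] = $message;
            }
        }
    }
    return $errors;
}

print_r(validate_request(array('foo' => 'ab', 'bar' => 'hello'), $ruleset));
```

Adding a new validator type then means one new class and one new `case` in the factory; the controller code stays the same.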

Definitely worth looking into.

Posted: Fri Mar 23, 2007 3:54 pm
by alex.barylski
YAML = Yet another markup language? :P

Anyways, it looks a lot like JSON...

I wonder if you could simply define a JSON validator and include that in PHP and use its unserialized object interface...

That is how PHP serializes its objects, correct? In JSON format?

Doesn't exactly do wonders for cross language support though :S

Maybe avoid parsing YAML though :P

Posted: Fri Mar 23, 2007 3:58 pm
by John Cartwright
Hockey wrote:YAML = Yet another markup language? :P
http://www.yaml.org/ wrote:YAML Ain't Markup Language
:)

Posted: Fri Mar 23, 2007 4:14 pm
by alex.barylski
Ugh...there is so little innovation in the Linux world :P

Yet another recursive acronym like GNU, etc... :lol:

Just kidding... :)

Posted: Fri Mar 23, 2007 4:17 pm
by Kieran Huggins
http://ca3.php.net/manual/en/ref.json.php

It's not as verbose as serialize(), since JavaScript has fewer structure types - an object and an associative array are both just a hash in JavaScript, for instance. PHP differentiates between them, as does serialize().
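A quick way to see the difference, assuming the json extension is available (the variable names are just for illustration):

```php
<?php
$asArray  = array('name' => 'foo', 'min' => 3);
$asObject = (object) $asArray; // same data as a stdClass object

// serialize() keeps the array/object distinction...
echo serialize($asArray), "\n";  // a:2:{s:4:"name";s:3:"foo";s:3:"min";i:3;}
echo serialize($asObject), "\n"; // O:8:"stdClass":2:{s:4:"name";s:3:"foo";s:3:"min";i:3;}

// ...while JSON has only one "hash" type, so both encode identically.
echo json_encode($asArray), "\n";  // {"name":"foo","min":3}
echo json_encode($asObject), "\n"; // {"name":"foo","min":3}

// On the way back in, the caller has to choose which PHP type to get.
var_dump(json_decode('{"name":"foo","min":3}') instanceof stdClass); // bool(true)
var_dump(is_array(json_decode('{"name":"foo","min":3}', true)));     // bool(true)
```

So PHP's native serialization format is not JSON, and a JSON round trip can't tell you whether the original was an array or an object.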

You'll enjoy this: http://javascript.crockford.com/survey.html (from the creator of JSON himself)

Posted: Fri Mar 23, 2007 5:12 pm
by Maugrim_The_Reaper
JSON is almost a valid subset of YAML (the one catch is a small spacing difference after the ':' of a simple key) - YAML, however, has a few tricks up its sleeve since it's not just object notation ;). I'm currently writing a YAML parser for the ZF, so it's an interesting topic, since YAML is pretty common in other programming languages like Ruby. It just hasn't quite managed to capture a PHP following yet.

Posted: Fri Mar 23, 2007 10:06 pm
by dreamscape
I've written a small YAML parser. Currently it only implements a tiny subset of the YAML Specification, because that's all I need, but it could be used as a stepping stone if anyone was interested in implementing more of the YAML Specification:

Yaml Node:

Code:

<?php

class YamlNode implements ArrayAccess, IteratorAggregate
{

	/**
	 * Name $___children to prevent collision with any children actually called "children"
	 */
	protected $___children = array();

	public function __construct($yaml = null)
	{
		if ($yaml !== null)
		{
			$this->assimilateData($yaml);
		}
	}


	public function children()
	{
		return $this->___children;
	}


	public function addChildNode($name)
	{
		if (!isset($this->___children[$name]))
		{
			$this->___children[$name] = new self();
		}

		return $this->___children[$name];
	}


	public function addKeyValuePair($key, $value)
	{
 		$this->___children[$key] = $value;
	}


	public function assimilateData($yaml)
	{
		$parser = new YamlParser($this);
		$parser->assimilateData($yaml);
	}



	/**
	 * for IteratorAggregate
	 */
	public function getIterator()
	{
		return new ArrayIterator($this->___children);
	}


	/**
	 * for ArrayAccess
	 * {{{
	 */
	public function offsetExists($offset)
	{
		if (isset($this->___children[$offset]))
		{
			return true;
		}

		return false;
	}

	public function offsetGet($offset)
	{
		return $this->___children[$offset];
	}

	public function offsetSet($offset, $value) {
		$this->___children[$offset] = $value;
	}

	public function offsetUnset($offset)
	{
		if (isset($this->___children[$offset]))
		{
			unset($this->___children[$offset]);
		}
	}
	/**
	 * }}}
	 */


	/**
	 * Magic methods
	 * {{{
	 */
	public function __get($property)
	{
		if (!isset($this->___children[$property]))
		{
			return;
		}

		return $this->___children[$property];
	}

	public function __isset($property)
	{
		if (isset($this->___children[$property]))
		{
			return true;
		}

		return false;
	}
	/**
	 * }}}
	 */

}
Yaml Parser:

Code:

class YamlParser
{

 	protected $nodes = array();

	public function __construct(YamlNode $root)
	{
		$this->nodes = array(0 => $root);
	}


	public function assimilateData($yaml)
	{
		$file = $this->getDataLines($yaml);

		foreach ($file as $line)
		{
			if ($this->isComment($line))
			{
				continue;
			}

			if ($this->isChildNode($line))
			{
				$this->assimilateChildNode($line);
				continue;
			}

			if ($this->isKeyValuePair($line))
			{
				$this->assimilateKeyValuePair($line);
				continue;
			}
		}
	}


	protected function assimilateChildNode($line)
	{
		$name = substr(trim($line), 0, -1);
		$level = $this->getLevel($line);
		$this->nodes[$level + 1] = $this->nodes[$level]->addChildNode($name);
	}


	protected function assimilateKeyValuePair($line)
	{
		preg_match('/^([a-z]{1}[^\s]+):(.+)$/', trim($line), $matches);
		$key = $matches[1];
		$value = $this->autoType(trim($matches[2]));

		$level = $this->getLevel($line);
		$this->nodes[$level]->addKeyValuePair($key, $value);
	}



	protected function getLevel($line)
	{
		if (preg_match('/^\s+/', $line, $matches))
		{
			return substr_count($matches[0], '  ');
		}

		return 0;
	}


	/**
	 * must start with a-z and end with ":" (no spaces)
	 */
	protected function isChildNode($line)
	{
		$line = trim($line);
		return (preg_match('/^[a-z]{1}[^\s]+:$/', $line) === 1);
	}


	protected function isKeyValuePair($line)
	{
		$line = trim($line);
		return (preg_match('/^[a-z]{1}[^\s]+:.+$/', $line) === 1);
	}


	protected function isComment($line)
	{
		return (strpos(trim($line), '#') === 0);
	}


	protected function getDataLines($yaml)
	{
		if (strpos($yaml, "\n") === false)
		{
			return file($yaml);
		}

		return explode("\n", $yaml);
	}


	protected function autoType($value) {
		$lower_value = strtolower($value);

		switch (true) {

			// $value is an integer
			case ($value === (string)(int)$value):
				return (int)$value;
				break;

			// $value is a float
			case ($value === (string)(float)$value):
				return (float)$value;
				break;

			// $value is a boolean true equivalent
			case ($lower_value === 'true'):
			case ($lower_value === 'on'):
			case ($lower_value === 'yes'):
			case ($lower_value === 'y'):
			case ($lower_value === '+'):
				return true;
				break;

			// $value is a boolean false equivalent
			case ($lower_value === 'false'):
			case ($lower_value === 'off'):
			case ($lower_value === 'no'):
			case ($lower_value === 'n'):
			case ($lower_value === '-'):
				return false;
				break;

			case ($lower_value === 'null'):
			case ($lower_value === '~'):
			case ($lower_value === ''):
				return null;
				break;

			// $value is just a string
			default:
				return $this->stripQuotes($value);
				break;
		}
	}


	protected function stripQuotes($string) {
		return preg_replace('/(^"(.+)"$)|(^\'(.+)\'$)/', '$2$4', $string);
	}

}

sample.yml (example app configuration):

Code:

domain.com:
  mode: production
  http:
    url: http://www.domain.com/index.php
    cookie: domain.com
  https:
    url: https://www.domain.com/index.php
    cookie: domain.com
  databases:
    transactional:
      read: type://user:pass@host.tld:port/dbname
      write: type://user:pass@socket:/tmp/mysql.sock/dbname
    analytical:
      read: type://user:pass@host.tld:port/dbname
      write: type://user:pass@socket:/tmp/mysql.sock/dbname
sample use:

Code:

$conf = new YamlNode('sample.yml');

// access as an array
var_dump($conf['domain.com']['databases']['transactional']['read']);

// access as object properties
var_dump($conf->{'domain.com'}->databases->transactional->read);

// access mixed
var_dump($conf['domain.com']->databases->transactional->read);
It has been working well for me, and like I said it only implements a small subset of YAML since I don't require all of YAML. Also, the key names in key/value pairs are a bit stricter, since I don't allow spaces. But anyone is free to use it or expand on it and implement more of the YAML Specification. :)

If you're wondering what the YamlNode::assimilateData() is all about, it is so that I can merge several discrete YAML files into a single YamlNode. Quite handy for modular systems ;-)

Posted: Sat Mar 24, 2007 1:00 pm
by Christopher
Maugrim_The_Reaper wrote:The Zend Framework is under fire for its solution - the problem is that it's not a great solution, and they need to implement one in the near future. I get the distinct feeling myself and several others are less than impressed with a 1.0 release which has such poor input filter/validation control. The problem is that you *MUST* use $_POST in the Zend Framework. I was originally under the impression you could use Zend_Controller_Request_Http but this has been shot down as an unintentional user interface (it's only intended for use internally in the framework). This means the ZF effectively has no reliable userland Request object which is news to me and pretty damn worrying since it means any future userland Request object would require some obvious refactoring work at a time when the API must be stable.
It is very disheartening to see this all play out. I have been one of the people who pushed them from the beginning to implement Request/Response objects, Filter and Validator chains, standardized containers, etc. But watching them implement half-baked solutions because they apparently really don't understand the design issues has taken a toll on my interest in ZF. What is worse, they start tacking on fixes to solve symptoms of the design's problems because they can't see the design problems (yet). I have stepped away from it because it seems like they will need to evolve the code as they learn the design issues the hard way.
Maugrim_The_Reaper wrote:JCart opened a thread recently on the same subject which would be worth reading, not sure if he has public code or not but it has a few pointers for an OO validator chain. The gist of the thread was to have an interface which lends itself to configuration - i.e. you don't have to lump in lines of validation code for every single controller action (probably the most boring and essential part of controller writing) if all it takes is a configuration file, a Factory, and some automagic.
I don't mind allowing Filter/Validator chains or Form Controllers to be initialized through configuration data, but I am a little leery of "automagic" because I believe it obscures the tedious, but very simple and self documenting initialization of these chains and controllers. I have found that explicit is better in the long run -- having returned to both kinds of code to make changes after months or years have passed.
Maugrim_The_Reaper wrote:An OO solution would be as simple - you'd just be replacing the procedural regex passes with a filter chain composed of individual rules, one of which can be a generic Regex rule accepting any Regex string you want. One of the powers of config based chaining is the config file would hold the Regex to be used at runtime.
Perhaps we should start a thread to hash through the issues and come up with a proposal for the ZF?

Posted: Sat Mar 24, 2007 1:08 pm
by Maugrim_The_Reaper
It could be a good idea - once Bill Karwin gets around to inviting proposals (going by references on the mailing list, something seems to be in mind) it could at least compete and help put a final design in perspective.