Where to validate data

Posted: Tue Feb 26, 2008 9:12 pm
by Ambush Commander
I know this is a time-worn question, but I'd appreciate it if you guys looked at my specific situation and gave some suggestions.

I am writing a configuration schema system, in which developers write small txt files which define what a configuration option is, what type it is, and a variety of useful runtime info and documentation.

My system, then, parses the txt file into a "universal" interchange schema object, which then can be translated into a runtime ConfigSchema class, which the configuration object uses to validate configuration directives, or DOM/XML/HTML, for the purpose of generating documentation.

There are some constraints on what the values of the txt file can be. For example, configuration directive names must be alphanumeric, there is only a specific set of allowed types, the default value must be parseable PHP code, etc. Thus, some sort of validation is necessary.
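
For the "parseable PHP code" constraint, a parse-only check avoids running the value at all. A minimal sketch, assuming PHP 7.0 or later (for `TOKEN_PARSE` and `ParseError`) and the tokenizer extension; the function name `isParseablePhpExpression` is hypothetical and not part of the system described here:

```php
<?php
// Hypothetical helper, not part of the original system.
// Assumes PHP 7.0+ (TOKEN_PARSE, ParseError) with the tokenizer extension.
function isParseablePhpExpression($expr)
{
    try {
        // TOKEN_PARSE makes token_get_all() run the parser as well,
        // so a syntax error is thrown as ParseError; nothing is executed.
        token_get_all('<?php return ' . $expr . ';', TOKEN_PARSE);
        return true;
    } catch (ParseError $e) {
        return false;
    }
}
```

This only checks syntax. A value that parses can still do anything once it is evaluated, so the txt files still have to come from a trusted source.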

I'm trying to figure out how I should implement validation. One method that popped to mind was this:

Code: Select all

 
// using procedural for clarity; actual system is OO
function parseSchemaFile(&$schema, $file) {
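  // parsing code; the flags drop trailing newlines and blank lines,
  // otherwise every value ends in "\n" and ctype_alnum() below fails
  $contents = file($file, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);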
  $values = array();
  foreach ($contents as $keypair) {
    list($k, $v) = explode(': ', $keypair, 2);
    $values[$k] = $v;
  }
  
  // now, performing some evaluation
  if (isset($values['php'])) {
    // code passed to eval() must end with a semicolon, or it fails to parse
    $r = eval('$values["php"] = ' . $values['php'] . '; return true;');
    if (!$r) throw new Exception('Evaluation of php failed');
  }
  
  // now, performing some validation
  if (!isset($values['id'])) {
    throw new Exception('Id must exist');
  }
  if (!ctype_alnum($values['id'])) {
    throw new Exception('Id must be alphanumeric');
  }
  if ($schema->idExists($values['id'])) {
    throw new Exception('Id must be unique');
  }
  
  // now, put it into an object
  $item = new SchemaItem($values['id']);
  if (isset($values['php'])) $item->setPHP($values['php']);
  
  // now, put it into the full schema
  $schema->add($item);
}
 
But this jumbles a lot of unrelated functionality into a single function, which is what I'd like to avoid. My question is, how do I refactor this into an extensible, OO version? Thanks!

Re: Where to validate data

Posted: Tue Feb 26, 2008 11:46 pm
by Christopher
I see parse (that's a filter), evaluate (that's validation), validation, setter, use object. That sounds like a ... a ... program ...

Re: Where to validate data

Posted: Wed Feb 27, 2008 6:22 am
by Ambush Commander
Right. So I'm interested in:

1. Whose responsibility is it to call the validator?
2. How should I structure the validator so I don't have a single function doing all the validation I need (because that's what I currently have)?

Re: Where to validate data

Posted: Wed Feb 27, 2008 1:54 pm
by alex.barylski
I'm not really understanding what it is you're asking, but a shot in the dark... I would personally keep each logical section in a simple function.

Parsing (aka: syntactic analysis) is the process of analyzing tokens and generating a parse tree from some grammar - similar to schema.

Validation is an implicit process, which occurs during the generation of the parse tree. The tokens are iterated and based on the grammar the process determines whether the next token is allowed to be there, given the current context.

Your parsing code is actually more akin to tokenizing, as there is no real structure to the results and you are not validating via a grammar, just splitting an arbitrary string into logical chunks.
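
To illustrate the difference, the splitting step can enforce the one bit of grammar this format has (each line is "KEY: value") while it runs. A minimal sketch, assuming blank lines are allowed and duplicate keys are errors (neither is stated in the thread); the function name `tokenizeLines` is hypothetical:

```php
<?php
// Hypothetical sketch: split "KEY: value" lines and reject malformed ones.
// Assumptions: blank lines are skipped, duplicate keys are errors.
function tokenizeLines(array $lines)
{
    $tokens = array();
    foreach ($lines as $index => $line) {
        $line = rtrim($line, "\r\n"); // file() keeps line endings by default
        if ($line === '') {
            continue;
        }
        $parts = explode(': ', $line, 2);
        if (count($parts) !== 2) {
            throw new InvalidArgumentException(
                'Line ' . ($index + 1) . ' is not in "KEY: value" form'
            );
        }
        list($key, $value) = $parts;
        if (array_key_exists($key, $tokens)) {
            throw new InvalidArgumentException(
                'Duplicate key "' . $key . '" on line ' . ($index + 1)
            );
        }
        $tokens[$key] = $value;
    }
    return $tokens;
}
```

Checks on what the values mean (alphanumeric id, unique id, allowed type) would still live in a separate validation step.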

I would do something like:

Code: Select all

 
function processSchemaFile($schema, $file)
{
  // the flags drop trailing newlines and blank lines so the values are clean
  $tokens = _tokenizeLines(file($file, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES));

  _evalTokens($tokens);
  _validateTokens($schema, $tokens);

  /*
    TODO: Initialize $schema object here and return it, instead of taking it as a parameter?
    Unless you need to create it externally and pre-initialize it with something. In which case maybe
    inject it into an object at construction? To avoid confusing the API - unless of course it makes no sense to you.
  */

  return $schema;
}

function _tokenizeLines($lines)
{
  $tokens = array();
  foreach ($lines as $line) {
    list($k, $v) = explode(': ', $line, 2);
    $tokens[$k] = $v;
  }

  return $tokens;
}

// $tokens is taken by reference so the evaluated value reaches the caller
function _evalTokens(&$tokens)
{
  if (isset($tokens['php'])) {
    // code passed to eval() must end with a semicolon, or it fails to parse
    $r = eval('$tokens["php"] = ' . $tokens['php'] . '; return true;');
    if (!$r) throw new Exception('Evaluation of php failed');
  }
}

// $schema is passed in because the uniqueness check needs it
function _validateTokens($schema, $tokens)
{
  if (!isset($tokens['id'])) {
    throw new Exception('Id must exist');
  }
  if (!ctype_alnum($tokens['id'])) {
    throw new Exception('Id must be alphanumeric');
  }
  if ($schema->idExists($tokens['id'])) {
    throw new Exception('Id must be unique');
  }
}
I excluded the last part - your schema object initialization - because I figured I had done enough to make my point. :)

I don't think I'd use the word parsing in any of those functions. You are using PHP's internal engine to evaluate some source code as well - I think I would keep that logic tied in with validation, but that is based on personal opinion and I obviously *don't* have as clear a picture as you do about where you are going with all this.

HTH

Cheers :)

Re: Where to validate data

Posted: Wed Feb 27, 2008 1:58 pm
by Christopher
Ambush Commander wrote: 1. Whose responsibility is it to call the validator
Not sure this is what you are looking for, but the first choice is to do the validation in the Domain Model; the second would be in the Controller/Presentation.
Ambush Commander wrote: 2. How should I structure the validator so I don't have a single function doing all the validation I need (because that's what I currently have)
As a chain that you can add rule objects to and then run later when needed.
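
A minimal sketch of such a chain, assuming each rule inspects the parsed associative array and reports at most one error string. All names here (`Rule`, `RequiredRule`, `AlnumRule`, `ValidatorChain`) are hypothetical and not taken from the system being discussed:

```php
<?php
// Hypothetical names throughout; one possible shape for a rule chain.
interface Rule
{
    /** @return string|null Error message, or null when the values pass */
    public function check(array $values);
}

class RequiredRule implements Rule
{
    private $key;
    public function __construct($key) { $this->key = $key; }
    public function check(array $values)
    {
        return isset($values[$this->key]) ? null : $this->key . ' must exist';
    }
}

class AlnumRule implements Rule
{
    private $key;
    public function __construct($key) { $this->key = $key; }
    public function check(array $values)
    {
        if (!isset($values[$this->key])) {
            return null; // absence is RequiredRule's concern
        }
        return ctype_alnum($values[$this->key])
            ? null
            : $this->key . ' must be alphanumeric';
    }
}

class ValidatorChain
{
    private $rules = array();

    public function add(Rule $rule)
    {
        $this->rules[] = $rule;
        return $this; // allow chained add() calls
    }

    /** @return string[] All error messages; empty when every rule passes */
    public function validate(array $values)
    {
        $errors = array();
        foreach ($this->rules as $rule) {
            $error = $rule->check($values);
            if ($error !== null) {
                $errors[] = $error;
            }
        }
        return $errors;
    }
}

// Build the chain once, run it later when needed.
$chain = new ValidatorChain();
$chain->add(new RequiredRule('id'))->add(new AlnumRule('id'));
$errors = $chain->validate(array('id' => 'bad id!'));
```

A rule that needs outside state, such as the uniqueness check against the schema, could take that object in its constructor in the same way the rules above take a key.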

Re: Where to validate data

Posted: Wed Feb 27, 2008 9:05 pm
by Ambush Commander
alex.barylski wrote: Validation is an implicit process, which occurs during the generation of the parse tree. The tokens are iterated and based on the grammar the process determines whether the next token is allowed to be there, given the current context.
Weell, I've always thought, since this is a simple keypair system, this would be handled similarly to the validation of GET and POST parameters.
Christopher wrote: Not sure this is what you are looking for, but the first choice is to do the validation in the Domain Model, second would be in the Controller/Presentation.
Heh, you've basically told me every single component of a regular application (MVC). So you favor the model doing validation, and the controller staying out of it, hmm. That's helpful, but only slightly, because I don't have a presentation layer in this system, and the controller is the userland code that is invoking this parsing process, I suppose.

So I wonder what the domain model should look like to handle validation.
Christopher wrote: As a chain that you can add rule objects to and then run later when needed.
This is probably when I should describe my current OO setup:

Parser - Parses a txt file in form of "KEY: value\nKEY2: value2" into an associative array, no other processing. Note that this parses only ONE directive at a time, but many directives make up a schema.

Code: Select all

interface Parser
{
  /** @return Array of parsed contents */
  public function parseFile($file);
}
Schema - Interchange format which represents a schema, contains directives

Code: Select all

interface Schema
{
  public function idExists($id);
  public function add($directive);
  public function getIterator(); // to iterate over directives
}
Directive - Describes a configuration directive as part of a schema

Code: Select all

interface Directive
{
  public function getId();
  public function getType(); // required by runtime
  public function getDescription(); // optional, used for docs
}
Importer - Takes the associative array, and does further parsing, generates a directive object, and adds it to our Schema. This includes evaluating PHP code or exploding a comma separated list of values.

Code: Select all

interface Importer
{
  /** @return Schema interchange object, basically a glorified assoc. array */
  public function import($schema, $array);
}
Exporter - Takes our schema interchange object, and serializes/exports the values into an array of strings, which can then be stored back in the txt file (basically, the reverse of Importer)

Code: Select all

interface Exporter
{
  /** @return Array of arrays of contents ready to go in text */
  public function export($schema);
}
RuntimeConvertor - Takes our schema interchange object, and generates a runtime schema object from it

Code: Select all

interface RuntimeConvertor
{
  /** @return Runtime schema object (not described) */
  public function convert($schema); 
}
DocConvertor - Takes our schema interchange object, and generates documentation from it

Code: Select all

interface DocConvertor
{
  public function generate($schema, $output_dir);
}
In the process of writing this, I realized that quite a bit of this code is simply glorified procedural code, so suggestions about that would be nice. But there is also the question of which of these classes should hold the validation code, or whether I should make a separate validation class and have the userland code invoke it. It doesn't do much good to make a filter chain if I don't know where to call the chain from.
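
One possible arrangement, sketched under assumptions rather than offered as the answer: wrap the Importer in a decorator that runs the validation first, so userland code keeps calling `import()` and never invokes the chain itself. `ValidatingImporter` and the shape of the validator callable are hypothetical; `Importer` mirrors the interface above.

```php
<?php
// Importer repeats the interface from the post so the sketch is self-contained.
interface Importer
{
    public function import($schema, $array);
}

// Hypothetical decorator: validates, then delegates to the real importer.
class ValidatingImporter implements Importer
{
    private $inner;
    private $validator;

    /**
     * @param Importer $inner     The importer doing the actual work
     * @param callable $validator Assumed to take ($schema, $array) and
     *                            return an array of error strings
     */
    public function __construct(Importer $inner, $validator)
    {
        $this->inner = $inner;
        $this->validator = $validator;
    }

    public function import($schema, $array)
    {
        $errors = call_user_func($this->validator, $schema, $array);
        if ($errors) {
            // Refuse to touch the schema when any rule failed
            throw new InvalidArgumentException(implode('; ', $errors));
        }
        return $this->inner->import($schema, $array);
    }
}
```

With this shape the Parser stays a pure splitter, the inner Importer can assume clean input, and the validator receives the schema so that checks like id uniqueness remain possible.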