Best way to parse a "query" string...

Posted: Mon Nov 07, 2005 8:50 am
by aluminumpork
I'm working on a custom query system that needs to be very dynamic.

Right now you can create a type of query by adding to the $querytype array.
So...

$querytype[] = array('select','%COLUMN%','from','%TABLE%','!%WHERE%!','!%COLUMN%!','!%OPERATOR%!','!%VALUE%!');

A % symbol means that it's a special value and must be compared to a list of knowns. A ! symbol means it's optional and is not required.
Am I going about this the right way?
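
To make the marker rules concrete, here is a rough sketch of how one template entry could be read. The helper name describeSlot() and the shape of its return value are made up for illustration and are not part of the system described above.

```php
<?php
// Hypothetical helper: works out what one template entry means.
function describeSlot($slot)
{
    // Optional entries are wrapped in "!" on both sides.
    $optional = strlen($slot) > 2 && $slot[0] === '!' && substr($slot, -1) === '!';
    if ($optional) {
        $slot = substr($slot, 1, -1);
    }

    // Special entries are wrapped in "%" on both sides.
    $special = strlen($slot) > 2 && $slot[0] === '%' && substr($slot, -1) === '%';
    if ($special) {
        $slot = substr($slot, 1, -1);
    }

    return array('name' => $slot, 'special' => $special, 'optional' => $optional);
}

$querytype = array();
$querytype[] = array('select', '%COLUMN%', 'from', '%TABLE%', '!%WHERE%!', '!%COLUMN%!', '!%OPERATOR%!', '!%VALUE%!');

// Print one line per entry of the first query type.
foreach ($querytype[0] as $slot) {
    $info = describeSlot($slot);
    echo $info['name'],
        $info['special'] ? ' (special)' : ' (literal)',
        $info['optional'] ? ' (optional)' : '',
        "\n";
}
```

Anything flagged as special would then still need to be checked against the list of known values.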

Also, I'm looking for advice on the best way to parse the query string it receives. Right now I'm pulling the query into an array by exploding it on spaces.

Any ideas?

Re: Best way to parse a "query" string...

Posted: Mon Nov 07, 2005 9:23 am
by Chris Corbyn
I guess the syntax you choose is up to you (did you say you're writing a custom language?).

To parse the string, don't explode on whitespace. It'll break up anything that's quoted, such as strings containing spaces.
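
A quick way to see the problem, using a made-up query (the table and column names are only for illustration):

```php
<?php
// Hypothetical query; the quoted value contains a space.
$query = "select name from users where name = 'John Smith'";

// Splitting on single spaces cuts the quoted value in two.
$parts = explode(' ', $query);

// The last two elements are "'John" and "Smith'" instead of one string token.
print_r(array_slice($parts, -2));
```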

Tokenize it using a regular expression. Be warned, though: these kinds of regex can get very, very scary :)

EDIT | This one does a fairly basic job of splitting the syntax into tokens for you, while leaving strings and octal/hexadecimal numbers intact.

Code:

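// Matches quoted strings, // and /* */ comments, hex numbers, whitespace, punctuation and words.
$re = "#(?:(?<!\\\\)\'.*?(?<!\\\\)\')|(?:(?<!\\\\)\".*?(?<!\\\\)\")|(?:(?<!\\\\)//.*?\n)|(?:(?<!\\\\)/\\*.*?\\*/)|0x[a-z0-9]+|\\s+|\\W|\\w+#ism";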
feyd posted something similar but more powerful a while back too - in regex or snippets I think.
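
For what it's worth, here is a minimal sketch of how the pattern could be put to work with preg_match_all(). The $re below is the pattern above with the inner double quotes escaped so that it is a valid PHP string; the sample query is made up, and dropping the whitespace tokens is just one possible choice.

```php
<?php
$re = "#(?:(?<!\\\\)\'.*?(?<!\\\\)\')|(?:(?<!\\\\)\".*?(?<!\\\\)\")|(?:(?<!\\\\)//.*?\n)|(?:(?<!\\\\)/\\*.*?\\*/)|0x[a-z0-9]+|\\s+|\\W|\\w+#ism";

// Hypothetical query used only for illustration.
$query = "select name from users where name = 'John Smith'";

// $matches[0] holds every token in order, including runs of whitespace.
preg_match_all($re, $query, $matches);

// Drop the whitespace-only tokens and reindex the array.
$tokens = array_values(array_filter($matches[0], function ($token) {
    return trim($token) !== '';
}));

print_r($tokens);
```

Here the quoted value comes back as the single token 'John Smith' (quotes included), so it can be compared against the query type array one token at a time.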