Page 1 of 1

SQL regex

Posted: Tue Jun 30, 2009 9:33 pm
by SidewinderX
I'm trying to parse a basic SQL statement.

ie:
SELECT a FROM b
SELECT a,aa,aaa FROM b
SELECT a FROM b WHERE c<>d
SELECT a,aa,aaa FROM b WHERE c==d

Here is what I came up with. (I think) It does the job, but it sure does look ugly.

Code: Select all

$pattern = '#^SELECT ([a-z]+(?:,[a-z]+)*)' //match SELECT a or SELECT a,b but not SELECT a,,,
          .' FROM ([a-z0-9]+)'
          .'(?:\sWHERE\s([a-z0-9_]+)\s?(==?|!=|<>|<=?|>=?)\s?([a-z0-9_]+))?$#is';
 
Any tips? Would it be smarter to just do something like

Code: Select all

$pattern = '#^SELECT (.+) FROM (.+?)(?: WHERE (.+))?$#is';
and then validate each part individually?

Re: SQL regex

Posted: Wed Jul 01, 2009 1:21 am
by prometheuzz
You could use the x-flag that will let you break your regex into multiple lines and you could also define a sub-pattern that will define identifiers and replace that in your main regex pattern. A bit like this:

Code: Select all

$identifier = '[a-z][a-z0-9_]*';
$pattern = "#
  ^SELECT\s+($identifier(?:\s*,\s*$identifier)*)\s+
  FROM\s+($identifier)
  (?:
    \s+WHERE\s+($identifier)\s*
    (==?|!=|<>|<=?|>=?)\s*($identifier)
  )?$
#ix";
 
$tests = array(
  'SELECT a FROM b',
  'SELECT a,aa,aaa FROM b',
  'SELECT a FROM b WHERE c<>d',
  'SELECT a,aa,aaa FROM b WHERE c==d'
);
 
foreach($tests as $t) {
  if(preg_match($pattern, $t, $match)) {
    print_r($match);
  }
}
Of course, it is a tricky thing: matching SQL statements with regex. A more robust solution would be to use a dedicated SQL parser.

Good luck!