
Parsing homemade query languages

Posted: Mon Nov 30, 2009 8:23 pm
by Randwulf
Hi,

I need to be able to parse strings that look sort of like this (the user builds a similar string in order to specify how the DB should be searched):

((((a AND b) OR (c OR (d AND e))) AND f) OR g)

a, b, c, etc. are strings.

So I need a PHP script that can understand what the user wants when they enter such a string. Here's a simpler example:

((a AND b) OR c)

That means "get me all DB entries that contain string c or contain both string a and b".

Are there any scripts that can do something like this? Does anyone have any tips for doing this? Is it even realistically possible with PHP?

Thanks.

Re: Parsing homemade query languages

Posted: Mon Nov 30, 2009 9:00 pm
by AlanG
You'll need to use regular expressions. I'm not too good with them so can't really help you.

Randwulf wrote:((a AND b) OR c)

That means "get me all DB entries that contain string c or contain both string a and b".

Just wanted to comment on the logic of this. I assume you have your own reasons for this (educational, etc.), but your query language has very limited capabilities. What about a 'NOT' condition? Are the terms enclosed in particular characters (e.g. "a" AND "b")? Is the query language case sensitive? These are all important questions. I'm not sure what context you want to use this in, but you should also consider that, since PHP is itself interpreted, it might be too slow for what you want. In that case, you should look at something like C++ or Java.

I've never actually done it myself, but taking a look at the MySQL source code could be worthwhile. It obviously parses SQL, and finding that code could be worth the hassle. :)
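
For readers following along, here is a minimal sketch of one common way to handle such strings. It is not something either poster wrote. A regular expression is enough to split the input into tokens, but the nested parentheses are usually handled by a small recursive-descent parser. The sketch is in Python for brevity (the same structure carries over to PHP), and everything in it is an assumption for illustration: the function names, the uppercase-only keywords, the restriction of term labels to letters, digits and underscores, and the precedence NOT > AND > OR for input that is not fully parenthesized.

```python
import re

# A token is a parenthesis or a word (keyword AND/OR/NOT, or a term label like "a").
TOKEN_RE = re.compile(r"\s*(\(|\)|[A-Za-z0-9_]+)")


def tokenize(text):
    """Split the query string into a flat list of tokens."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if not match:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    """Recursive-descent parser; assumed precedence is NOT > AND > OR."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        if token is None:
            raise ValueError("unexpected end of query")
        self.pos += 1
        return token

    def parse(self):
        tree = self.parse_or()
        if self.peek() is not None:
            raise ValueError(f"unexpected token {self.peek()!r}")
        return tree

    def parse_or(self):
        node = self.parse_and()
        while self.peek() == "OR":
            self.take()
            node = ("or", node, self.parse_and())
        return node

    def parse_and(self):
        node = self.parse_not()
        while self.peek() == "AND":
            self.take()
            node = ("and", node, self.parse_not())
        return node

    def parse_not(self):
        if self.peek() == "NOT":
            self.take()
            return ("not", self.parse_not())
        return self.parse_atom()

    def parse_atom(self):
        token = self.take()
        if token == "(":
            node = self.parse_or()  # each nested group recurses here
            if self.take() != ")":
                raise ValueError("expected ')'")
            return node
        if token in (")", "AND", "OR", "NOT"):
            raise ValueError(f"unexpected token {token!r}")
        return token  # a bare term label such as "a"


def parse_query(text):
    """Turn a query string into a nested tuple tree."""
    return Parser(tokenize(text)).parse()
```

With this sketch, `parse_query("((a AND b) OR c)")` returns `("or", ("and", "a", "b"), "c")`. The tree can then be walked to build a database query or to test records in memory, and malformed input such as a missing parenthesis raises `ValueError` instead of being silently misread.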

Re: Parsing homemade query languages

Posted: Mon Nov 30, 2009 9:17 pm
by Randwulf
Thanks for your response. And input is always welcome :D

It does have a NOT condition.

As for encapsulation, the terms are actually specified on a different page. For example, you might enter a string (it's actually a little more complicated than a single string, but a string suffices for the purpose of the example) as "a" on the other page, another as "b", etc., and then on the final page you combine your search parameters and specify "AND", "NOT", etc. This way, no encapsulation is required on the final page.

And yeah, you're right that it could be slow >.< I'm still open to ideas on how to do it, and I might settle for something other than having the user enter a string. Perhaps they'd fill in a billion drop-down boxes instead.

I'm not experienced enough to play with MySQL source code :p nor do I know C++. I might somehow be able to break the string down into something that can be managed by MySQL's boolean search feature.

Once again, thanks for your quick reply.

Re: Parsing homemade query languages

Posted: Mon Nov 30, 2009 9:33 pm
by AlanG
I had a quick look at the MySQL source... yeah... above my head too. C++ is definitely not my language of choice lol

Do you really need a custom query language? What is your data store? If it's a database, and you need a custom query language for some obscure reason (e.g. the client in all his wisdom wants one), then you could translate the queries to SQL and execute them.
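
As a rough illustration of the translate-to-SQL idea, here is a minimal sketch in Python (the logic ports directly to PHP). It assumes the query has already been parsed into nested tuples such as `("or", ("and", "a", "b"), "c")`, that the labels a, b, c map to the search strings entered on the earlier page, and that a substring match with `LIKE` is what "contains" should mean. The function name `to_sql`, the column name `body`, and the `?` placeholder style are all hypothetical choices, not anything from the thread.

```python
def to_sql(node, terms, column="body"):
    """Return (where_clause, params) for a parsed query tree.

    node   -- a term label (str) or a tuple: ("and", l, r), ("or", l, r), ("not", x)
    terms  -- dict mapping each label to the user's search string
    column -- trusted column name (hypothetical); never take it from user input
    """
    if isinstance(node, str):
        text = terms[node]
        # Escape LIKE wildcards so the text is matched literally.
        # Assumes backslash is the LIKE escape character (the MySQL default).
        escaped = text.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
        # The value travels as a bound parameter, not as part of the SQL text.
        return f"{column} LIKE ?", [f"%{escaped}%"]

    op = node[0]
    if op == "not":
        sql, params = to_sql(node[1], terms, column)
        return f"NOT ({sql})", params
    if op in ("and", "or"):
        left_sql, left_params = to_sql(node[1], terms, column)
        right_sql, right_params = to_sql(node[2], terms, column)
        return f"({left_sql} {op.upper()} {right_sql})", left_params + right_params
    raise ValueError(f"unknown operator {op!r}")


tree = ("or", ("and", "a", "b"), "c")
terms = {"a": "apple", "b": "banana", "c": "cherry"}
where, params = to_sql(tree, terms)
# where  == "((body LIKE ? AND body LIKE ?) OR body LIKE ?)"
# params == ["%apple%", "%banana%", "%cherry%"]
```

The clause and parameter list would then go to a prepared statement, so the user's strings never end up inside the SQL text itself. With this approach the database does the actual searching and the script only translates a short string, which may make the speed of the scripting language less of a concern.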

I don't know the details of the system, but Java might be a good solution. Or Python compiled to byte code. Either would definitely be faster. There are a few options anyway.