Btw, perhaps "code" would be a better (not perfect but...) name for "query", as basically it is a language, and you don't query information, but compile.
Should be a more appropriate word for it though :S
Parsers written in PHP (Regex builder)
Hi all,
I saw this forum and topic and signed up right away, because right now I am working on a PHP security-related project and ultimately the best way to solve its problems is tokenizing the PHP code. I won't describe my project in detail, but here is a link if you're interested:
http://securityscanner.sourceforge.net/
Until now I've been using regular expressions, but I would like to go deeper with tokens.
To the point:
One thing I don't get here: what are you trying to achieve by defining your own token types? What is your final goal? Looking for syntax errors, etc.?
I was just wondering why you don't use the PHP predefined tokens.
http://www.php.net/manual/en/tokens.php
Other than that, great work.
Respect
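For anyone following along, here is a rough sketch of what the PHP predefined tokens look like in practice. It uses token_get_all() and token_name() from PHP's bundled tokenizer extension; the helper name describe_tokens() and the sample snippet are made up purely for illustration.

```php
<?php
// Sketch: list the built-in tokens PHP's own tokenizer produces for a snippet.
// describe_tokens() is a hypothetical helper, not part of any project in this thread.
function describe_tokens(string $source): array
{
    $out = [];
    foreach (token_get_all($source) as $token) {
        if (is_array($token)) {
            // Array tokens are [token id, source text, line number].
            $out[] = token_name($token[0]) . ' ' . var_export($token[1], true);
        } else {
            // Single-character tokens such as ';' come back as plain strings.
            $out[] = $token;
        }
    }
    return $out;
}

// Print one token per line for a tiny sample script.
foreach (describe_tokens('<?php echo $name; ?>') as $line) {
    echo $line, "\n";
}
```

Note that these tokens only describe PHP source itself, so they are useful for scanning PHP code but not for a language with its own syntax.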
I really don't understand how what you are saying fits with this thread.
Why would one want to use PHP's tokens? I really don't see how that would improve the situation at all... please explain.
And he is not tokenizing PHP code, he is tokenizing his "own" language.
A php source parser/tokenizer is more or less available at http://devel.akbkhome.com/svn/index.php ... HP_Parser/ 
I am not saying it will improve anything...
I said:
"One thing I don't get here: what are you trying to achieve by defining your own token types? What is your final goal? Looking for syntax errors, etc.?"
Hence, I didn't understand the idea. Thanks for clearing this up for me.
- Chris Corbyn
Yes, I'm actually defining my own token types (I have pretty much defined them, although I keep finding reasons to define new ones too).
To be honest this is completely new ground to me, so I wouldn't be surprised if I'm doing things wrong.
There's a good reason I'm NOT using the PHP ones... this is my own "SmartExp" language and the tokens are for different things (regex based).
Why do I define them? Well, after Syranide kindly nudged me to do so, there are big benefits.
1. Syntax error checking - with a good set of tokens and a well thought out syntax it's easi(er) to check for syntax errors before going ahead and processing all the data. I'm pretty much going to rely on good syntax checking up front and not do any checking at all when processing it (i.e. if no errors are found, it should process smoothly anyway).
2. Organisation - It's so much easier to work with when everything is broken down into smaller parts and you know what each part is there for.
3. Logical workflow - I can work through the sequence of tokens I have generated and just about (OK, it's not quite this ideal) know what to do at each point in the sequence simply based on the token type.
My token types are now defined as constants rather than strings, however.
Do you know reasons why this is poor practice or inefficient? I'd like to hear your views if you know other ways of handling this.
Cheers,
d11
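To make the idea above concrete, here is a minimal sketch of token types defined as constants, a regex-driven tokenizer, and a syntax check that runs before any processing. Everything in it is an assumption for illustration: the TOK_* constants, the toy_tokenize() and toy_check() functions, and the toy grammar (words, quoted strings, parentheses) are hypothetical and are not the actual SmartExp tokens or syntax.

```php
<?php
// Hypothetical token types for a toy language; these are NOT the SmartExp tokens.
const TOK_WORD   = 1;
const TOK_STRING = 2;
const TOK_LPAREN = 3;
const TOK_RPAREN = 4;

// Split the input into [type, text] pairs, or throw on an unrecognised character.
function toy_tokenize(string $input): array
{
    // \G anchors each pattern at the offset passed to preg_match().
    $patterns = [
        TOK_WORD   => '/\G[a-z_]+/i',
        TOK_STRING => '/\G"[^"]*"/',
        TOK_LPAREN => '/\G\(/',
        TOK_RPAREN => '/\G\)/',
    ];
    $tokens = [];
    $pos = 0;
    $len = strlen($input);
    while ($pos < $len) {
        // Skip whitespace between tokens.
        if (ctype_space($input[$pos])) {
            $pos++;
            continue;
        }
        foreach ($patterns as $type => $regex) {
            if (preg_match($regex, $input, $m, 0, $pos)) {
                $tokens[] = [$type, $m[0]];
                $pos += strlen($m[0]);
                continue 2; // carry on with the outer while loop
            }
        }
        throw new InvalidArgumentException("Unexpected character at offset $pos");
    }
    return $tokens;
}

// Return a list of error messages; an empty list means the parentheses balance.
function toy_check(array $tokens): array
{
    $errors = [];
    $depth = 0;
    foreach ($tokens as $i => [$type, $text]) {
        if ($type === TOK_LPAREN) {
            $depth++;
        } elseif ($type === TOK_RPAREN) {
            if ($depth === 0) {
                $errors[] = "Unmatched ')' at token $i";
            } else {
                $depth--;
            }
        }
    }
    if ($depth > 0) {
        $errors[] = "$depth unclosed '('";
    }
    return $errors;
}
```

With integer constants the processing step can switch on the token type directly, and a typo in a constant name is reported by PHP instead of silently comparing against a misspelled string.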
Apparently I didn't get the idea at the beginning...
I think what you're doing is great... I will have a closer look at the code so far and come back with suggestions, if any.