Posted: Mon Jul 04, 2005 4:55 am
by Syranide
Btw, perhaps "code" would be a better (not perfect, but...) name than "query", since it is basically a language, and you don't query information, you compile it.

There should be a more appropriate word for it, though :S

Posted: Tue Jul 05, 2005 6:45 am
by jmut
Hi all,
I saw this forum and topic and signed up right away, because right now I am working on a PHP security-related project, and ultimately the best way to solve its problems is to tokenize the PHP code. I won't describe my project in detail, but here is a link if you are interested:
http://securityscanner.sourceforge.net/
Until now I have been using regular expressions, but I would like to go deeper with tokens.

To the point:
One thing I don't get here: what are you trying to achieve by defining your own token types? What would be your final goal? Looking for syntax errors, etc.?
I was just wondering why you don't use the PHP predefined tokens.

http://www.php.net/manual/en/tokens.php

Other than that, great work.
Respect

Posted: Tue Jul 05, 2005 6:55 am
by Syranide
I really don't understand how what you are saying fits with this thread.

Why would one want to use PHP's tokens? I really don't see how that would improve the situation at all... please explain.
And he is not tokenizing PHP-code, he is tokenizing his "own" language.

Posted: Tue Jul 05, 2005 7:16 am
by timvw
A php source parser/tokenizer is more or less available at http://devel.akbkhome.com/svn/index.php ... HP_Parser/ ;)

Posted: Tue Jul 05, 2005 7:24 am
by timvw
Odd, I don't know why I forgot about that section in the manual...
Having looked it up again now, I even remember that the last time I saw that section I thought: ooooooooooooooh, nice ;)

Posted: Tue Jul 05, 2005 7:45 am
by jmut
I am not saying it will improve anything...
I said:
One thing I don't get here. What are you trying to achieve with defining your own token types? What would be your final goal. Looking for syntax errors etc?
Hence, I don't understand the idea. Thanks for "clearing this up" for me.

Posted: Tue Jul 05, 2005 10:51 am
by Chris Corbyn
Yes, I'm actually defining my own token types (I have defined pretty much all of them, although I keep finding reasons to define new ones too).

To be honest this is completely new ground to me so I wouldn't be surprised if I'm doing things wrong.

There's a good reason I'm NOT using the PHP ones.... this is my own "SmartExp" language and the tokens are for different things (regex based).

Why do I define them? Well, after Syranide kindly encouraged me to do so, there are big benefits.

1. Syntax error checking - with a good set of tokens and a well thought out syntax it's easy(er) to check for syntax errors before going ahead and processing all the data. I'm pretty much gonna be relying on good syntax checking and not doing any checking at all when processing it all (i.e. if no errors are found, it should process smoothly anyway).
2. Organisation - It's so much easier to work with when everything is broken down into smaller parts and you know what each part is there for.
3. Logical workflow - I can work through the sequence of tokens I have generated and just about (ok it's not quite this ideal ;)) know what to do at each point in the sequence simply based on the token type.
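
To make the three points a bit more concrete, here is a rough sketch in Python (only for illustration). The tiny grammar, the token names and the helpers `tokenize`, `check_syntax` and `compile_tokens` are all made up for this example; they are not SmartExp's real syntax or code, which isn't shown in this thread. It tokenizes with regex-defined token types, checks the token sequence up front, and then processes purely by token type:

```python
import re
from enum import Enum, auto

# Hypothetical token types for a toy pattern language (not SmartExp's own).
class T(Enum):
    NAME = auto()    # a character class name such as digit or letter
    STRING = auto()  # a quoted literal such as "-"
    QUANT = auto()   # a repeat count such as {3} or {2,4}
    SPACE = auto()   # whitespace, dropped by the tokenizer

# Each token type is itself described by a regex.
TOKEN_SPEC = [
    (T.NAME, r"[a-z]+"),
    (T.STRING, r'"[^"]*"'),
    (T.QUANT, r"\{\d+(?:,\d+)?\}"),
    (T.SPACE, r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{t.name}>{rx})" for t, rx in TOKEN_SPEC))

# Assumed set of class names the toy language understands.
CLASSES = {"digit": r"\d", "letter": "[A-Za-z]"}

def tokenize(src):
    """Split the source into (type, text) pairs, skipping whitespace."""
    tokens, pos = [], 0
    while pos < len(src):
        m = MASTER.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {src[pos]!r} at offset {pos}")
        kind = T[m.lastgroup]
        if kind is not T.SPACE:
            tokens.append((kind, m.group()))
        pos = m.end()
    return tokens

def check_syntax(tokens):
    """Point 1: reject bad token sequences before any processing starts."""
    prev = None
    for kind, text in tokens:
        if kind is T.NAME and text not in CLASSES:
            raise SyntaxError(f"unknown class {text!r}")
        if kind is T.QUANT and prev not in (T.NAME, T.STRING):
            raise SyntaxError(f"quantifier {text} has nothing to repeat")
        prev = kind

def compile_tokens(tokens):
    """Point 3: walk the checked sequence and act on the token type alone."""
    out = []
    for kind, text in tokens:
        if kind is T.NAME:
            out.append(CLASSES[text])
        elif kind is T.STRING:
            out.append("(?:" + re.escape(text[1:-1]) + ")")
        elif kind is T.QUANT:
            out.append(text)
    return "".join(out)

tokens = tokenize('digit{3} "-" letter{2,4}')
check_syntax(tokens)  # raises SyntaxError on a bad sequence
pattern = compile_tokens(tokens)
print(bool(re.fullmatch(pattern, "123-abc")))  # True
```

Because `check_syntax` has already rejected bad sequences, `compile_tokens` needs no error handling of its own, which is the "process smoothly anyway" idea from point 1.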

My token types are now defined as constants rather than strings, however.
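
One possible argument for constants over strings, sketched in Python for illustration only (the names `Tok`, `is_quantifier_str`, `is_quantifier` and `typo_is_caught` are hypothetical, not taken from the SmartExp code): a misspelled string just never matches, while a misspelled constant fails loudly.

```python
from enum import IntEnum

# Hypothetical token type constants; the real SmartExp names are not shown here.
class Tok(IntEnum):
    LITERAL = 1
    QUANTIFIER = 2
    GROUP_OPEN = 3
    GROUP_CLOSE = 4

def is_quantifier_str(kind: str) -> bool:
    # String-based check with a deliberate typo: it silently never matches.
    return kind == "QUANTIFER"

def is_quantifier(kind: Tok) -> bool:
    # Constant-based check: compares against a single defined value.
    return kind is Tok.QUANTIFIER

def typo_is_caught() -> bool:
    # The same typo on a constant raises immediately instead of hiding.
    try:
        Tok.QUANTIFER
    except AttributeError:
        return True
    return False

print(is_quantifier_str("QUANTIFIER"))  # False
print(is_quantifier(Tok.QUANTIFIER))    # True
print(typo_is_caught())                 # True
```

Whether this matters for speed is a separate question; the sketch only shows the error-catching side.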

Do you know reasons why this is poor practise or inefficient? I'd like to hear your views if you know other ways of handling this.

Cheers,

d11

Posted: Wed Jul 06, 2005 2:38 am
by jmut
Apparently I didn't get the idea at the beginning...
I think what you're doing is great... I will have a closer look at the code so far and come back with suggestions, if any.