Collapsing class hierarchies

Posted: Tue Oct 30, 2007 5:50 pm
by alex.barylski
I've started a new pet project...the idea is an advanced obfuscator for PHP sources. Last night I hammered out a whitespace/comment remover and a local variable obfuscator (no member data, just function arguments and variables in function scope).
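A whitespace/comment remover of this kind can be sketched with PHP's tokenizer. This is only a minimal illustration, not the poster's actual code: the function name `stripCommentsAndWhitespace` is made up here, and the sketch assumes the input contains no heredocs (collapsing the newline after a heredoc terminator can break older PHP versions).

```php
<?php
// Hypothetical helper: drop comments and collapse each whitespace run to one space.
// Assumption: the source contains no heredoc/nowdoc blocks.
function stripCommentsAndWhitespace(string $source): string
{
    $out = '';
    foreach (token_get_all($source) as $token) {
        if (!is_array($token)) {
            $out .= $token;            // single-character tokens such as ';' or '='
            continue;
        }
        [$id, $text] = $token;
        if ($id === T_COMMENT || $id === T_DOC_COMMENT) {
            continue;                  // comments are removed entirely
        }
        $out .= ($id === T_WHITESPACE) ? ' ' : $text;
    }
    return $out;
}

$out = stripCommentsAndWhitespace("<?php\n// note\n\$a = 1; /* x */\n\$b  =  2;\n");
```

For whole files, PHP's built-in `php_strip_whitespace()` does a similar job.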

I've spent some time today thinking about what else could be done to further obfuscate source code, and many of the ideas I've hacked out certainly show some promise for some advanced refactoring. I'm getting pretty excited about this little pet project.

Anyways, one way to certainly obfuscate source code is to collapse class hierarchies.

I plan on implementing this using an algorithm like:

1) Find a class
2) Determine if it extends any base class
3) Recursively travel up class hierarchy until base class found
4) Scan source tree for any other classes which extend base class
5) Store each of those classes in a stack
6) Scan base class source for member variables
6.1) Refactor member variables into each of the subclasses
7) Scan base class source for member functions
7.1) Check if subclass re-defines any method
7.2) Scan re-defined method in subclass for a call to base implementation
7.4) Refactor base method call w/ inlined code OR add base method with renamed function and call that instead???
7.3) Refactor member function into each of the subclasses
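
Steps 1 through 5 above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the poster's implementation: the helper names (`findClassParents`, `rootOf`, `descendantsOf`) are hypothetical, class names are assumed to be unqualified (no namespaces), and expressions such as `Foo::class` or anonymous classes are not handled.

```php
<?php
// Hypothetical helper: map each class name found in $source to its parent (null if none).
function findClassParents(string $source): array
{
    $tokens = token_get_all($source);
    $count = count($tokens);
    $parents = [];
    for ($i = 0; $i < $count; $i++) {
        if (!is_array($tokens[$i]) || $tokens[$i][0] !== T_CLASS) {
            continue;
        }
        $name = null;
        $parent = null;
        $sawExtends = false;
        // Read the declaration header up to the opening brace.
        for ($j = $i + 1; $j < $count && $tokens[$j] !== '{'; $j++) {
            if (!is_array($tokens[$j])) {
                continue;
            }
            [$id, $text] = $tokens[$j];
            if ($id === T_IMPLEMENTS) {
                break;
            } elseif ($id === T_EXTENDS) {
                $sawExtends = true;
            } elseif ($id === T_STRING && $name === null) {
                $name = $text;
            } elseif ($id === T_STRING && $sawExtends && $parent === null) {
                $parent = $text;
            }
        }
        if ($name !== null) {
            $parents[$name] = $parent;
        }
    }
    return $parents;
}

// Step 3: walk up the hierarchy until a class with no known parent is reached.
function rootOf(string $class, array $parents): string
{
    while (isset($parents[$class])) {
        $class = $parents[$class];
    }
    return $class;
}

// Steps 4-5: collect every class that (directly or indirectly) extends $base.
function descendantsOf(string $base, array $parents): array
{
    $stack = [];
    foreach (array_keys($parents) as $class) {
        if ($class !== $base && rootOf($class, $parents) === $base) {
            $stack[] = $class;
        }
    }
    return $stack;
}

$source = '<?php class A {} class B extends A {} class C extends B {} class D {}';
$parents = findClassParents($source);
$root = rootOf('C', $parents);
$stack = descendantsOf($root, $parents);
```

Here `$root` is the base class reached from `C`, and `$stack` holds the classes that extend it and would receive its members.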

Just writing this out already helped me recognize something I missed with step 7.4 - I think renaming the base method would make more sense, then simply changing the call from one to the other. As a last phase of my obfuscation CLI tool I plan on expanding private methods and static functions, but only under a certain set of rules, like if the method is invoked fewer than three times per method - expand/inline the code.
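
To make the renaming idea concrete, here is a hand-written before/after illustration. The class names and the `describe__base` naming scheme are invented for this example; a real tool would have to generate a name that cannot collide with existing methods.

```php
<?php
// Original hierarchy (illustrative names, not from the original post).
class Shape
{
    protected $name = 'shape';

    public function describe()
    {
        return 'I am a ' . $this->name;
    }
}

class Circle extends Shape
{
    protected $name = 'circle';

    public function describe()
    {
        return parent::describe() . ' with a radius';
    }
}

// What a collapsed Circle might look like: the base implementation is copied
// in under a new (hypothetical) name and parent::describe() is rewritten to call it.
class CircleCollapsed
{
    protected $name = 'circle';

    private function describe__base()
    {
        return 'I am a ' . $this->name;
    }

    public function describe()
    {
        return $this->describe__base() . ' with a radius';
    }
}

$before = (new Circle())->describe();
$after = (new CircleCollapsed())->describe();
```

One gotcha worth checking: once the hierarchy is gone, any `instanceof Shape` test or `Shape` type hint elsewhere in the code no longer matches the collapsed class.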

So anyways, as you can see the details get gory pretty quickly and I could certainly use some more eyes/heads going over the details with me to help me spot potential problems, etc...

One thing I would like to request right now is that we stick to the topic at hand, that is, collapsing class hierarchies, and refrain from offering other ideas on auto-refactoring techniques, such as method extraction, etc - yes I get as excited as you do about these topics but keeping things on topic will help me accelerate the development. :)

Cheers :)

Posted: Tue Oct 30, 2007 6:08 pm
by Jenk
I've got to say it.. I wouldn't touch any form of obfuscator with a 10ft barge pole. Anything that can be obfuscated can be unobfuscated with surprising ease.

A code beautifier, like the one my colleagues and I use in our office, would effortlessly undo the stripping of whitespace. Obfuscated variable names would not be an issue - you can still match them up without problem. It's a PITA, but nowhere near secure.

If you want source code protection, implement a sufficient license. If that's not enough for you, use one of the encrypters. I really do not see any place on the "market" for this, sorry.

Posted: Tue Oct 30, 2007 7:11 pm
by alex.barylski
I figured somebody would chime in about beautifiers... :P

Here's the thing, that is only the tip of the iceberg...really it is...

Source code at that level is not really important to protect, as any nincompoop can usually get the gist of how a function works by just playing with the interface. It's the ideas that count, not the source code. So in saying that, I am not really trying to protect source code at that level; it's just something to start with to begin getting a feel for what is involved in code obfuscation.

Architecture...is a different matter. To do it right takes time, experience, wisdom, etc. How I organize my files, classes, etc...for me is the biggest factor in my succeeding where my competitors don't. I've seen many of my competitors' source code, in fact I even worked with/for a few. They almost all lack any architecture. I am certain that once I release, my competitors will be investigating my application's innards (some did it to others, so why would I be different?).

The point is, by obfuscating the overall design I am certain that even the most seasoned developer will have a far harder time figuring out how I produce superior applications. By collapsing hierarchies, and expanding the several layers which compose my applications, they will see only the final monolithic result.

Unlike obfuscating source code variables, which beautifiers can quickly decode (that was more for performance reasons anyways), architecture is almost impossible to decode from a monolithic source...

Cheers :)

Posted: Tue Oct 30, 2007 10:20 pm
by Selkirk
I've run into a few programmers that can obfuscate your code by hand. :roll:

Posted: Tue Oct 30, 2007 11:48 pm
by Kieran Huggins
Check out: http://www.phpcompiler.org

It doesn't quite work yet, but the SVN is open and might be a virtual treasure trove for you.

On a semi-related note: http://www.bambalam.se/bamcompile/ seems like it's worth looking at.

Posted: Wed Oct 31, 2007 12:26 am
by alex.barylski
Selkirk wrote: I've run into a few programmers that can obfuscate your code by hand.
I don't follow you. Obfuscating code by hand is simply not practical. I would need to carry out those tasks every time I changed my source, which is currently at 2500 SLOC and by the time I finish I reckon more like 35K+

Not only that, but error prone...automation is the only way to go.
Kieran Huggins wrote:Check out: http://www.phpcompiler.org

It doesn't quite work yet, but the SVN is open and might be a virtual treasure trove for you.

On a semi-related note: http://www.bambalam.se/bamcompile/ seems like it's worth looking at.
I'm familiar with phpcompiler but would prefer Roadsend:

http://www.roadsend.com/home/index.php?SMC=1

It actually kinda works. Unfortunately, this isn't the direction I wish to head in...

Basically, the goal is to perform static analysis and convert a highly modular framework of code into a single monolithic application, thus removing just the architecture from the equation. The code will still execute as it did before, no EXEs, no accelerator engines, etc...just plain PHP code but hopefully slightly faster as the assembly of the 50+ classes per request will no longer be required - much code will be inlined.

Tons of duplicate code in the resulting output, but its source is not meant to be modified.

Ideally I wanted others who had an interest in parsing to jump on here and start a discussion on the theory of parsing, perhaps highlight some gotchas, etc...

Looking into existing compilers is...a tremendous task and likely will leave me feeling discombobulated. Been there done that. :P

Cheers :)

Posted: Wed Oct 31, 2007 7:13 am
by stereofrog
Jenk wrote: Anything that can be obfuscated can be unobfuscated with surprising ease.
Yes, if obfuscation only changes specific tokens in the code leaving its structure intact. However, if we convert that structure to another, semantically equivalent one (i.e. "compile" it), this would not be possible to revert. Consider, for example:

Code:

$foo = $bar->baz($quux);

converted to

Code:

get('quux'); push(1); push('baz'); get('bar'); call(); pop('foo');
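
To show how such a sequence could still execute, here is a tiny stack machine in PHP. This is only one possible reading of the instructions above: the `StackVm` class, the operand order assumed by `call()` (object, then method name, then argument count, then arguments) and the `Bar` demo class are all assumptions made for illustration.

```php
<?php
// Hypothetical stack machine: one possible interpretation of get/push/call/pop.
class StackVm
{
    private $stack = [];
    public $vars = [];

    // Push the value of a named variable.
    public function get($name)
    {
        $this->stack[] = $this->vars[$name];
    }

    // Push a literal value.
    public function push($value)
    {
        $this->stack[] = $value;
    }

    // Pop the top of the stack into a named variable.
    public function pop($name)
    {
        $this->vars[$name] = array_pop($this->stack);
    }

    // Pop object, method name and argument count, then the arguments; push the result.
    public function call()
    {
        $object = array_pop($this->stack);
        $method = array_pop($this->stack);
        $argc = array_pop($this->stack);
        $args = [];
        for ($i = 0; $i < $argc; $i++) {
            array_unshift($args, array_pop($this->stack));
        }
        $this->stack[] = call_user_func_array([$object, $method], $args);
    }
}

// Demo target class (illustrative only).
class Bar
{
    public function baz($x)
    {
        return $x * 2;
    }
}

$vm = new StackVm();
$vm->vars = ['bar' => new Bar(), 'quux' => 21];
$vm->get('quux'); $vm->push(1); $vm->push('baz'); $vm->get('bar'); $vm->call(); $vm->pop('foo');
```

After the run, `$vm->vars['foo']` holds the same value that `$foo = $bar->baz($quux);` would have produced.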

Posted: Wed Oct 31, 2007 8:03 am
by ev0l
Firstly, I would like to say: please don't do this :-)

I don't see any need to scan the source yourself when PHP's reflection classes are quicker and easier. Let PHP's highly optimized parser/runtime do the heavy lifting for you.

My first step would be to find all base classes and traverse down the hierarchy from there. You might want to look at a reflection class I did recently, WHReflectionClass. (I don't personally like the way PHP handles reflection; I should be able to ask the class about its structure, but I digress.) You would obviously have to skip the "system" classes as the source code does not exist, but that is easy enough: just call isUserDefined().

From there flattening out your hierarchy is simple. ReflectionClass::getMethods() returns all methods for that class including inherited methods. You can then use ReflectionMethod::getFileName/getStartLine/getEndLine to get the source code. ReflectionMethod should have getSource but it is omitted so extending ReflectionMethod and implementing getSource is advisable.
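
A rough sketch of that approach follows. `methodSource()` is a hypothetical helper standing in for the missing `getSource`, and the demo classes are written to a temporary file only so that Reflection has a file to point at; the Reflection calls themselves (`getMethods`, `isUserDefined`, `getFileName`, `getStartLine`, `getEndLine`) are standard.

```php
<?php
// Hypothetical helper (not part of PHP's Reflection API): read a method's source from disk.
function methodSource(ReflectionMethod $method): string
{
    $lines = file($method->getFileName());
    $start = $method->getStartLine() - 1;          // getStartLine() is 1-based
    $length = $method->getEndLine() - $start;
    return implode('', array_slice($lines, $start, $length));
}

// Demo classes, written to a temporary file and loaded from there.
$code = <<<'PHP'
<?php
class DemoBase {
    public function hello() {
        return 'hello';
    }
}
class DemoChild extends DemoBase {
    public function world() {
        return 'world';
    }
}
PHP;
$file = tempnam(sys_get_temp_dir(), 'cls');
file_put_contents($file, $code);
require $file;

// getMethods() includes inherited methods, so this yields the flattened method set.
$flattened = [];
foreach ((new ReflectionClass('DemoChild'))->getMethods() as $method) {
    if ($method->isUserDefined()) {
        $flattened[$method->getName()] = methodSource($method);
    }
}
unlink($file);
```

Note that this extracts whole lines, so it relies on each method starting and ending on its own lines.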

Finding members is just as easy.

PHP provides the tools to make these things easy. Don't make it too hard on yourself ( in fact don't do it at all :-) )

Posted: Wed Oct 31, 2007 8:44 am
by alex.barylski
ev0l wrote:PHP provides the tools to make these thing easy. Don't make it to hard on your self ( in fact don't do it at all :-) )
Parsing is almost never easy, regardless of how you go about it. Reflection, unfortunately, is not sufficient. For instance, will it tell me what methods are invoked and, more importantly, where in the source module, so I can replace the call with something else?
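
For locating call sites, the tokenizer is one place to start, since each token carries its line number. The sketch below is a rough illustration with a hypothetical `findMethodCalls()` helper; it only recognizes the simple patterns `->name(` and `::name(` and does not resolve which class the call actually targets.

```php
<?php
// Hypothetical helper: list method-call sites as [method name, line number].
function findMethodCalls(string $source): array
{
    // Drop whitespace and comments so operator, name and "(" sit next to each other.
    $tokens = array_values(array_filter(token_get_all($source), function ($t) {
        return !is_array($t) || !in_array($t[0], [T_WHITESPACE, T_COMMENT, T_DOC_COMMENT], true);
    }));
    $calls = [];
    for ($i = 0, $n = count($tokens); $i + 2 < $n; $i++) {
        $op = $tokens[$i];
        $name = $tokens[$i + 1];
        $paren = $tokens[$i + 2];
        if (is_array($op)
            && in_array($op[0], [T_OBJECT_OPERATOR, T_DOUBLE_COLON], true)
            && is_array($name) && $name[0] === T_STRING
            && $paren === '('
        ) {
            $calls[] = ['method' => $name[1], 'line' => $name[2]];
        }
    }
    return $calls;
}

$source = <<<'PHP'
<?php
$foo = $bar->baz($quux);
parent::describe();
PHP;
$calls = findMethodCalls($source);
```

Knowing the token index of each call is what would allow a rewriter to swap the call for a renamed method or inlined code.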

I think you missed the point - as did most people who read my post. :D

The idea is to design some fancy re-factoring code, essentially that is all I am doing, but more appropriately coined: anti-factoring

Instead of finding duplicate code and refactoring into a single external source, I'm reversing that operation.

One of the most compelling reasons for me to investigate this idea is that if the code allows for anti-factoring then you can possibly perform re-factorings as well. :D

By their nature, refactoring tools/techniques are incredibly complex to implement/develop, which is why there is such a shortage of refactoring tools, especially in PHP - I might as well use Search and Replace. So yeah...reflection isn't going to work but that's OK because I wasn't looking to re-invent the wheel nor was I looking for a walk in the park. The more challenge the better. :)