Refactoring or rewrite -- where do you start?

alex.barylski
DevNet Evangelist
Posts: 6267
Joined: Tue Dec 21, 2004 5:00 pm
Location: Winnipeg

Refactoring or rewrite -- where do you start?

Post by alex.barylski »

I will soon make a proposal to my project managers, either a re-write or a more gradual refactoring.

The system is quite elaborate, including such systems as:

- Classifieds
- Business Directory
- Member Profiles (3 types)
- Community Forums

And on and on and on :)

The site is composed of about a million transaction scripts which use no functions. SQL is mangled with HTML, HTML with PHP, JS with PHP, PHP with CSS, English with HTML, and so on.

It's a mess and we constantly fix bugs that take hours longer than I know they need to.

I'm not sure which direction the project managers will go in but refactoring is a likely bet.

How do you best manage a refactoring like this?

Do you start by extracting the inline SQL code into a DAL? Do you start by removing the XHTML into templates?

I have done this in the past and extracted XHTML into templates first -- but this time it's proving more difficult due to the hundreds of $_SESSION, $GLOBALS and other variable dependencies in each script.

I removed some of the SQL and at least now have established some kind of API and documentation and this allowed some reuse.

If you were confronted with such a task, where would you start and WHY? What are some caveats to look out for?

Please share your experiences.
alex.barylski
DevNet Evangelist
Posts: 6267
Joined: Tue Dec 21, 2004 5:00 pm
Location: Winnipeg

Re: Refactoring or rewrite -- where do you start?

Post by alex.barylski »

The other important question I wanted to include: in your experience, is it easier to begin refactoring by adding an index.php and implementing a front controller, or to stick with the page controller design initially?

I am thinking I will probably focus on extracting the SQL into a DAL API, then extracting the XHTML into templates. Once each script has been through that process, I would then focus on merging all scripts into separate action controllers and dispatching via a front controller.
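
For illustration, the last step (page scripts merged into action controllers and dispatched through one entry point) might look roughly like the sketch below. The action names, the `$actions` map and the `dispatch()` function are all hypothetical and not taken from the actual codebase.

```php
<?php
// Hypothetical front controller sketch: each former page script becomes
// an "action" callable, and a single entry point picks one to run.

/**
 * Look up the requested action and run it; unknown names fall back
 * to the 'not_found' action.
 *
 * @param array<string, callable> $actions map of action name => handler
 */
function dispatch(string $action, array $actions): string
{
    if (!isset($actions[$action])) {
        return $actions['not_found']();
    }
    return $actions[$action]();
}

// Assumed action names; in practice each would wrap one legacy script.
$actions = [
    'classifieds' => function (): string { return 'classifieds listing'; },
    'directory'   => function (): string { return 'business directory'; },
    'not_found'   => function (): string { return '404 not found'; },
];

// index.php reads the action from the request, e.g. index.php?action=classifieds
$requested = isset($_GET['action']) ? (string) $_GET['action'] : 'classifieds';
echo dispatch($requested, $actions);
```

The point of the map is that scripts can be moved behind the front controller one at a time, while everything not yet migrated keeps running as a page controller.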
VirtuosiMedia
Forum Contributor
Posts: 133
Joined: Thu Jun 12, 2008 6:16 pm

Re: Refactoring or rewrite -- where do you start?

Post by VirtuosiMedia »

If I were to do it, I would (first save everything and then) probably use a framework and do what you suggested with the index.php page and the front controller. It gives you a starting point and anything that runs through there will be using the framework. Then, one script or page at a time, I would convert over to using the framework.
pickle
Briney Mod
Posts: 6445
Joined: Mon Jan 19, 2004 6:11 pm
Location: 53.01N x 112.48W

Re: Refactoring or rewrite -- where do you start?

Post by pickle »

I usually do it in phases.
  1. Separate business and display logic on the same page - so move all the business logic to the top of the page, setting variables & such for the display logic below. This reduces the possible spots bugs can appear for a given page.
  2. Once that's all done, start moving functionality into libraries. At this point, just cut down on duplication of code.
  3. When that's done, move the template-type code into a template engine.
  4. Next, separate out the JavaScript into its own files.
  5. Refactor as necessary into whichever patterns are appropriate.
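
As a rough sketch of phase 1, a page might end up shaped like this: everything that computes values sits at the top, and the markup below only prints what was already prepared. The `$ads` data and the variable names are made up for the example, and the display part is wrapped in a function here only so it can be checked in isolation.

```php
<?php
// ---- Business logic: all at the top of the page ----
// Hypothetical data; in a real page this would come from the existing SQL.
$ads = [
    ['title' => 'Bike for sale', 'price' => 120],
    ['title' => 'Guitar <used>', 'price' => 75],
];
$pageTitle = 'Classifieds';
$adCount   = count($ads);

// ---- Display logic: only prints what was prepared above ----
function render_page(string $pageTitle, int $adCount, array $ads): string
{
    $html = '<h1>' . htmlspecialchars($pageTitle) . ' (' . $adCount . ")</h1>\n<ul>\n";
    foreach ($ads as $ad) {
        // Escape on output so data can never break the markup.
        $html .= '<li>' . htmlspecialchars($ad['title']) . ': $' . (int) $ad['price'] . "</li>\n";
    }
    return $html . "</ul>\n";
}

echo render_page($pageTitle, $adCount, $ads);
```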
Real programmers don't comment their code. If it was hard to write, it should be hard to understand.
deejay
Forum Contributor
Posts: 201
Joined: Wed Jan 22, 2003 3:33 am
Location: Cornwall

Re: Refactoring or rewrite -- where do you start?

Post by deejay »

I'm having a similar dilemma with a site that I built, which started off as a small project and has grown and grown. Due to deadlines etc. it has all been built in procedural code, but it is now at a size where it desperately needs to be OOP.

I'm using CodeIgniter on another project and would like to move it to that, but I would want to tackle just a bit at a time, whenever I'm commissioned to make improvements to whatever section needs changing.

The problem, though, is that CodeIgniter uses a front controller, so wouldn't the work need to be done in one go?

However, I came across this wiki article:
http://codeigniter.com/wiki/Category:Ad ... ronScript/
which seems to indicate that maybe I could force the URL info, i.e.:

Code: Select all

$_SERVER['PATH_INFO'] = 'class/function/ID';
OR

Code: Select all

$_SERVER['REQUEST_URI'] = 'class/function/ID';


So I'm thinking it would then be possible to push the separate areas into the CodeIgniter framework one at a time, eventually moving to the front controller and knocking out the $_SERVER[] hacks.


Any views on this? I've not tried to do it yet; it just makes sense to me in theory.
josh
DevNet Master
Posts: 4872
Joined: Wed Feb 11, 2004 3:23 pm
Location: Palm beach, Florida

Re: Refactoring or rewrite -- where do you start?

Post by josh »

alex.barylski
DevNet Evangelist
Posts: 6267
Joined: Tue Dec 21, 2004 5:00 pm
Location: Winnipeg

Re: Refactoring or rewrite -- where do you start?

Post by alex.barylski »

I have read about that before...not that page though.

While I have used a similar technique (essentially intercepting requests and overriding them with my own implementation), this current project is quite a mess.

For starters, there are about 3 dozen PHP scripts in the docroot and several sub-domains:

registration.domain.com
login.domain.com
members.domain.com

Each of the scripts in the docroot is mirrored in the sub-domain directories by a PHP proxy script which just includes the original (horrible for SEO, since duplicates are not a good thing). On top of that there are dozens of PHP scripts which serve their own purpose, like adding members, classifieds, etc.

There are several systems at play here, each simple in nature, complex in interface and discombobulating in codebase.

Contact manager
Classifieds
Coupons builder
Membership (ie: MySPACE)
Community (forums)

Each script in the entire codebase follows a transaction script approach with the HTML hardcoded with inline SQL, CSS and JS.

Each script also uses upwards of 50 globals at the start of every script, and every script then includes a 'framework.php' file which basically checks the environment, access control, etc and initializes the globals.

Examples of what the globals are:

1. SF_title = title of the page -- this is initialized in the framework.php file after it connects to the DB and looks up the script in a table and returns SEO keywords.
2. SF_protocol = The protocol the site is using. At one time the site used SSL after someone logged into the system
3. SF_locale = The domain name (which is switched for testing and live deployment)

The site is hardcoded to work with only two domains (domainloc.com and domainpro.com). Installing the site under any other domain will cause hundreds of errors.

The framework.php is well over 1000 lines of messy code, initializing globals, many of which I am sure are redundant or dead code now.

This file includes a globals.php which includes about a dozen functions, many of which are wrappers around string functions to emulate a VB environment, such as

Code: Select all

REPLACE = str_replace
There are no functions outside of these wrappers -- everything is inline.

I have tried refactoring many of the scripts:

1. Refactoring the inline SQL into a DAL to provide some kind of API -- while this cleaned up the code a bit it usually requires refactoring the entire script as code is often tightly coupled with the HTML.

Code: Select all

 
// $conn stands in for an already-open ODBC connection (from odbc_connect()).
$res = odbc_exec($conn, 'SELECT * FROM blah...');
 
// odbc_fetch_row() advances the cursor; odbc_result() reads a field from the current row.
while(odbc_fetch_row($res)){
  echo '<b>Some crap: </b>'.odbc_result($res, 'some_field');
  echo '<b>Some crap: </b>'.odbc_result($res, 'some_field');
  echo '<b>Some crap: </b>'.odbc_result($res, 'some_field');
  echo '<b>Some crap: </b>'.odbc_result($res, 'some_field');
  echo '<b>Some crap: </b>'.odbc_result($res, 'some_field');
}
 
// Do more crap with the ODBC connection
 
So while I can pull the SQL into a function and return a native PHP array I usually have to go over all the other code and make sure to use the array instead of the odbc_result() -- this actually speeds things up by about 1/10 of a second per page.
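
To make that concrete, here is a minimal sketch of such a function. `dal_fetch_all()` and `render_rows()` are hypothetical names, `$conn` is assumed to be a connection from `odbc_connect()`, and the field name is a placeholder; this is not the actual API from the project.

```php
<?php
// Hypothetical DAL function: the only place that talks to ODBC.
// $conn is assumed to be a connection returned by odbc_connect().
function dal_fetch_all($conn, string $sql): array
{
    $result = odbc_exec($conn, $sql);
    if ($result === false) {
        return []; // a real DAL would probably log or throw here
    }
    $rows = [];
    while (($row = odbc_fetch_array($result)) !== false) {
        $rows[] = $row; // associative array: column name => value
    }
    odbc_free_result($result);
    return $rows;
}

// The calling script then works with plain arrays instead of odbc_result().
function render_rows(array $rows, string $field): string
{
    $html = '';
    foreach ($rows as $row) {
        $html .= '<b>Some field: </b>' . htmlspecialchars((string) $row[$field]) . "<br>\n";
    }
    return $html;
}
```

A script would then call something like `echo render_rows(dal_fetch_all($conn, $sql), 'some_field');`, and nothing below the DAL call needs the ODBC result handle any more.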

Once I had done this and documented the API I began attempting to extract the XHTML into a template layer and this is where things got really tricky.

The globals I mention above are used all throughout the HTML. Every href/src/whatever has its URI dynamically generated with several variables:

SF_protocol
SF_locale
SF_suffix (TLD)

There are several other variables which you can use to compose the URI, such as $SF_root_url (which is everything but the protocol) and SF_root_dir, which is used by includes. There is also SF_domain_pre, though I am uncertain exactly what it does.

Because there are literally dozens and dozens of globals used in composing the HTML it makes it very difficult to just extract the XHTML as I need the templates to either:

1. Pass in the globals as template variables
2. Give templates access to globals

The alternative is to go through each line and remove the dependency on the globals as much as possible.
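
One way to shrink that dependency might be to build URLs in a single helper from an explicit config array and hand the templates only the finished values. This is just a sketch: how SF_protocol, SF_locale and SF_suffix are really combined is an assumption here (protocol, then '://', then domain, '.', TLD), and `site_url()` and the `$config` keys are hypothetical names.

```php
<?php
// Hypothetical helper replacing per-link concatenation of globals.
// ASSUMPTION: URL = protocol://locale.suffix/path; the real rule may differ.
function site_url(array $config, string $path = ''): string
{
    return $config['protocol'] . '://'
         . $config['locale'] . '.' . $config['suffix']
         . '/' . ltrim($path, '/');
}

// Built once (e.g. from the existing SF_* globals), then passed around explicitly.
$config = [
    'protocol' => 'http',
    'locale'   => 'domainloc',
    'suffix'   => 'com',
];

// The template receives finished URLs instead of three globals per link.
$templateVars = [
    'classifiedsUrl' => site_url($config, '/classifieds.php'),
    'homeUrl'        => site_url($config),
];
```

With something like this, switching between the test and live domains touches one array rather than every href in every template.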

The template code is complex and confusing, with a tremendous amount of duplication. It is not uncommon for the template code to look like this:

Code: Select all

 
if($_SESSION['sess_uid'] != 0){
  echo 'Do something';
}
 
if($_SESSION['sess_uid'] > 0){
  echo 'Do something else';
}
 
if($_SESSION['sess_uid'] != 0 && $_SESSION['sess_blah'] !== 0){
  echo 'Do something again';
}
 
if($_SESSION['sess_uid'] != 0){
  echo 'Do something else -- which could be done in the first conditional test';
}
 
Not to mention nesting -- there is so much duplication caused by deep nesting and complex view/business logic/etc.
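
A small first step for code like this could be to give the repeated session test a name, so each condition is written once. Sketch only: `is_logged_in()` and `render_member_bits()` are hypothetical helpers, and the sketch assumes a positive `sess_uid` means an authenticated user, which would need checking against framework.php.

```php
<?php
// Hypothetical helper: one named check instead of scattered comparisons.
// ASSUMPTION: a positive sess_uid means the visitor is logged in.
function is_logged_in(array $session): bool
{
    return isset($session['sess_uid']) && (int) $session['sess_uid'] > 0;
}

// The four separate tests collapse into a single block.
function render_member_bits(array $session): string
{
    $out = '';
    if (is_logged_in($session)) {
        $out .= "Do something\n";
        $out .= "Do something else\n";
        // Same strict !== 0 comparison as the original third test.
        if (($session['sess_blah'] ?? null) !== 0) {
            $out .= "Do something again\n";
        }
        $out .= "Do one more thing\n";
    }
    return $out;
}
```

In a page script this would be called as `render_member_bits($_SESSION)`; passing the session in as an argument also keeps the function usable without a live session.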

To top it all off, I am expected to improve the code while others work on it too. It took me 3 days to finally get the servers and each of our consoles configured so we could run the damn software locally, commit to a test server and implement batch files to update the live site whenever the manager feels the code is where she expects it to be. SVN is a godsend in this scenario.

An important part of writing software is writing it in such a way that it facilitates change (ie: can be strangled later on). This software is not even remotely designed that way. The fact it's hardcoded to a domain made configuring the desktops to run WAMP and the software an extra special challenge.

Yesterday I started on re-working the classifieds section. Currently they are powered by a classifieds.php in the docroot and the management is handled by various scripts in the members.domain.com portion of the site.

I have essentially started from scratch, creating a 'classifieds' directory with its own .htaccess, index.php, templates and functions. The script still needs to use the existing framework.php, however, as there are conditional checks that rely on the system state.

If the user is logged into the system while browsing classifieds ads, they need to see an update button. There is no function to call to check for authentication, just a few session variables, which are initialized by framework.php. I have no idea where in framework.php all this happens, so I just include it as is until it works.

The problem with including any of the existing files, is the include files have a custom 'punter' check in the files to prevent direct access and I cannot remove those checks as they will break the original code if I do -- somehow the scripts in docroot are dependent on those checks passing. :banghead:

So I have made copies of those scripts in my local working copy, suffixed with '_i.php' ('i' for improved) :P

So long as I don't commit these files to the repo, no one else is any the wiser until I have everything working, at which point I rename them to the originals, commit, and force everyone to update as well.

There are essentially two global includes (outside of framework and global) which are the header.php and footer.php.

When I included these in my own version, everything broke, because they also have that 'punter' check to prevent direct access. Short of finding the code which 'punts', I just made local copies suffixed with '_i.php' and include those instead.

I removed all variables from the template XHTML code and pass in only the very essentials (data from arrays, etc).

Every time I update my working copy I use WinMerge to merge the changes made by others to header.php/footer.php into my local copy. This is a time-consuming process as the code is wildly different. For starters, my templates are strictly presentation logic, no inline SQL, etc.

To make matters worse, I have no access to the DB schema, so I cannot change anything at that level, which is probably a good thing.

I think at this point, it's probably best to just incrementally improve the code one script at a time. :)

Any tools, tips, etc anyone care to share?

Unit testing is out of the question at this point; the DAL functions are extremely simple and just serve to centralize the SQL and remove it from the scripts/templates.

Each script acts as a page controller, which pulls on the DAL API and initializes the templates. $_SESSION variables are the only globals directly depended upon by the XHTML templates.

Interested in hearing your horror stories, experiences, etc???

I have seriously never seen code in such bad shape in my life. Prior to this experience I thought WordPress was bad, but this source base just takes the cake. :P
josh
DevNet Master
Posts: 4872
Joined: Wed Feb 11, 2004 3:23 pm
Location: Palm beach, Florida

Re: Refactoring or rewrite -- where do you start?

Post by josh »

I'd recommend Michael Feathers' book; essentially it talks about getting legacy code under test.
alex.barylski
DevNet Evangelist
Posts: 6267
Joined: Tue Dec 21, 2004 5:00 pm
Location: Winnipeg

Re: Refactoring or rewrite -- where do you start?

Post by alex.barylski »

Ok so I bought the book and read it -- twice!!! Now what? :P
josh
DevNet Master
Posts: 4872
Joined: Wed Feb 11, 2004 3:23 pm
Location: Palm beach, Florida

Re: Refactoring or rewrite -- where do you start?

Post by josh »

Serious?
alex.barylski
DevNet Evangelist
Posts: 6267
Joined: Tue Dec 21, 2004 5:00 pm
Location: Winnipeg

Re: Refactoring or rewrite -- where do you start?

Post by alex.barylski »

Haha...no I was just kidding...my attempt at being funny :P
Jenk
DevNet Master
Posts: 3587
Joined: Mon Sep 19, 2005 6:24 am
Location: London

Re: Refactoring or rewrite -- where do you start?

Post by Jenk »

Wrap existing functionality in tests, then only change what is required - maintaining the tests as I go. Making changes only where functional requirements demand is the most efficient way to go :)

http://www.amazon.com/Working-Effective ... 274&sr=8-1

A very good book on working with Legacy code. :)
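
In PHP terms, wrapping an existing script in a test can start as simply as capturing what it prints today and comparing later runs against that saved output (the book calls this a characterization test). A minimal sketch, with hypothetical helper names and a stand-in closure instead of a real legacy script:

```php
<?php
// Capture whatever a piece of legacy code echoes, without changing it.
function capture_output(callable $legacy): string
{
    ob_start();
    try {
        $legacy();
    } finally {
        // Always close the buffer, even if the legacy code throws.
        $output = (string) ob_get_clean();
    }
    return $output;
}

// Compare current output with a previously saved "known good" snapshot.
function matches_snapshot(callable $legacy, string $snapshot): bool
{
    return capture_output($legacy) === $snapshot;
}

// Stand-in for a legacy page; a real test might wrap an include of the script.
$legacyPage = function (): void {
    echo '<b>Some field: </b>value';
};

// First run: record current behaviour. Later runs: check nothing changed.
$snapshot = capture_output($legacyPage);
```

The snapshot does not say the output is right, only that it has not changed, which is usually enough to refactor the inside of a script with some confidence.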
josh
DevNet Master
Posts: 4872
Joined: Wed Feb 11, 2004 3:23 pm
Location: Palm beach, Florida

Re: Refactoring or rewrite -- where do you start?

Post by josh »

(that's the Michael Feathers book ;-) )
Jenk
DevNet Master
Posts: 3587
Joined: Mon Sep 19, 2005 6:24 am
Location: London

Re: Refactoring or rewrite -- where do you start?

Post by Jenk »

Yes. I have a copy on my desk in front of me right now :)