
Best Practices For Clean Urls

Posted: Wed Jul 15, 2009 2:50 pm
by allspiritseve
Whenever I read posts related to clean urls, they always seem to be about setting up Apache with mod_rewrite to redirect all requests to a bootstrap file, typically index.php. While rewrite rules can get quite complex, there are plenty of examples available that handle the most common use cases. Nobody really mentions what you have to change in your application in order to accommodate clean urls, however. I'd eventually like to turn this into a tutorial, but for now I'm posting the best practices I've found. If anyone else has found some useful information along the way, feel free to post and I will incorporate your comments into my final tutorial.
* * *
I decided to rewrite the url system for the in-house CMS built by the company I work for, Bright Bridge Studios. I was motivated in part because it had been on my to-do list for a couple of months, and in part because I had a paper for school I needed to work on, and I was procrastinating :) I wanted to do something really useful for our system that wouldn't take much time. However, I soon discovered that there is a lot more to using clean urls than mod_rewrite.

1. Relative urls.
The first issue we ran across is relative urls. In the existing site, every page was handled by either index.php or pages.php, both in the same folder. Thus, a url for an image, stylesheet, script, or other physical resource such as "images/header.jpg" would look in that same folder. With clean urls, however, a url like /about/ would break relative urls. To fix this, there are two options (that I know of): using a <base> tag, or always using absolute urls. We want our code to work no matter what folder or server it's dropped on, so (for now) relative urls are a must. I suspect some url generation could allow absolute urls to be used without harm.
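To make the breakage concrete, here is a rough sketch of how a browser resolves a relative url against the current page. The function name resolveRelative() is made up for illustration, and the model is deliberately simplified: it assumes the page path starts with a slash and it ignores "../" segments.

```php
<?php
// Simplified model of relative url resolution (hypothetical helper, no "../" handling).
function resolveRelative(string $pagePath, string $relative): string
{
    // Everything up to and including the last slash acts as the "current folder".
    $folder = substr($pagePath, 0, strrpos($pagePath, '/') + 1);
    return $folder . $relative;
}

// Old scheme: the page lives in the same folder as the images directory.
$old = resolveRelative('/pages.php', 'images/header.jpg');

// Clean url: the "folder" is now /about/, which does not exist on disk.
$clean = resolveRelative('/about/', 'images/header.jpg');
```

With the old scheme the image resolves to /images/header.jpg; with the clean url it resolves to /about/images/header.jpg, which is why the relative url breaks.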

2. Url generation
An initial requirement for us was to allow our site to work with or without clean urls enabled. If a page was accessed directly, ie pages.php, then the standard pages.php?tabid=1&pageid=2 would be generated. If a clean url was used, such as /tab/page/, the rewrite would resolve to our bootstrap which would enable clean urls to be generated. I had a nice little class that would spit out either type of url depending on a clean url flag. Things got really complicated when I started doing, for example, blog posts that didn't fit within our standard url scheme. Thus, I had to create separate methods to handle those specific use cases, and eventually I was passing so much information to these methods that I might as well have had if statements in the urls. The solution I found here was to handle standard urls in a different manner. If a clean url was http://www.site.com/site/blog/post/ then a standard url would be http://www.site.com/index.php?route=/site/blog/post/ and then all I had to do was determine whether to use $_GET['route'] or $_SERVER['REQUEST_URI']. All links could then be displayed like this: <a href="<?php echo $url->render('/site/blog/post'); ?>"> and would never need to know which url system I was using.
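A minimal sketch of that idea, not our actual class: the class name UrlGenerator, the constructor flag, and the currentRoute() helper are assumptions for illustration. Only render(), $_GET['route'] and $_SERVER['REQUEST_URI'] come from the description above.

```php
<?php
// Hypothetical sketch of a generator that hides which url scheme is active.
final class UrlGenerator
{
    private bool $clean;

    public function __construct(bool $clean)
    {
        $this->clean = $clean;
    }

    // Turn a route such as '/site/blog/post/' into a link for the active scheme.
    public function render(string $route): string
    {
        $route = '/' . ltrim($route, '/');
        return $this->clean ? $route : '/index.php?route=' . $route;
    }

    // Recover the route from the request, whichever scheme was used.
    public static function currentRoute(array $get, array $server): string
    {
        if (isset($get['route'])) {
            return (string) $get['route'];
        }
        $uri = isset($server['REQUEST_URI']) ? $server['REQUEST_URI'] : '/';
        // Keep only the path; any query string is dropped.
        $path = parse_url($uri, PHP_URL_PATH);
        return is_string($path) ? $path : '/';
    }
}

$url = new UrlGenerator(false);
$link = $url->render('/site/blog/post/');
```

In the bootstrap, the route would then be read with UrlGenerator::currentRoute($_GET, $_SERVER), so the rest of the application never needs to know which scheme is in use.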

3. Link Anchors
Our designer uses back to top links in footers using link anchors, or href="#top". We found out after using the <base> tag that link anchors are broken. #top resolves to the url in the base tag, so if you have <base href="http://www.site.com/site/">, then #top links to /site/#top even if you are currently at /site/blog/post/. The solution to this is to paste the current page's name before the hash, so <a href="/site/blog/post/#top">.
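One way to avoid typing the page url by hand is a small helper, sketched here under the assumption that the current request uri is passed in (for example from $_SERVER['REQUEST_URI']). The function name anchor() is hypothetical.

```php
<?php
// Hypothetical helper: build a same-page anchor href that survives a <base> tag.
function anchor(string $fragment, string $requestUri): string
{
    // Drop any fragment already present; keep the path and query string.
    $page = explode('#', $requestUri, 2)[0];
    return $page . '#' . ltrim($fragment, '#');
}

$href = anchor('top', '/site/blog/post/');
```

In a template this could be used as <a href="<?php echo htmlspecialchars(anchor('top', $_SERVER['REQUEST_URI'])); ?>">.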

4. Root directories
Sites aren't always in the site root. For instance, on our development box, we have a folder system of /clients/[client_name]/ that allows every site to be accessed through our own domain. Thus, we need a way to isolate the root directories from the actual url, ie /site/ from /blog/post/ in the examples above. With clean urls, the following line should do just that:

Code: Select all

$root_dir = str_replace ($_SERVER['DOCUMENT_ROOT'], '', realpath('.'));
Here's what's happening:

Code: Select all

'/clients/client' = str_replace ('/var/www', '', '/var/www/clients/client');
Thus, any url can be created like this:

Code: Select all

<a href="<?php echo $root_dir . '/blog/post/'; ?>">
Because I don't want to do this with every url, I have the url generator that I mentioned earlier place $root_dir at the beginning of every url. So the above is simplified to:

Code: Select all

<a href="<?php echo $url->render('/blog/post/'); ?>">
As long as the $url object is available, I can print out urls that will work in any directory, on any server.

Note: It is imperative to not have an ending / at the end of the root dir. We started out doing it that way, and it caused a great deal of headache for us. However, the base tag DOES need an ending slash. If you have done this correctly, physical resources can be located using a relative url (because of the base tag) and clean url pages can be linked to using an absolute url (starting from wherever your bootstrap is contained).
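As a sketch of the two slash rules in the note, assuming hypothetical helper names rootDir() and baseHref() and a plain http scheme:

```php
<?php
// Hypothetical helpers illustrating the trailing-slash rules; names are illustrative.

// Root dir relative to the document root, never with a trailing slash.
function rootDir(string $documentRoot, string $bootstrapDir): string
{
    $documentRoot = rtrim($documentRoot, '/');
    // Strip the document root only when the bootstrap dir starts with it.
    if (strpos($bootstrapDir, $documentRoot) === 0) {
        $bootstrapDir = (string) substr($bootstrapDir, strlen($documentRoot));
    }
    return rtrim($bootstrapDir, '/');
}

// href for the <base> tag, always with a trailing slash.
function baseHref(string $host, string $rootDir): string
{
    return 'http://' . $host . $rootDir . '/';
}

$root = rootDir('/var/www', '/var/www/clients/client');
$base = baseHref('www.site.com', $root);
```

Here $root is '/clients/client' and $base is 'http://www.site.com/clients/client/'. For a site in the document root, the root dir is an empty string and the base href still ends in a slash.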

5. Anything else?

Re: Best Practices For Clean Urls

Posted: Wed Jul 15, 2009 3:07 pm
by Christopher
I think one reason I don't do much URL generation (remember the URL class ;)) is that I use the <base> tag. I find it makes everything easier. If you don't use the <base> tag then you jump through lots of hoops.

In templates the links can just be:

Code: Select all

<a href="blog/post/">Posts</a>
That is certainly the easiest for designers, portability, etc. And everything just works.

Re: Best Practices For Clean Urls

Posted: Wed Jul 15, 2009 3:28 pm
by matthijs
Arborint and I have discussed the base tag before. I can see its simplicity. But I still think that in a system you want to use in different situations, it shouldn't be a requirement. You know, maybe a designer comes by and drops in different html templates without the base tag. Or you want to be able to use different themes (again possibly without the base tag).

So if it would be possible in any way, I'd prefer to see a solution without relying on the base tag.

What do you want to know for your tutorial? Do you want to discuss best practices in general or more the technical side of it?

Re: Best Practices For Clean Urls

Posted: Wed Jul 15, 2009 3:33 pm
by Christopher
You can also make the base URL available in the templates, e.g.

Code: Select all

<a href="{base}blog/post/">Posts</a>
<a href="<?php echo $base; ?>blog/post/">Posts</a>

Re: Best Practices For Clean Urls

Posted: Wed Jul 15, 2009 3:40 pm
by allspiritseve
matthijs wrote:Arborint and I have discussed the base tag before. I can see its simplicity. But I still think that in a system you want to use in different situations, it shouldn't be a requirement. You know, maybe a designer comes by and drops in different html templates without the base tag. Or you want to be able to use different themes (again possibly without the base tag).

So if it would be possible in any way, I'd prefer to see a solution without relying on the base tag.
Well, as I said in the post, it's either base tag or absolute urls. Using absolute urls with url generation would probably be the most ideal situation. Also, since you aren't using the base tag, then link anchors would work as usual.
matthijs wrote:What do you want to know for your tutorial? Do you want to discuss best practices in general or more the technical side of it?
I think there are plenty of posts discussing how to get mod_rewrite going. I want to cover what to do next, which probably gets a little technical but also includes best practices. I included all that I've come across so far, but I'm sure there are others. I haven't even touched on dealing with clean urls with a database.

Re: Best Practices For Clean Urls

Posted: Wed Jul 15, 2009 3:43 pm
by allspiritseve
arborint wrote:I think one reason I don't do much URL generation (remember the URL class ;)) is that I use the <base> tag. I find it makes everything easier. If you don't use the <base> tag then you jump through lots of hoops.

In templates the links can just be:

Code: Select all

<a href="blog/post/">Posts</a>
That is certainly the easiest for designers, portability, etc. And everything just works.
Maybe as the code matures we'll gravitate towards no url generation, but for now we really want some form of urls to work w/o mod_rewrite. Yours would not work without mod_rewrite, and it would only work if the site is in the root directory.

Re: Best Practices For Clean Urls

Posted: Wed Jul 15, 2009 3:51 pm
by matthijs
What WordPress does is support both clean urls and ugly urls. So if no htaccess/rewrite is used, you get urls like:
mysite.com/?p=356

I think that in this discussion it would be good to describe the different parts of the system first. And decide what it is you are looking for (the needs). There are a couple of different, partially independent and related parts/steps I can think of:

- deciding what URL scheme the website/application will have. How free do you want this to be? Do you accept restrictions?

- the implementation of being able to define that URL scheme in the system (example: setting the Routes in an ini file like in Zend Framework)

- the actual mapping of requested URLs to the corresponding response

- generation of URLs. Do you want to code html templates by hand, do you want to use view helpers to generate urls, etc

Re: Best Practices For Clean Urls

Posted: Wed Jul 15, 2009 3:58 pm
by Christopher
allspiritseve wrote:Maybe as the code matures we'll gravitate towards no url generation, but for now we really want some form of urls to work w/o mod_rewrite. Yours would not work without mod_rewrite, and it would only work if the site is in the root directory.
The second style works without mod_rewrite, and I have found it is still easy on designers, because you can simply change 'base' from 'mysite.com/blog/' to 'mysite.com/blog/index.php' and the clean URLs still work (with Apache).

Code: Select all

<a href="{base}blog/post/">Posts</a>
<a href="<?php echo $base; ?>blog/post/">Posts</a>
In general, I have found that <base> has an almost magical quality of making lots of problems all over the place go away.

Re: Best Practices For Clean Urls

Posted: Wed Jul 15, 2009 4:24 pm
by allspiritseve
arborint wrote:The second style works without mod_rewrite, and I have found it is still easy on designers, because you can simply change 'base' from 'mysite.com/blog/' to 'mysite.com/blog/index.php' and the clean URLs still work (with Apache).
I meant this:
arborint wrote:<a href="blog/post/">Posts</a>
Pasting a $base variable before the url works if there are no query parameters.

Re: Best Practices For Clean Urls

Posted: Wed Jul 15, 2009 4:52 pm
by Christopher
allspiritseve wrote:I meant this:
arborint wrote:<a href="blog/post/">Posts</a>
Yes, that only works with mod_rewrite. I was just showing how it can stay pretty simple for designers and still work without mod_rewrite.
allspiritseve wrote:Pasting a $base variable before the url works if there are no query parameters.
Yes, and there are a couple of options. I think a question is: who has the responsibility for parameters? Is it the View/Template or the Controller? Most of the time the designer can just add parameters. The Controller needs to add them in a couple of different cases, for example if it is persisting parameters across multiple requests. Or it may be creating a constructed or encoded URL, such as pagination links from a paginator, or links for a shopping cart or CMS to add/delete items. But in those cases should you jump straight to <a href="<?php echo $next_url; ?>"> if you want designers to use them?
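For the Controller case, a rough sketch of building something like $next_url is below. The helper name withParams() and the sample parameters are made up for illustration; http_build_query() is the standard PHP function. The separator check is what lets the same code work for both a clean url and an index.php?route=... url.

```php
<?php
// Hypothetical helper: append parameters whether or not the url already has a query string.
function withParams(string $url, array $params): string
{
    if (count($params) === 0) {
        return $url;
    }
    $separator = (strpos($url, '?') === false) ? '?' : '&';
    // Pass '&' explicitly so the result does not depend on the arg_separator.output ini setting.
    return $url . $separator . http_build_query($params, '', '&');
}

// Controller side: persist the sort order across pagination links.
$next_url = withParams('/blog/', ['page' => 2, 'sort' => 'date']);
```

The template then only needs <a href="<?php echo htmlspecialchars($next_url); ?>">, and the designer never touches the parameters.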

Re: Best Practices For Clean Urls

Posted: Wed Jul 15, 2009 5:11 pm
by allspiritseve
Is it really that hard to ask a designer to write this:

Code: Select all

<a href="<?php echo $url->render('/page/url/'); ?>">
As opposed to:

Code: Select all

<a href="<?php echo $base; ?>/page/url">
Maybe I'm just biased because our designer doesn't mind having php here and there. I don't know.

Re: Best Practices For Clean Urls

Posted: Wed Jul 15, 2009 5:27 pm
by Christopher
allspiritseve wrote:Is it really that hard to ask a designer to write this:

Maybe I'm just biased because our designer doesn't mind having php here and there. I don't know.
No, not really. And as a general solution it is probably better because you can load up the $url object with whatever is needed, transparent to the person using the template.

But I also use HTML templates with 'designers' I don't trust (i.e. clients). And they do:

Code: Select all

<a href="{base}/page/url">
// or
<a href="{next_url}">
I guess I could use fancier template syntax.

But now we are at the question of what kind of URL syntax is good for the different types of people who might touch the templates.

PS - I still think that <base> is the most flexible solution for you because I think your problem from the first post is really that the site may not be in the document root of the webserver. The thing about <base> is that either <a href="blog/post/"> or <a href="<?php echo $url->render('blog/post'); ?>"> works with it. It does not limit absolute URLs, it just allows relative URLs.

Re: Best Practices For Clean Urls

Posted: Thu Jul 16, 2009 2:27 am
by matthijs
The bottom line is that there are multiple options. I don't think there is a single best practice, just like there is no single best template language. You can discuss endlessly why you prefer plain PHP or Smarty; in the end they are both good solutions, depending on the situation.

So if you want to describe "best practice" for clean urls, I think it's important to first describe the different situations and which process steps there are to decide which choices to make.

You could make a decision "flow diagram" with a few yes/no questions, like:
- do you want both clean urls and ugly urls to work?
- do you want to allow template editors to work in PHP?
- do you want full flexibility in the choice of URLs? (if yes, then use "slugs")
etc etc

And then after that you end up with a few situations with each a (possibly) different solution.