Best Practices For Clean Urls
Posted: Wed Jul 15, 2009 2:50 pm
Whenever I read posts about clean urls, they always seem to be about setting up Apache with mod_rewrite to redirect all requests to a bootstrap file, typically index.php. While rewrite rules can get quite complex, there is a plethora of examples available that handle the most common use cases. Nobody really mentions what you have to change in your application in order to accommodate clean urls, however. I'd eventually like to turn this into a tutorial, but for now I'm posting the best practices I've found. If anyone else has found some useful information along the way, feel free to post and I will incorporate your comments into my final tutorial.
* * *
I decided to rewrite the url system for the in-house CMS built by the company I work for, Bright Bridge Studios. I was motivated in part because it had been on my to-do list for a couple of months, and in part because I had a paper for school I needed to work on, and I was procrastinating.
I wanted to do something really useful for our system that wouldn't take much time. However, I soon discovered that there is a lot more to using clean urls than mod_rewrite.
1. Relative urls
The first issue we ran across was relative urls. In the existing site, every page was handled by either index.php or pages.php, both in the same folder. Thus, a relative url for an image, stylesheet, script, or other physical resource, such as "images/header.jpg", was resolved against that same folder. With clean urls, however, a page at /about/ breaks this: the browser treats /about/ as a folder and requests /about/images/header.jpg, which doesn't exist. To fix this, there are two options (that I know of): using a <base> tag, or always using absolute urls. We want our code to work no matter what folder or server it's dropped on, so (for now) we kept relative urls and went with the <base> tag. I suspect some url generation could allow absolute urls to be used without harm.
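For anyone who wants to see the <base> tag option in code, here is a minimal sketch. The function name base_tag() and its $siteUrl argument are made up for this example and are not part of our CMS; it only assumes you already know the url of the folder the site lives in.

```php
<?php
// Hypothetical helper: base_tag() and $siteUrl are illustrative names.
// Builds a <base> tag so relative urls such as "images/header.jpg"
// resolve against the site folder instead of the clean url's path.
function base_tag(string $siteUrl): string
{
    // Force exactly one trailing slash on the href.
    $href = rtrim($siteUrl, '/') . '/';

    return '<base href="' . htmlspecialchars($href, ENT_QUOTES, 'UTF-8') . '">';
}
```

Printed inside <head>, base_tag('http://www.site.com/site') should let "images/header.jpg" load from /site/images/header.jpg whether the visitor is on /site/about/ or /site/blog/post/.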
2. Url generation
An initial requirement for us was to allow our site to work with or without clean urls enabled. If a page was accessed directly, i.e. through pages.php, then the standard pages.php?tabid=1&pageid=2 would be generated. If a clean url was used, such as /tab/page/, the rewrite would resolve to our bootstrap, which would enable clean urls to be generated. I had a nice little class that would spit out either type of url depending on a clean url flag. Things got really complicated when I started doing, for example, blog posts that didn't fit within our standard url scheme. I had to create separate methods to handle those specific use cases, and eventually I was passing so much information to these methods that I might as well have written if statements wherever a url was printed.
The solution I found was to handle standard urls in a different manner. If a clean url was http://www.site.com/site/blog/post/ then the standard url would be http://www.site.com/index.php?route=/site/blog/post/ and all I had to do was determine whether to read the route from $_GET['route'] or $_SERVER['REQUEST_URI']. All links could then be displayed like this: <a href="<?php echo $url->render('/site/blog/post'); ?>"> and would never need to know which url system I was using.
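To make the route idea concrete, here is a rough sketch of what such a generator could look like. This is not our actual class: the name UrlGenerator, its constructor flag, and the current_route() helper are assumptions for illustration only.

```php
<?php
// Hypothetical sketch; UrlGenerator and current_route() are
// illustrative names, not the original CMS code.
final class UrlGenerator
{
    private bool $clean;

    public function __construct(bool $clean)
    {
        $this->clean = $clean;
    }

    // Turn a route such as '/site/blog/post/' into a link target.
    public function render(string $route): string
    {
        if ($this->clean) {
            return $route;
        }

        // Standard mode: pass the same route through the query string.
        return '/index.php?route=' . $route;
    }
}

// Read the route from whichever url system delivered the request.
// $get and $server stand in for $_GET and $_SERVER.
function current_route(array $get, array $server): string
{
    if (isset($get['route'])) {
        return $get['route'];
    }

    // Clean url: keep the path and drop any query string.
    $path = parse_url($server['REQUEST_URI'] ?? '/', PHP_URL_PATH);

    return is_string($path) ? $path : '/';
}
```

The bootstrap would call current_route($_GET, $_SERVER) once, and templates would only ever call $url->render(). Switching url systems then means changing one constructor flag rather than touching every link.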
3. Link Anchors
Our designer puts back-to-top links in footers using link anchors, i.e. href="#top". We found out after adding the <base> tag that link anchors break. #top resolves against the url in the base tag, so if you have <base href="http://www.site.com/site/">, then #top links to /site/#top even if you are currently at /site/blog/post/. The solution is to put the current page's url before the hash, so <a href="/site/blog/post/#top">.
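A small helper can do the prefixing so the designer doesn't have to hard-code each page's url. This is only a sketch; anchor_href() is a hypothetical name, and it assumes the current page's url is available to the template (for example from $_SERVER['REQUEST_URI']).

```php
<?php
// Hypothetical helper: anchor_href() is an illustrative name.
// Prefixes an anchor with the current page's url so that a <base>
// tag cannot redirect it to another page.
function anchor_href(string $currentUrl, string $anchor): string
{
    // Drop any fragment already on the url, then append the new one.
    $withoutFragment = explode('#', $currentUrl, 2)[0];

    return $withoutFragment . '#' . ltrim($anchor, '#');
}

// Example link markup for a page at /site/blog/post/.
$link = '<a href="'
    . htmlspecialchars(anchor_href('/site/blog/post/', 'top'), ENT_QUOTES, 'UTF-8')
    . '">Back to top</a>';
```

Because the query string is left alone, the same helper should also work for standard urls of the index.php?route=... kind.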
4. Root directories
Sites aren't always in the site root. For instance, on our development box, we have a folder system of /clients/[client_name]/ that allows every site to be accessed through our own domain. Thus, we need a way to isolate the root directory from the actual url, i.e. /site/ from /blog/post/ in the examples above. With clean urls, the following line should do just that:
Code:
$root_dir = str_replace($_SERVER['DOCUMENT_ROOT'], '', realpath('.'));
Here's what's happening:
Code:
// DOCUMENT_ROOT is '/var/www' and realpath('.') is '/var/www/clients/client'
str_replace('/var/www', '', '/var/www/clients/client'); // gives '/clients/client'
Thus, any url can be created like this:
Code:
<a href="<?php echo $root_dir . '/blog/post/'; ?>">
Because I don't want to do this with every url, I have the url generator that I mentioned earlier place $root_dir at the beginning of every url. So the above is simplified to:
Code:
<a href="<?php echo $url->render('/blog/post/'); ?>">
As long as the $url object is available, I can print out urls that will work in any directory, on any server.
Note: It is imperative to not have a trailing / at the end of the root dir. We started out doing it that way, and it caused a great deal of headache for us. However, the base tag DOES need a trailing slash. If you have done this correctly, physical resources can be located using a relative url (because of the base tag) and clean url pages can be linked to using an absolute url (starting from wherever your bootstrap is contained).
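One thing to watch with the str_replace() line is that it removes the document root wherever it appears in the path, not just at the start, and realpath() can return false. Below is a slightly more defensive sketch of the same idea. The function root_dir() and its argument names are hypothetical, and returning an empty string when the bootstrap folder is not under the document root is an assumption made for this example.

```php
<?php
// Hypothetical helper: root_dir() and its parameters are illustrative.
// Returns the site folder relative to the document root, with no
// trailing slash, e.g. '/clients/client'.
function root_dir(string $documentRoot, string $bootstrapDir): string
{
    $documentRoot = rtrim($documentRoot, '/');
    $bootstrapDir = rtrim($bootstrapDir, '/');

    // Only strip the document root when it is a prefix of the path.
    if (strncmp($bootstrapDir, $documentRoot, strlen($documentRoot)) !== 0) {
        return '';
    }

    $rest = (string) substr($bootstrapDir, strlen($documentRoot));

    // Reject partial folder matches such as '/var/www2' vs '/var/www'.
    if ($rest !== '' && $rest[0] !== '/') {
        return '';
    }

    return $rest;
}

// Possible call in the bootstrap (realpath() may return false):
// $root_dir = root_dir($_SERVER['DOCUMENT_ROOT'], (string) realpath('.'));
```

The result never ends in a slash, so routes like '/blog/post/' can be appended directly, and the base tag href can be built by adding a single '/' to it.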
5. Anything else?