mod_rewrite and url parsing

Posted: Sat Jan 09, 2010 4:26 pm
by nic
Hi - I'm trying to get mod_rewrite to parse a friendly-looking URL into two parts: a filename and a request string. I have this line in my Apache config:

RewriteRule (^(?!index\.php)[^/]*)(?:/(.*))? $1.php?request=$2 [L]
but am getting an error that makes me think the regex is broken. What I'm trying to say is:

(^(?!index\.php)[^/]*)
take all the characters up until the first / or the end of the string, ignoring the string 'index.php', and dump the result into $1. And then

(?:/(.*))?
optionally, look for a '/' followed by a string of any characters and dump it into $2.

Then rebuild the url to be $1.php?request=$2.

If I take out the last '?', leaving (^(?!index\.php)[^/]*)(?:/(.*)), the whole thing works as long as the second portion of the expression is satisfied in some way. So 'http://host.com/page/23' comes out as 'http://host.com/page.php?request=23', exactly as I want. But if I have just 'http://host.com/page', it breaks and gives me a "500 Internal Server Error"!

What am I doing wrong?


Thanks

Re: mod_rewrite and url parsing

Posted: Mon Jan 25, 2010 11:17 am
by ridgerunner
The subexpression '(?:/(.*))?' contains a second capturing group, but it sits inside a non-capturing group that is optional, i.e. the overall regex can succeed even if the second capture group does not participate in the match. When that happens, $2 in the substitution should simply expand to an empty string, so as far as I know the unset reference is not, by itself, what triggers the error. A more likely cause of the 500 is a rewrite loop: with the trailing '?', a URL with no slash at all (such as the rewritten 'page.php') still matches the pattern, so in a per-directory context (.htaccess or a Directory block) the rule keeps being re-applied to its own output ('page.php.php', and so on) until Apache gives up. Removing the '?' makes the slash mandatory, so 'page.php' no longer matches and the loop stops, but then a bare 'page' is not rewritten at all.
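To see what the pattern does with and without the trailing '?', here is a rough sketch that uses Python's re module as a stand-in for the PCRE engine inside Apache. The helper names (rewrite_once, trace) are made up for illustration, and re-applying the rule to its own output is only an assumption about how a per-directory rule set behaves, not something taken from Apache itself.

```python
import re

# nic's pattern as posted, and the variant with the trailing '?' removed.
WITH_OPTIONAL = re.compile(r"(^(?!index\.php)[^/]*)(?:/(.*))?")
WITHOUT_OPTIONAL = re.compile(r"(^(?!index\.php)[^/]*)(?:/(.*))")

def rewrite_once(pattern, path):
    """Apply the rule once; return the new path, or None if it does not match."""
    m = pattern.search(path)
    if m is None:
        return None
    # Only the path part is rebuilt here; the query string is never re-matched.
    return m.group(1) + ".php"

def trace(pattern, path, limit=4):
    """Feed the rule its own output (assumed per-directory behaviour)."""
    steps = [path]
    for _ in range(limit):
        path = rewrite_once(pattern, path)
        if path is None:
            break
        steps.append(path)
    return steps

# Group 2 does not participate when there is no slash.
print(WITH_OPTIONAL.search("page").groups())
# With the '?', every output still matches, so the chain never settles.
print(trace(WITH_OPTIONAL, "page"))
# Without the '?', 'page.php' has no slash and the chain stops after one step.
print(trace(WITHOUT_OPTIONAL, "page/23"))
print(trace(WITHOUT_OPTIONAL, "page"))
```

In this sketch the optional version keeps producing 'page.php', 'page.php.php' and so on, while the version without the '?' stops at 'page.php' for 'page/23' and never matches a bare 'page'.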

But there are other problems with your regex as well, which cause it to match more than is needed. The '(.*)' merrily grabs everything it can get its hands on, including extra '/' slashes, which is presumably not what you want here. Your regex correctly matches 'http://host.com/page/23', but it also matches URLs that you did not intend, such as 'http://host.com/page/23/two/three/four/ ... e#fragment'. Beware the dot-star!
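A quick way to see the difference between the dot-star and a character class that stops at slashes is to run both against the same paths. This is only a sketch in Python's re module (standing in for PCRE); the two patterns and the helper second_part are simplified illustrations, not the actual rule.

```python
import re

DOT_STAR = r"^([^/]*)/(.*)$"      # second group takes anything, slashes included
NO_SLASH = r"^([^/]*)/([^/]+)$"   # second group stops at the first slash

def second_part(pattern, path):
    """Return what the second capture group holds, or None if there is no match."""
    m = re.search(pattern, path)
    return m.group(2) if m else None

for path in ("page/23", "page/23/two/three"):
    print(path, "->", second_part(DOT_STAR, path), "|", second_part(NO_SLASH, path))
```

Both patterns capture '23' from 'page/23', but only the dot-star version accepts 'page/23/two/three', where it swallows '23/two/three' whole.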

Let's develop an improved version which does precisely what you need. Here are your requirements as I understand them:
1. Must not match index.php (or any other real file or directory on your web server)
2. If the URL consists of just two path segments, and this path does not actually exist in the file system, then rewrite to a new URL, using the first segment as the base filename and the second segment as a 'request' query string value, like so: BEFORE REWRITE: 'http://host.com/page/23' AFTER REWRITE: 'http://host.com/page.php?request=23'

Here we go:

RewriteEngine on
 
RewriteCond %{REQUEST_FILENAME}     !-f
RewriteCond %{REQUEST_FILENAME}     !-d
RewriteCond %{DOCUMENT_ROOT}/$1.php  -f
RewriteRule ^/?([^/]+)/([^/]+)/?$ $1.php?request=$2
What this rewrite rule says is:
If the URL consists of exactly two path segments (path1/path2), the URL is not an actual file or directory in the server file system, and a real .php file exists in the Apache document root whose base name equals the first segment (i.e. /path1.php), then rewrite the URL to point to /path1.php and pass it the second segment as a 'request' query string value, like so: /path1.php?request=path2. The URL may have an optional leading '/' slash (needed if the rewrite rule is in the main httpd.conf; URLs in .htaccess do not have a leading slash) and an optional trailing slash, which is discarded.
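The logic described above can be sketched in Python to check which paths the rule would touch. The pattern is copied from the RewriteRule; the function rewrite and its arguments php_scripts and existing_paths are hypothetical stand-ins for the file system tests that the three RewriteCond lines perform, so treat this as an approximation rather than a model of Apache.

```python
import re

# Pattern copied from the RewriteRule above.
RULE = re.compile(r"^/?([^/]+)/([^/]+)/?$")

def rewrite(path, php_scripts, existing_paths=()):
    """Return the rewritten URL, or None when the rule would not apply.

    php_scripts    -- hypothetical set of base names that exist as <name>.php
                      in the document root (third RewriteCond)
    existing_paths -- hypothetical set of request paths that are real files
                      or directories (first two RewriteCond lines)
    """
    if path in existing_paths:
        return None
    m = RULE.search(path)
    if m is None:
        return None
    base, request = m.groups()
    if base not in php_scripts:
        return None
    return f"{base}.php?request={request}"

scripts = {"page"}
for path in ("page/23", "/page/23/", "page", "page/23/two", "other/23"):
    print(path, "->", rewrite(path, scripts))
```

Only the first two paths are rewritten (both to 'page.php?request=23'). A bare 'page' is left alone by this rule, so if that case matters it would presumably need a rule of its own.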

A little late, but I hope this helps! :)