Mod rewrite for special character – <space>

urnetmate
Forum Commoner
Posts: 27
Joined: Wed Sep 27, 2006 1:09 am

Post by urnetmate »

Hello,

A URL containing a special character (–) is not being redirected to the page specified in the rule.
I have written an .htaccess rule like this:

Code: Select all

RewriteRule ^category/this-is-a–test-page-1.html  http://www.test-site.com/category/this-is-a%e2%80%93test-page-1.html [QSA]
The other issue is with a <space> in the URL, for example:

Code: Select all

RewriteRule ^search.php?keyword=abc xyz  http://www.test-site.com/search.php?keyword=abc xyz [L,R=301]
Waiting for the reply.
Thanks.
Last edited by urnetmate on Thu Dec 24, 2009 4:57 am, edited 3 times in total.
Apollo
Forum Regular
Posts: 794
Joined: Wed Apr 30, 2008 2:34 am

Re: Mod rewrite for special character – <space>

Post by Apollo »

urnetmate wrote:A URL containing a special character (–) is not being redirected to the page specified in the rule.
I have written an .htaccess rule like this:

Code: Select all

RewriteRule ^category/this-is-a–test-page-1.html  http://qa1.dailyglow.com/category/this-is-a%e2%80%93test-page-1.html [QSA]
The pattern, as displayed here, looks correct. But how exactly did you save this .htaccess file? (More specifically: does your editor save files as UTF-8?)

The thing is, the special dash between 'a' and 'test' (– instead of -; whether you can see the difference may depend on your font, browser or OS) is not an ASCII character.
Browsers normally send such characters percent-encoded as UTF-8 bytes, and mod_rewrite matches the decoded bytes against the pattern, so the .htaccess file needs to contain the same UTF-8 bytes.

Try this instead:

Code: Select all

RewriteRule ^category/this-is-a\xE2\x80\x93test-page-1.html http://qa1.dailyglow.com/category/this- ... age-1.html [QSA]
The special dash in the rule pattern is now replaced with its UTF-8 byte sequence, written using ASCII characters only.
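
To see where those three bytes come from, here is a small Python sketch (Python is only used for illustration here; the variable names are made up for this example). It derives the \xHH form for the pattern and the percent-encoded form a browser would typically send:

```python
from urllib.parse import quote, unquote_to_bytes

EN_DASH = "\u2013"  # the "special dash" from the rule

# UTF-8 encodes this character as three bytes.
utf8_bytes = EN_DASH.encode("utf-8")

# The same bytes written as \xHH escapes, using ASCII characters only.
regex_escape = "".join(f"\\x{b:02X}" for b in utf8_bytes)

# The percent-encoded form as it would appear in a request URL.
percent_encoded = quote(EN_DASH, safe="")

print(regex_escape)
print(percent_encoded)

# Decoding the percent-encoded form gives back the same three bytes.
assert unquote_to_bytes(percent_encoded) == utf8_bytes
```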

(By the way, there's more wrong with your rewrite rule. For example, did you also intend category/this-is-a–test-page-1Xhtml to be redirected? The unescaped . before html matches any character; write \. to match only a literal dot.)
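
The effect of the unescaped dot can be checked with any regex engine. A minimal sketch, assuming Python's re module as a stand-in for the PCRE engine that mod_rewrite uses (they agree on these basics), and using a shortened, hypothetical ASCII-only path:

```python
import re

# Unescaped dot and no end anchor, as in the original rule.
loose = re.compile(r"^category/test-page-1.html")

# Escaped dot plus a $ anchor, so only the exact file name matches.
strict = re.compile(r"^category/test-page-1\.html$")


def matches(pattern, path):
    """Return True if the pattern matches somewhere in the path."""
    return pattern.search(path) is not None


for path in ("category/test-page-1.html",
             "category/test-page-1Xhtml",
             "category/test-page-1.html.bak"):
    print(path, matches(loose, path), matches(strict, path))
```

The loose pattern accepts all three paths, while the strict one accepts only the first.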
The other issue is with a <space> in the URL, for example:

Code: Select all

RewriteRule ^search.php?keyword=abc xyz  http://www.test-site.com/search.php?keyword=abc xyz [L,R=301]
1. RewriteRule does not match against the part after the ? (the query string is not considered part of the URL path here). To test the query string, use a RewriteCond on %{QUERY_STRING} instead.

2. If you use a rule like

Code: Select all

RewriteRule ^abc def ghi
how is mod_rewrite supposed to tell the difference between redirecting 'abc' to 'def ghi' and redirecting 'abc def' to 'ghi'?

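It can't. If you want to match spaces in URLs (which is a bad idea to begin with), use \x20 instead of a literal space character (or \s to cover any whitespace character, including tabs).
Furthermore, spaces in the resulting URL have to be expressed as %20 rather than as literal space characters. (One caveat: in a RewriteRule substitution, a % followed by a digit is read as a RewriteCond backreference, so you may need to write \%20 and add the NE flag to keep it from being escaped again.)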
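
Both points can be illustrated outside Apache. The following Python sketch is only an illustration (the variable names are made up, and re stands in for mod_rewrite's regex engine): it splits the example URL into path and query string, matches a space with \x20, and percent-encodes the space for the target URL.

```python
import re
from urllib.parse import quote, urlsplit

url = "http://www.test-site.com/search.php?keyword=abc xyz"

# Point 1: the path and the query string are separate parts of the URL.
parts = urlsplit(url)
print(parts.path)   # the part a RewriteRule pattern is matched against
print(parts.query)  # the part you would test with RewriteCond %{QUERY_STRING}

# Point 2: \x20 matches a space without putting a literal space in the pattern.
space_pattern = re.compile(r"^abc\x20def$")
print(space_pattern.search("abc def") is not None)

# In the resulting URL the space is written as %20.
encoded = quote("abc xyz")
print(encoded)
```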

But, really, deliberately using non-ASCII, hard-to-distinguish dashes and spaces in your URLs is asking for trouble and gives your visitors a hard time.
urnetmate
Forum Commoner
Posts: 27
Joined: Wed Sep 27, 2006 1:09 am

Re: Mod rewrite for special character – <space>

Post by urnetmate »

Thanks for the reply.

I am using the rules in an .htaccess file.

The character encoding set is ISO-8859.

I have added this at the top of the .htaccess file.

Code: Select all

AddDefaultCharset utf-8
Even so, the first rule is still not working.
Apollo
Forum Regular
Posts: 794
Joined: Wed Apr 30, 2008 2:34 am

Re: Mod rewrite for special character – <space>

Post by Apollo »

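urnetmate wrote:I am using the rules in an .htaccess file.
Yes, but the question is how (i.e. using which encoding) your editor saved that .htaccess file. Note that AddDefaultCharset only sets the charset announced for responses; it does not change how the .htaccess file itself is stored or read.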

Try putting a test.php in the same directory with this content:

Code: Select all

<?php
// Print each line of .htaccess followed by its bytes in hex, two digits per byte.
$lines = file('.htaccess');
foreach ($lines as $line) {
    print($line . '<br>' . chunk_split(bin2hex($line), 2, ' ') . '<hr>');
}
?>
This should show a hex dump of your .htaccess file.
If the rule with the special character shows a single 96 byte for the dash, the file was saved in an ANSI code page (Windows-1252), which is wrong here. A file saved as UTF-8 would show e2 80 93 instead.
The character encoding set is ISO-8859.
I don't think so, because the – (the 'special dash') can't even be represented at all in ISO-8859-1 :)
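
For anyone who wants to double-check this without PHP, here is a rough Python sketch of the same idea. The function names are invented for this example and the check is only a heuristic: it tells a UTF-8 en dash apart from a Windows-1252 one in a line of raw bytes, and confirms that ISO-8859-1 has no encoding for the character at all.

```python
EN_DASH = "\u2013"


def dash_encoding(raw: bytes) -> str:
    """Guess how an en dash in a raw line of bytes was encoded (heuristic)."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8; a lone 0x96 byte is the en dash in Windows-1252.
        return "windows-1252" if b"\x96" in raw else "unknown"
    return "utf-8" if EN_DASH in text else "no en dash"


def can_encode(char: str, codec: str) -> bool:
    """Return True if the codec has a representation for the character."""
    try:
        char.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


print(dash_encoding("this-is-a\u2013test".encode("utf-8")))
print(dash_encoding(b"this-is-a\x96test"))
print(can_encode(EN_DASH, "iso-8859-1"))
print(can_encode(EN_DASH, "cp1252"))
```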