
$_GET array and urlencoded ampersands

Posted: Thu Sep 10, 2009 5:48 pm
by bubblesnout
Firstly, hi everybody! I hope to hang around here, because I'm developing in PHP more and more these days. Anyway, here's my query.

I'm using values from a database to construct a URL with a query string, which is used as a link on a page. Because a value may contain characters such as an ampersand (&), I figured I would be right to use urlencode(), or even better, rawurlencode(). Here's an example of the generated HTML:

<a href="?page=music&group=band&value=Conor%20Oberst%20%26%20The%20Mystic%20Valley%20Band">Conor Oberst & The Mystic Valley Band</a>
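For comparison, here is a minimal PHP sketch of generating such a link, assuming the database values have already been fetched into an array (the $params values and the build_link() helper are hypothetical). http_build_query() encodes each value separately, and htmlspecialchars() escapes the & separators for the HTML attribute:

```php
<?php
// Hypothetical helper: build an <a> element whose query-string values are percent-encoded.
function build_link(array $params, string $label): string
{
    // PHP_QUERY_RFC3986 makes http_build_query() use rawurlencode(),
    // so spaces become %20 and a literal & inside a value becomes %26.
    $query = http_build_query($params, '', '&', PHP_QUERY_RFC3986);

    // The & between parameters sits inside an HTML attribute,
    // so escape the whole href (and the label) with htmlspecialchars().
    return '<a href="?' . htmlspecialchars($query) . '">'
        . htmlspecialchars($label) . '</a>';
}

// Stand-in for values read from the database (hypothetical data).
$params = [
    'page'  => 'music',
    'group' => 'band',
    'value' => 'Conor Oberst & The Mystic Valley Band',
];
echo build_link($params, $params['value']), "\n";
```

The value's own ampersand comes out as %26, while the separators between parameters come out as &amp; in the page source and reach the server as plain &.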

It looks as though 'value' has been encoded perfectly, with the ampersand replaced by %26. However, here's my problem: even with this encoded URL, PHP is splitting the string into the $_GET array at the %26. For that URL, here's the output of print_r($_GET):

Code:

Array
(
    [page] => music
    [group] => band
    [value] => Conor Oberst
    [The_Mystic_Valley_Band] =>
)
As far as I've read, this is NOT how PHP is supposed to handle it, yet it happens both on my local WAMP installation and on my Unix server running Apache.
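For reference, PHP splits the raw query string on & first and only then decodes each name and value, so %26 should end up inside the value. parse_str() applies essentially the same parsing PHP uses to fill $_GET, which makes it a quick way to check this without a web server (the query string below is copied from the link above):

```php
<?php
// The query string from the generated link, exactly as the browser sends it.
$query = 'page=music&group=band&value=Conor%20Oberst%20%26%20The%20Mystic%20Valley%20Band';

// parse_str() splits on arg_separator.input (normally "&"), then
// percent-decodes each name and value, as PHP does when building $_GET.
parse_str($query, $get);

print_r($get);
// Array
// (
//     [page] => music
//     [group] => band
//     [value] => Conor Oberst & The Mystic Valley Band
// )
```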

Any thoughts?

Re: $_GET array and urlencoded ampersands

Posted: Thu Sep 10, 2009 7:04 pm
by SimonMayer
Please could you post your code?
When I echo $_GET["value"], I get "Conor Oberst & The Mystic Valley Band".

Re: $_GET array and urlencoded ampersands

Posted: Thu Sep 10, 2009 7:31 pm
by bubblesnout
It's not specific to my code at all, but here's an example that replicates the problem.

Create a file called index.php. Inside this file, have the following:

Code:

<html>
<head><title>Test page</title></head>
<body>
<?php
// Show how PHP parsed the current query string
print_r($_GET);
?>
<br /><br />
<a href="index.php?value=<?php echo rawurlencode("Jack & Jill"); ?>">Click Here</a>
</body>
</html>
First, look at the source of the page that is displayed, and the <a> element will look like:

Code:

<a href="index.php?value=Jack%20%26%20Jill">Click Here</a>
It's replaced the spaces with %20, and the & with %26. Great!

Now click on that link, which will take you to "index.php?value=Jack%20%26%20Jill". Look at the contents of the $_GET array at the top of the page. What I get is:

Code:

Array
(
    [value] => Jack 
    [Jill] =>
)
Whereas what I'd expect to have is:

Code:

Array
(
    [value] => Jack & Jill
)

Re: $_GET array and urlencoded ampersands

Posted: Thu Sep 10, 2009 7:42 pm
by SimonMayer
I think this may be version-specific or down to an ini setting on your server.
This is working fine for me: http://ribbontree.co.uk/forumtest/get.php

Re: $_GET array and urlencoded ampersands

Posted: Thu Sep 10, 2009 7:46 pm
by bubblesnout
Yeah, I thought that might be the case, but I'm not sure what to look for. I find it a bit weird that this happens both on my local WAMP development installation and on my Unix server running a pretty much default Apache.

Re: $_GET array and urlencoded ampersands

Posted: Thu Sep 10, 2009 8:02 pm
by SimonMayer
I know this might still fail, as you are including an ampersand in &amp;, but what happens if you try:

Code:

rawurlencode("Jack &amp; Jill")
?

Re: $_GET array and urlencoded ampersands

Posted: Thu Sep 10, 2009 8:02 pm
by John Cartwright
Assuming you're using a browser, the entities are parsed by your browser before it sends the URL, which is why you are seeing the same result across different environments.

Instead, try using the more friendly urlencode() in place of rawurlencode()
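To illustrate the point about entities: an &amp; inside an href is decoded to & by the browser before the request is sent, whereas %26 is not an HTML entity and is left alone. html_entity_decode() is used below only as a rough stand-in for what the browser does, and the href value is a hypothetical example:

```php
<?php
// An href as it might appear in the page source (hypothetical example).
$href = 'index.php?page=music&amp;value=Jack%20%26%20Jill';

// Roughly what the browser does to an attribute value before requesting it:
// HTML entities such as &amp; are decoded, percent-escapes are untouched.
$requested = html_entity_decode($href, ENT_QUOTES | ENT_HTML5, 'UTF-8');

echo $requested, "\n"; // index.php?page=music&value=Jack%20%26%20Jill
```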

Re: $_GET array and urlencoded ampersands

Posted: Thu Sep 10, 2009 9:01 pm
by bubblesnout
John Cartwright wrote:Assuming you're using a browser, the entities are parsed by your browser before it sends the URL, which is why you are seeing the same result across different environments.

Instead, try using the more friendly urlencode() in place of rawurlencode()
I've tried urlencode() as well (in fact I tried that first), with the same result. The only difference I noticed was that urlencode() encoded spaces as +, whereas rawurlencode() encoded them as %20.
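That difference is easy to check directly. Both functions encode the ampersand as %26, and PHP's query parsing (again approximated with parse_str()) decodes either form back to the same value:

```php
<?php
$value = 'Jack & Jill';

// urlencode() follows application/x-www-form-urlencoded: spaces become "+".
$form = urlencode($value);    // Jack+%26+Jill
// rawurlencode() follows RFC 3986: spaces become "%20".
$raw  = rawurlencode($value); // Jack%20%26%20Jill

// Either form is decoded back to the original value by PHP's query parser.
parse_str('value=' . $form, $a);
parse_str('value=' . $raw, $b);
var_dump($a['value'] === $value, $b['value'] === $value); // prints bool(true) twice
```

So switching between the two should not change how %26 is handled; it only changes how spaces are written.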
I've tried in both Firefox and Internet Explorer, with the same result in both. If I look at the page source in either browser, the URL in the href of the <a> element is exactly as I expect it to look (encoded, with %26 in place of the &). If I click the link, the URL in my browser reads "index.php?value=Jack%20%26%20Jill". It's at the point where PHP reads these values into the $_GET array that it seems to treat %26 as an ampersand.
SimonMayer wrote:I know this might still fail, as you are including an ampersand in &amp;, but what happens if you try:

Code:

rawurlencode("Jack &amp; Jill")
?
Same deal really, except it ends up being "index.php?value=Jack%20%26amp%3B%20Jill". The & is being encoded as %26 still, and the semicolon is encoded as %3B.
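This also shows why &amp; belongs in the HTML attribute rather than inside the value passed to rawurlencode(): the entity's characters are percent-encoded like any other text and come back out of the query string verbatim. A small sketch, using parse_str() in place of a real request:

```php
<?php
// Encoding the HTML entity itself puts its characters into the URL...
$encoded = rawurlencode('Jack &amp; Jill'); // Jack%20%26amp%3B%20Jill

// ...so once PHP decodes the parameter, the entity text is part of the value.
parse_str('value=' . $encoded, $get);
echo $get['value'], "\n"; // Jack &amp; Jill
```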

Thanks for the ideas guys.

Re: $_GET array and urlencoded ampersands

Posted: Thu Sep 10, 2009 10:42 pm
by McInfo
I'm curious what is in $_SERVER['QUERY_STRING'].

Code:

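<?php
if (isset($_GET['value'])) {
    // Second request: show the raw query string exactly as PHP received it
    header('Content-Type: text/plain');
    echo $_SERVER['QUERY_STRING'];
} else {
    // First request: emit a link whose value contains an encoded ampersand (%26)
    header('Content-Type: text/html');
    ?><a href="?value=Jack%20%26%20Jill">Continue</a><?php
}
?>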
Edit: This post was recovered from search engine cache.

Re: $_GET array and urlencoded ampersands

Posted: Thu Sep 10, 2009 10:54 pm
by bubblesnout
In that case, $_SERVER['QUERY_STRING'] contains:
"value=Jack & Jill"

Re: $_GET array and urlencoded ampersands

Posted: Thu Sep 10, 2009 10:59 pm
by McInfo
So, after you click the link, your address bar contains "%26" but $_SERVER['QUERY_STRING'] contains "&"? Very strange...
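One way to reproduce exactly this symptom is for something to percent-decode the query string before PHP splits it into parameters. The sketch below does that deliberately with rawurldecode(); it only illustrates the failure mode and is not a claim about what any particular server or script actually does:

```php
<?php
// What the browser sends (from the test page above).
$raw = 'value=Jack%20%26%20Jill';

// Correct order: split on "&" first, then decode each part.
parse_str($raw, $good);
// $good === ['value' => 'Jack & Jill']

// Wrong order: decoding first turns %26 into a real separator.
$decoded = rawurldecode($raw); // "value=Jack & Jill", like the QUERY_STRING reported above
parse_str($decoded, $bad);
// $bad === ['value' => 'Jack ', 'Jill' => '']  (same shape as the broken $_GET)
```

PHP drops the leading space from the stray " Jill" name, which is why the broken output shows a bare [Jill] key (and [The_Mystic_Valley_Band], with the remaining spaces turned into underscores).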

Edit: This post was recovered from search engine cache.

Re: $_GET array and urlencoded ampersands

Posted: Thu Sep 10, 2009 11:29 pm
by McInfo
I suppose some more diagnostics couldn't hurt.

Code:

var_dump(ini_get('auto_prepend_file'));
var_dump(ini_get('arg_separator.output'));
var_dump(ini_get('arg_separator.input'));
Is there a chance that Apache is rewriting the URL?

Edit: This post was recovered from search engine cache.

Re: $_GET array and urlencoded ampersands

Posted: Thu Sep 10, 2009 11:58 pm
by bubblesnout
McInfo wrote:So, after you click the link, your address bar contains "%26" but $_SERVER['QUERY_STRING'] contains "&"? Very strange...
Yeah that's correct... I thought it was strange too.
McInfo wrote:I suppose some more diagnostics couldn't hurt.

Code:

var_dump(ini_get('auto_prepend_file'));
var_dump(ini_get('arg_separator.output'));
var_dump(ini_get('arg_separator.input'));
Is there a chance that Apache is rewriting the URL?
The output of those diagnostics is:

string(0) ""
string(5) "&amp;"
string(1) "&"
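Those two separator values differ: string(5) means arg_separator.output is the five-character &amp;, while arg_separator.input is a plain &. The output setting only affects URLs that PHP generates (for example http_build_query() when no separator is passed); the input setting is what incoming query strings are split on. A small sketch of the output side, setting the value at runtime purely for illustration:

```php
<?php
// arg_separator.output can be changed at runtime; "&amp;" mimics the value reported above.
ini_set('arg_separator.output', '&amp;');

// With no explicit separator, http_build_query() uses arg_separator.output.
$query = http_build_query(['a' => '1', 'b' => 'x & y']);
echo $query, "\n"; // a=1&amp;b=x+%26+y
```

Whether that setting plays any part in this problem isn't established here; it is shown only to explain what the string(5) line means.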

The server I'm using is basically a brand new CentOS server, and it's running the pre-installed apache. I've done no customisations to it at all really... Definitely haven't touched anything to do with url rewriting.

Here's an interesting thing. I just tried creating a brand new file, one directory up from where my site is on my server. It contains the following (the code I previously used to test on an existing page):

Code:

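<?php
if (isset($_GET['value'])) {
    // Second request: show the raw query string and the parsed $_GET array
    header('Content-Type: text/plain');
    echo $_SERVER['QUERY_STRING'];
    echo "\n\n"; // plain-text response, so newlines rather than <br />
    print_r($_GET);
} else {
    // First request: link whose value contains an encoded ampersand (%26)
    header('Content-Type: text/html');
    ?><a href="?value=Jack%20%26%20Jill">Continue</a><?php
}
?>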
Surprisingly, it works as expected, i.e. $_GET contains 'value', which equals 'Jack & Jill'. Now I'm terribly confused.

OK, I just played around some more with the pages I was testing it on. I'm using an SMF forum alongside the website, and I'm incorporating its authentication system, which is made really easy by including SSI.php from the forum's directory. I removed the include of that file, commented out my authentication code, and voilà... now everything works as expected.
So the problem lies there, somewhere. There must be some setting in the SMF forum's authentication system that is causing this. Time to investigate.

Re: $_GET array and urlencoded ampersands

Posted: Fri Sep 11, 2009 12:54 am
by McInfo
It appears that the Simple Machines Forum does some things to the query string in <SMF install dir>/Sources/QueryString.php.

Edit: This post was recovered from search engine cache.

Re: $_GET array and urlencoded ampersands

Posted: Fri Sep 11, 2009 1:14 am
by bubblesnout
Yeah, I've dug into it a little, but I don't really want to muck about with SMF's code, so I've asked the question on their forums; hopefully they can help over there. Thanks so much for your help so far. If anybody has any experience with SMF in this respect, I'd love to hear any ideas.
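If the parameters are being altered while SSI.php is included (as suspected above, though the exact mechanism isn't confirmed in this thread), one hedge that leaves SMF's code untouched is to parse the query string yourself before the include. A minimal sketch; the query_params_from_uri() helper and the include path are hypothetical, and it assumes $_SERVER['REQUEST_URI'] still holds the URL as the browser sent it at that point:

```php
<?php
// Hypothetical helper: parse the query part of a request URI independently of $_GET.
function query_params_from_uri(string $requestUri): array
{
    // parse_url() returns the query component still percent-encoded (or null if absent).
    $query = parse_url($requestUri, PHP_URL_QUERY);

    $params = [];
    if (is_string($query)) {
        // Split on the separator first, then decode, so %26 stays inside its value.
        parse_str($query, $params);
    }
    return $params;
}

// Usage in a page (the include path is an assumption):
// $params = query_params_from_uri($_SERVER['REQUEST_URI']);
// require 'forum/SSI.php';
// echo htmlspecialchars($params['value'] ?? '');
```

Simply copying $_GET into a local variable before the include would be an even simpler variant, provided $_GET is still intact at that point.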