
Percent encoding/decoding help

Posted: Tue Dec 08, 2009 5:28 pm
by Martoon
I'm a little confused about percent encoding and decoding on webpages, and I'm not getting the results I expect.

For example, this Wikipedia article. If you follow that link, the URL you end up with in the browser, and the title of the article, look like "Guiding Light (1960–1969)". However, if I run the following script:

Code:

<?php
$url = "http://en.wikipedia.org/wiki/Guiding_Light_(1960%E2%80%931969)";

// Fetch the page body as a string instead of printing it directly.
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$content = curl_exec($ch);
curl_close($ch);

// Pull out the text between <title> and </title>.
$loc = strpos($content, "<title>") + strlen("<title>");
$locEnd = strpos($content, "</title>", $loc);
$title = substr($content, $loc, $locEnd - $loc);

echo urldecode($url).'<br>';
echo $title.'<br>';
?>
 
The output I get looks like this:

Code:

http://en.wikipedia.org/wiki/Guiding_Light_(1960–1969)
Guiding Light (1960–1969) - Wikipedia, the free encyclopedia

Both the urldecode()'d URL and the text taken directly from the title tag of the retrieved HTML show some very different characters in place of the long hyphen I see in the web browser when I go to the URL.

Can someone explain this to me? For example, how would I modify the script above so that it echoes the proper long hyphens?

Re: Percent encoding/decoding help

Posted: Tue Dec 08, 2009 7:40 pm
by requinix
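You need to tell the browser that your page is UTF-8. The bytes your script outputs are already UTF-8; the browser is just reading them with a different encoding.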

Code:

header("Content-Type: text/html; charset=utf-8");

Code:

<meta http-equiv="content-type" content="text/html; charset=utf-8">
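
Either/both of those should work. The header() call has to run before any output is sent; the meta tag goes in the <head> of the page.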
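
To make it a bit more concrete, here is a rough sketch (not tested against the original script's environment) of what urldecode() produces for that URL. It assumes the file is served through a web server; the $decoded and $dash variables and the bin2hex() check are only there for illustration. The %E2%80%93 in the URL decodes to three raw bytes, which together are the UTF-8 encoding of the dash. If the page doesn't declare a charset, the browser may read each byte as a separate character, which would explain the odd output. The title fetched from Wikipedia should behave the same way, since that page is served as UTF-8.

```php
<?php
// Declare the charset before any output is sent (the fix suggested above).
// The headers_sent() guard just avoids a warning if output has already started.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

$url = "http://en.wikipedia.org/wiki/Guiding_Light_(1960%E2%80%931969)";

// urldecode() turns %E2%80%93 into the raw bytes E2 80 93,
// i.e. the UTF-8 encoding of U+2013 (en dash).
$decoded = urldecode($url);

// Illustration only: grab the three bytes that follow "1960".
$dash = substr($decoded, strpos($decoded, '1960') + 4, 3);

echo htmlspecialchars($decoded, ENT_QUOTES, 'UTF-8') . '<br>';
echo bin2hex($dash) . '<br>'; // hex dump of the dash bytes
```

With the header in place, the first line should show the dash the same way the browser's address bar does, and the second line shows the bytes behind it.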

Re: Percent encoding/decoding help

Posted: Tue Dec 08, 2009 9:49 pm
by Martoon
Thank you! :D