Percent encoding/decoding help
Posted: Tue Dec 08, 2009 5:28 pm
I'm a little confused about percent-encoding and decoding of URLs on web pages, and I'm not getting the results I expect.
For example, take this Wikipedia article. If you follow that link, the URL you end up with in the browser, and the title of the article, look like "Guiding Light (1960–1969)". However, when I run the script below, the urldecode() of the URL, and even the text grabbed directly from the title tag in the retrieved HTML, give me very different characters in place of the long hyphen I see in the web browser when I go to the URL.
Can someone explain this to me? For example, how would I modify the script below so it echoes the proper long hyphens?
The script and the output I get look like this:
Code:
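<?php
// Wikipedia article URL with a percent-encoded character in the title.
$url = "http://en.wikipedia.org/wiki/Guiding_Light_(1960%E2%80%931969)";

// Fetch the page body as a string, without the response headers.
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$content = curl_exec($ch);

// Extract the text between <title> and </title>.
$loc = strpos($content, "<title>") + strlen("<title>");
$locEnd = strpos($content, "</title>", $loc);
$title = substr($content, $loc, $locEnd - $loc);
curl_close($ch);

// Print the decoded URL and the page title.
echo urldecode($url).'<br>';
echo $title.'<br>';
?>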
Output:
http://en.wikipedia.org/wiki/Guiding_Light_(1960–1969)
Guiding Light (1960–1969) - Wikipedia, the free encyclopedia
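A minimal sketch of one way to see what is going on, under an assumption the post does not confirm: that the script's output is being rendered in a single-byte charset such as Windows-1252 rather than UTF-8. The sequence %E2%80%93 decodes to the three bytes E2 80 93, which is the UTF-8 encoding of U+2013 (en dash). urldecode() returns those bytes unchanged, so how they look depends on the charset the browser uses for the page. The helper hexBytes() is a hypothetical name introduced only for this illustration, and the iconv() call merely simulates the assumed misreading.

```php
<?php
// Declare UTF-8 before any output so the browser decodes the bytes as UTF-8.
// This is the suggested change; it assumes no charset was being sent before.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

// Hypothetical helper: show a byte string as space-separated hex pairs.
function hexBytes(string $bytes): string
{
    return strtoupper(implode(' ', str_split(bin2hex($bytes), 2)));
}

$url = "http://en.wikipedia.org/wiki/Guiding_Light_(1960%E2%80%931969)";
$decoded = urldecode($url);

// Isolate whatever sits between "1960" and "1969" in the decoded URL.
$start = strpos($decoded, '1960') + strlen('1960');
$dash  = substr($decoded, $start, strpos($decoded, '1969') - $start);

// The raw bytes produced by urldecode().
echo hexBytes($dash), "<br>\n"; // E2 80 93

// Those bytes are exactly the UTF-8 encoding of U+2013 EN DASH.
echo ($dash === "\u{2013}") ? 'UTF-8 en dash' : 'something else', "<br>\n";

// Simulation of the assumed problem: treat each byte as a Windows-1252
// character and re-encode to UTF-8. One dash becomes three characters.
$misread = iconv('CP1252', 'UTF-8', $dash);
echo hexBytes($misread), "<br>\n";

// With the header above in place, the decoded URL should display the dash.
echo $decoded, "<br>\n";
```

If the assumption holds, adding the header() call at the top of the original script (before any echo) should be enough, since both the decoded URL and the title taken from the fetched HTML would already be UTF-8 bytes. A meta charset tag in the emitted HTML is another way to declare the same thing.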