How to get the right charset/encoding?
Posted: Sat Mar 03, 2012 6:41 am
Hello, I am trying to parse the title from a Chinese website but I'm getting a wrong result. It seems like an encoding problem? What can I do about it?
I need to get the title, the text on the gray background: 我和哥哥的秘密花园
But instead it's outputting this: 脦脪潞脥赂莽赂莽碌脛脙脴脙脺禄篓脭掳
What's wrong?
Code:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<title>TEST</title>
<meta charset="gbk" />
</head>
<body>
<?php
$dom = new DOMDocument;
libxml_use_internal_errors(true);
$am_link = "http://tieba.baidu.com/p/21993922";
$dom->loadHTMLFile($am_link);
libxml_clear_errors();
$xpath = new DOMXPath($dom);
$nodes = $xpath->query('//div[@class="l_thread_title"]/descendant::h1[1]');
foreach ($nodes as $node)
{
    echo $node->nodeValue, "\n";
    echo "<br />";
}
?>
</body>
</html>
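
For anyone hitting the same symptom, here is a minimal sketch of one possible workaround. It is not a confirmed fix for this exact page. It assumes the remote page really is served as GBK and that the mbstring extension is available. The helper name extractThreadTitle and its $sourceCharset parameter are made up for illustration. The idea is to convert the raw bytes to UTF-8 and then turn every non-ASCII character into a numeric entity, so the parser's charset guess no longer matters.

```php
<?php
// Hypothetical helper (not part of the original post): returns the thread
// title as a UTF-8 string, or null if the XPath query matches nothing.
// Assumption: the raw HTML is encoded in $sourceCharset (GBK by default).
function extractThreadTitle(string $rawHtml, string $sourceCharset = 'GBK'): ?string
{
    // Step 1: convert the raw bytes to UTF-8 (requires mbstring).
    $utf8Html = mb_convert_encoding($rawHtml, 'UTF-8', $sourceCharset);

    // Step 2: replace every character above U+007F with a numeric entity,
    // leaving pure ASCII that parses the same under any charset guess.
    $asciiHtml = mb_encode_numericentity(
        $utf8Html,
        [0x80, 0x10FFFF, 0, 0x1FFFFF],
        'UTF-8'
    );

    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence markup warnings
    $dom->loadHTML($asciiHtml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $nodes = $xpath->query('//div[@class="l_thread_title"]/descendant::h1[1]');

    // DOM string values come back as UTF-8.
    return $nodes->length > 0 ? trim($nodes->item(0)->nodeValue) : null;
}
```

To try it, pass the result of file_get_contents($am_link) to extractThreadTitle() and echo the return value. Since the returned string is UTF-8, the page that prints it would need to declare UTF-8 rather than gbk in its meta tag, otherwise the browser will garble it again. Entities inside script or style blocks are not decoded by the parser, which does not matter for this query.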