file_get_contents and UTF-8
Posted: Sun Apr 01, 2007 7:07 pm
by voltrader
When I fetch a yahoo.co.jp page in UTF-8 using file_get_contents, it somehow ends up as charset=eucJP-win when it's echoed.
Code:
$keyword=urlencode('日本');
$file="http://search.yahoo.co.jp/search?p=$keyword&ei=UTF-8&fr=top_v2&x=wrt";
$html=file_get_contents($file);
echo $html;
Not sure why this is!
Posted: Sun Apr 01, 2007 8:06 pm
by John Cartwright
I'm not the encoding guru around here, but perhaps iconv() might help.
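For illustration, here is a minimal sketch of what an iconv() conversion could look like. The helper name to_utf8() is hypothetical, and the source encoding has to be supplied by the caller; the sketch assumes you already know which encoding the fetched bytes are in.

```php
<?php
// Hypothetical helper (not from this thread): convert a byte string to UTF-8.
// $from must name the encoding the bytes are actually in; guessing wrong
// produces garbage or a failed conversion.
function to_utf8(string $bytes, string $from): string
{
    $converted = iconv($from, 'UTF-8', $bytes);
    if ($converted === false) {
        // iconv() returns false when it hits bytes that are illegal in $from.
        throw new RuntimeException("Could not convert from $from to UTF-8");
    }
    return $converted;
}

// The two characters of "東京" written out as EUC-JP bytes.
$eucJp = "\xC5\xEC\xB5\xFE";
echo to_utf8($eucJp, 'EUC-JP'), "\n";
```

Conversion only helps if the bytes really are in the encoding you name, so it is worth confirming the actual encoding of $html first.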
Posted: Sun Apr 01, 2007 8:45 pm
by aaronhall
You have to tell the browser which charset your data is encoded in, or else it will guess.
Code:
header('Content-Type: text/html; charset=UTF-8');
Posted: Mon Apr 02, 2007 12:24 am
by voltrader
Thanks. Before trying any charset conversion, I tried setting the header as aaronhall suggested above:
Code:
header('Content-Type: text/html; charset=UTF-8');
$keyword=urlencode('東京');
$file="http://search.yahoo.co.jp/search?p=$keyword&ei=UTF-8&fr=top_v2&x=wrt";
$html=file_get_contents($file);
echo $html;
But no dice. Somehow the page is output as charset=eucJP-win even though
http://search.yahoo.co.jp/search?p=%C5% ... p_v2&x=wrt is in UTF-8
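One way to narrow this down is to compare the charset the remote server declared with what PHP is configured to emit. The sketch below is only a diagnostic aid under assumptions: charset_from_headers() is a hypothetical helper, the sample header lines are made up rather than captured from Yahoo, and the idea that an mbstring output handler is re-encoding the page is a guess based on "eucJP-win" being an mbstring encoding name.

```php
<?php
// Hypothetical helper: pull the charset out of a list of raw response header
// lines, such as the $http_response_header array that PHP fills in after
// file_get_contents() on an http:// URL.
function charset_from_headers(array $headers): ?string
{
    $charset = null;
    foreach ($headers as $line) {
        if (preg_match('/^Content-Type:.*?charset=["\']?([\w.:-]+)/i', $line, $m)) {
            $charset = $m[1]; // keep the last match, in case of redirects
        }
    }
    return $charset;
}

// Sample header lines standing in for a real response (assumed values).
$sample = [
    'HTTP/1.1 200 OK',
    'Content-Type: text/html; charset=UTF-8',
];
var_dump(charset_from_headers($sample));

// Settings that can make PHP re-encode output on its way to the browser.
var_dump(
    ini_get('output_handler'),
    ini_get('mbstring.http_output'),
    ini_get('default_charset')
);
```

In the real script you would pass $http_response_header right after the file_get_contents() call. If output_handler turns out to be mb_output_handler with an EUC-JP http_output, that would be one possible explanation for the re-encoding.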

Posted: Mon Apr 02, 2007 1:37 am
by dibyendrah
Sometimes you have to add an extra meta tag to tell the browser that the encoding is UTF-8, even though you have already sent the header:
Code:
<?php header('Content-Type: text/html; charset=utf-8'); ?>
So adding the following tag may help:
Code:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
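A declared charset, whether in the header or the meta tag, only helps if the bytes being sent really are in that encoding. As a rough sketch, the hypothetical helper is_valid_utf8() below checks whether a string is well-formed UTF-8 before you label it as such; it relies on PCRE's /u modifier, so it does not assume the mbstring extension is loaded.

```php
<?php
// Hypothetical helper: report whether a byte string is well-formed UTF-8.
function is_valid_utf8(string $bytes): bool
{
    // With the /u modifier, preg_match() fails (returns false) on subjects
    // that are not valid UTF-8; an empty pattern otherwise matches once.
    return preg_match('//u', $bytes) === 1;
}

$utf8  = "\xE6\x9D\xB1\xE4\xBA\xAC"; // "東京" as UTF-8 bytes
$eucJp = "\xC5\xEC\xB5\xFE";         // "東京" as EUC-JP bytes

var_dump(is_valid_utf8($utf8));
var_dump(is_valid_utf8($eucJp));
```

Running this kind of check on $html before echoing it shows whether the fetched bytes themselves are UTF-8 or whether they have already been converted somewhere along the way.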
Posted: Mon Apr 02, 2007 1:51 am
by dibyendrah
Okay, here is the modified script, which works for me:
Code:
<?php
header('Content-Type: text/html; charset=euc-jp');
$keyword=urlencode('東京');
$file="http://search.yahoo.co.jp/search?p=$keyword&ei=UTF-8&fr=top_v2&x=wrt";
$html=file_get_contents($file);
?><html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=euc-jp">
</head>
<body>
<?php echo $html; ?>
</body>
</html>
Posted: Thu May 10, 2007 3:55 pm
by voltrader
Ah, thank you for that. I will give it a try.