file_get_contents and UTF-8

Posted: Sun Apr 01, 2007 7:07 pm
by voltrader
When I fetch a yahoo.co.jp page in UTF-8 using file_get_contents, it is somehow served as charset=eucJP-win when I echo it.

Code:

$keyword=urlencode('日本');

$file="http://search.yahoo.co.jp/search?p=$keyword&ei=UTF-8&fr=top_v2&x=wrt";

$html=file_get_contents($file);

echo $html;
Not sure why this is!

Posted: Sun Apr 01, 2007 8:06 pm
by John Cartwright
I'm not the encoding guru around here, but perhaps iconv() might help.
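
For reference, a minimal sketch of the iconv() approach, assuming the fetched page really is EUC-JP. The helper name to_utf8 and the default source charset are assumptions made for illustration; they do not come from the posts in this thread.

```php
<?php
// Hypothetical helper: convert fetched HTML from an assumed source
// charset to UTF-8. 'EUC-JP' is a guess based on the charset reported
// in the first post; adjust it if the page uses something else.
function to_utf8(string $html, string $from = 'EUC-JP'): string
{
    // iconv() returns false if the input is not valid in $from.
    $converted = iconv($from, 'UTF-8', $html);

    // Fall back to the untouched input rather than echoing nothing.
    return $converted === false ? $html : $converted;
}
```

With this, something like echo to_utf8(file_get_contents($file)); together with a UTF-8 Content-Type header should give consistent output, provided the guess about the source charset is right.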

Posted: Sun Apr 01, 2007 8:45 pm
by aaronhall
You have to tell the browser which charset your data is encoded in, or else it will guess.

Code:

header('Content-Type: text/html; charset=UTF-8');

Posted: Mon Apr 02, 2007 12:24 am
by voltrader
Thanks. Before trying any charset conversion, I tried setting the header as aaronhall suggested above:

Code:

header('Content-Type: text/html; charset=UTF-8');

$keyword=urlencode('東京');

$file="http://search.yahoo.co.jp/search?p=$keyword&ei=UTF-8&fr=top_v2&x=wrt";

$html=file_get_contents($file);

echo $html;
But no dice. Somehow the page is still output as charset=eucJP-win, even though http://search.yahoo.co.jp/search?p=%C5% ... p_v2&x=wrt is in UTF-8.
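
One thing worth checking: the URL above starts its keyword with %C5, which is how 東 begins in EUC-JP, whereas in UTF-8 it would begin with %E6. A possible explanation (an assumption, not confirmed in this thread) is that the script file itself is saved as EUC-JP, so urlencode() sends EUC-JP bytes while ei=UTF-8 claims UTF-8. A minimal sketch that converts the keyword first; build_search_url is a hypothetical helper name:

```php
<?php
// Hypothetical helper: build the search URL with the keyword
// guaranteed to be UTF-8 before it is percent-encoded.
// Assumption: $sourceEncoding is the encoding the script file
// (and therefore the string literal) is saved in.
function build_search_url(string $keyword, string $sourceEncoding = 'EUC-JP'): string
{
    $utf8 = iconv($sourceEncoding, 'UTF-8', $keyword);
    if ($utf8 === false) {
        throw new InvalidArgumentException("Keyword is not valid $sourceEncoding");
    }

    // Same query string as in the posts above, with p= now matching ei=UTF-8.
    return 'http://search.yahoo.co.jp/search?p=' . urlencode($utf8) . '&ei=UTF-8&fr=top_v2&x=wrt';
}
```

If the file is already saved as UTF-8, passing 'UTF-8' as the second argument leaves the keyword unchanged.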

Posted: Mon Apr 02, 2007 1:37 am
by dibyendrah
Sometimes you have to add an extra meta tag to tell the browser that the encoding is UTF-8, even though you have already added

Code:

<?php header('Content-Type: text/html; charset=utf-8'); ?>
So adding the following tag may help:

Code:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

Posted: Mon Apr 02, 2007 1:51 am
by dibyendrah
Okay, here is the modified script, which works for me:

Code:

<?php

header('Content-Type: text/html; charset=euc-jp');

$keyword=urlencode('東京');

$file="http://search.yahoo.co.jp/search?p=$keyword&ei=UTF-8&fr=top_v2&x=wrt";

$html=file_get_contents($file);
?><html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=euc-jp">
</head>
<body>
<?php echo $html;  ?>
</body>
</html>
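
Rather than hard-coding euc-jp, one option is to read the charset that the fetched page declares in its own meta tag and send a matching header. A rough sketch, where detect_charset is a hypothetical helper and the regex only covers the common meta tag forms:

```php
<?php
// Hypothetical helper: return the charset declared in the first
// <meta ... charset=...> tag of the fetched HTML, or null if none.
// This is a simple pattern match, not a full HTML parser.
function detect_charset(string $html): ?string
{
    $pattern = '/<meta[^>]+charset\s*=\s*["\']?\s*([A-Za-z0-9_\-]+)/i';
    if (preg_match($pattern, $html, $matches) === 1) {
        return $matches[1];
    }
    return null;
}
```

The result could then be passed to header('Content-Type: text/html; charset=' . $charset) before echoing $html, falling back to a default when detect_charset returns null.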

Posted: Thu May 10, 2007 3:55 pm
by voltrader
Ah, thank you for that. I will give it a try.