file_get_contents and UTF-8

voltrader
Forum Contributor
Posts: 223
Joined: Wed Jul 07, 2004 12:44 pm
Location: SF Bay Area

Post by voltrader »

When I fetch a UTF-8 page from yahoo.co.jp using file_get_contents, it is somehow changed to charset=eucJP-win when it's echoed.

Code:

$keyword=urlencode('日本');

$file="http://search.yahoo.co.jp/search?p=$keyword&ei=UTF-8&fr=top_v2&x=wrt";

$html=file_get_contents($file);

echo $html;
Not sure why this is!
John Cartwright
Site Admin
Posts: 11470
Joined: Tue Dec 23, 2003 2:10 am
Location: Toronto

Post by John Cartwright »

I'm not the encoding guru around here, but perhaps iconv() might help.
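
As a rough illustration of the iconv() suggestion, here is a minimal sketch. It assumes PHP 7 or later with the iconv extension, and that the fetched bytes really are EUC-JP; `to_utf8()` is a hypothetical helper name, not something from this thread. The sample input is written with byte escapes so the result does not depend on how the script file is saved.

```php
<?php
// Hypothetical helper (not from the thread): convert a byte string from a
// known source encoding to UTF-8 with iconv().
function to_utf8(string $bytes, string $from = 'EUC-JP'): string
{
    $converted = iconv($from, 'UTF-8', $bytes);
    if ($converted === false) {
        // iconv() returns false when the input is not valid in $from.
        throw new RuntimeException("Could not convert from $from to UTF-8");
    }
    return $converted;
}

// "\xC5\xEC\xB5\xFE" is 東京 encoded as EUC-JP.
$eucjp = "\xC5\xEC\xB5\xFE";
$utf8  = to_utf8($eucjp);

echo bin2hex($utf8), "\n"; // e69db1e4baac (東京 in UTF-8)
```

In the original script, the same call would be applied to the string returned by file_get_contents() before echoing it, but only if the response actually is EUC-JP.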
aaronhall
DevNet Resident
Posts: 1040
Joined: Tue Aug 13, 2002 5:10 pm
Location: Back in Phoenix, missing the microbrews

Post by aaronhall »

You have to tell the browser which character encoding your data uses, or else it will guess.

Code:

header('Content-Type: text/html; charset=UTF-8');

Post by voltrader »

Thanks. Before trying any charset conversion, I tried setting the header as aaronhall suggested above:

Code:

header('Content-Type: text/html; charset=UTF-8');

$keyword=urlencode('東京');

$file="http://search.yahoo.co.jp/search?p=$keyword&ei=UTF-8&fr=top_v2&x=wrt";

$html=file_get_contents($file);

echo $html;
But no dice. Somehow the page is output as charset=eucJP-win even though http://search.yahoo.co.jp/search?p=%C5% ... p_v2&x=wrt is in UTF-8

:?:
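
One thing that may be worth checking, offered as an assumption rather than a diagnosis: the truncated URL above starts its p= value with %C5, which is how 東 begins in EUC-JP (C5 EC) but not in UTF-8 (E6 9D B1). That would be consistent with the script file itself being saved as EUC-JP, so that urlencode() passes EUC-JP bytes while ei=UTF-8 claims otherwise. A minimal sketch of the difference, using iconv() and byte escapes so it does not depend on the file's own encoding:

```php
<?php
// 東京 as raw bytes in two encodings, written with escapes so the example
// does not depend on how this file itself is saved.
$tokyoEucJp = "\xC5\xEC\xB5\xFE";         // EUC-JP
$tokyoUtf8  = "\xE6\x9D\xB1\xE4\xBA\xAC"; // UTF-8

echo urlencode($tokyoEucJp), "\n"; // %C5%EC%B5%FE
echo urlencode($tokyoUtf8), "\n";  // %E6%9D%B1%E4%BA%AC

// Assumption: the keyword literal arrives as EUC-JP bytes, so convert it
// to UTF-8 first so that it matches the ei=UTF-8 parameter.
$keyword = urlencode(iconv('EUC-JP', 'UTF-8', $tokyoEucJp));
$file = "http://search.yahoo.co.jp/search?p=$keyword&ei=UTF-8&fr=top_v2&x=wrt";

echo $file, "\n";
```

If the script file is already saved as UTF-8, the conversion step is unnecessary and would corrupt the keyword, so it is worth confirming the file's encoding first.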
dibyendrah
Forum Contributor
Posts: 491
Joined: Wed Oct 19, 2005 5:14 am
Location: Nepal

Post by dibyendrah »

Sometimes you have to add an extra meta tag to tell the browser that the encoding is UTF-8, even though you have already added

Code:

<?php header('Content-Type: text/html; charset=utf-8'); ?>
So adding the following tag may help:

Code:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

Post by dibyendrah »

Okay, here is the modified script, which works for me:

Code:

<?php

header('Content-Type: text/html; charset=euc-jp');

$keyword=urlencode('東京');

$file="http://search.yahoo.co.jp/search?p=$keyword&ei=UTF-8&fr=top_v2&x=wrt";

$html=file_get_contents($file);
?><html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=euc-jp">
</head>
<body>
<?php echo $html;  ?>
</body>
</html>
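
The script above matches the output header to whatever encoding the fetched page turned out to use. An alternative, sketched here under assumptions, is to read the charset the response declares and convert the body to UTF-8 before output. `declared_charset()` is a hypothetical helper, and the sample `$headers` array stands in for PHP's `$http_response_header`, which file_get_contents() populates after an HTTP request. The sketch assumes PHP 7 or later with the iconv extension.

```php
<?php
// Hypothetical helper: find the charset a response declares, looking first
// at Content-Type header lines, then at a <meta> tag in the markup.
function declared_charset(array $headers, string $html): ?string
{
    foreach ($headers as $line) {
        if (preg_match('/^Content-Type:.*?charset=["\']?([\w.:-]+)/i', $line, $m)) {
            return $m[1];
        }
    }
    if (preg_match('/<meta[^>]+charset=["\']?([\w.:-]+)/i', $html, $m)) {
        return $m[1];
    }
    return null;
}

// Sample data standing in for $http_response_header and the fetched body.
$headers = ['HTTP/1.1 200 OK', 'Content-Type: text/html; charset=EUC-JP'];
$html    = "<html><body>\xC5\xEC\xB5\xFE</body></html>"; // 東京 in EUC-JP

// Fall back to UTF-8 when nothing is declared (an assumption of this sketch).
$charset = declared_charset($headers, $html) ?? 'UTF-8';

if (strcasecmp($charset, 'UTF-8') !== 0) {
    $converted = iconv($charset, 'UTF-8', $html);
    if ($converted !== false) {
        $html = $converted; // keep the original bytes if conversion fails
    }
}

// In a web script, send header('Content-Type: text/html; charset=UTF-8')
// before echoing $html.
```

After conversion, any meta tag inside the fetched markup may still name the old charset; browsers give the HTTP Content-Type header precedence over a meta declaration, so sending the UTF-8 header should still take effect.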

Post by voltrader »

Ah, thank you for that. I will give it a try.