UTF 8 Help

Posted: Sat Jul 23, 2011 6:37 am
by Talon
Hey, I'm trying to scrape a UTF-8 page, but when I echo the results it prints gibberish.
This is the code I'm using:

Code:

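<?php
// header() must run before any output, so it goes ahead of the DOCTYPE.
header('Content-Type: text/html; charset=UTF-8');
?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
</head>
<?php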
header('Content-Type: text/html; charset=UTF-8');
$opts = array('http' => array('header' => 'Accept-Charset: UTF-8, *;q=0'));
$context = stream_context_create($opts);

$filename = "http://weather.walla.co.il";
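// Keep the downloaded HTML; without the assignment the result is discarded.
$file_string = file_get_contents($filename, false, $context);
// Match against the page contents, not the URL.
preg_match('/<h1>(.*)<\/h1>/i', $file_string, $title);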
$title_out = $title[1];

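// [^"]* stops at the closing quote; the stray trailing space in the pattern is dropped.
preg_match('/<meta name="keywords" content="([^"]*)" \/>/i', $file_string, $keywords);
$keywords_out = $keywords[1];

preg_match('/<meta name="description" content="([^"]*)" \/>/i', $file_string, $description);
$description_out = $description[1];

// Non-greedy groups keep several links on one line from merging into one match.
preg_match_all('/<li><a href="(.*?)">(.*?)<\/a><\/li>/i', $file_string, $links);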

?>

<p><strong>Title:</strong> <?php echo $title_out; ?></p>
<p><strong>Keywords:</strong> <?php echo $keywords_out; ?></p>
<p><strong>Description:</strong> <?php echo $description_out; ?></p>
<p><strong>Links:</strong> <em>(Name - Link)</em><br />
<?php
	echo '<ol>';
	for($i = 0; $i < count($links[1]); $i++) {
		echo '<li>' . $links[2][$i] . ' - ' . $links[1][$i] . '</li>';
	}
	echo '</ol>';
?>
</p>
</html>
Please advise.

Re: UTF 8 Help

Posted: Sat Jul 23, 2011 10:58 am
by McInfo
The character set is Windows-1255, not UTF-8.
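One way to check this for yourself is to look at what the page declares, first in the HTTP Content-Type header and then in its meta tag. The sketch below is only an illustration: `declared_charset()` is a hypothetical helper name, and the sample header and markup are made up rather than copied from the site.

```php
<?php
// Hypothetical helper: returns the charset a page declares, or null.
// $headers is a list of raw response header lines (for example the contents
// of $http_response_header after file_get_contents()); $html is the body.
function declared_charset(array $headers, $html)
{
    // 1. A charset in the HTTP Content-Type header takes precedence.
    foreach ($headers as $line) {
        if (preg_match('/^Content-Type:.*?charset=["\']?([\w-]+)/i', $line, $m)) {
            return strtolower($m[1]);
        }
    }
    // 2. Otherwise fall back to a <meta> declaration in the markup.
    if (preg_match('/<meta[^>]+charset=["\']?([\w-]+)/i', $html, $m)) {
        return strtolower($m[1]);
    }
    return null;
}

// Made-up sample input for illustration only.
$sampleHeaders = array('HTTP/1.1 200 OK', 'Content-Type: text/html');
$sampleHtml = '<meta http-equiv="Content-Type" content="text/html; charset=windows-1255" />';
echo declared_charset($sampleHeaders, $sampleHtml), "\n";
```

A declared charset can still be wrong, so treat the result as a hint rather than proof.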

Re: UTF 8 Help

Posted: Sun Jul 24, 2011 8:38 am
by Talon
How do I change it?

Re: UTF 8 Help

Posted: Sun Jul 24, 2011 1:46 pm
by McInfo
Pick one:
  1. Change the headers, meta tag, etc. on your side to conform to Windows-1255.
  2. Wait for a future version of mb_convert_encoding() to support Windows-1255.
  3. Use something other than PHP's built-in functions to convert the encoding.
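Options 2 and 3 can be combined in a rough sketch: use mbstring when the installed build lists the source encoding, and fall back to the iconv extension otherwise. `to_utf8()` is a hypothetical helper name, and the sketch assumes the iconv extension is available and knows Windows-1255, which depends on the underlying iconv library.

```php
<?php
// Hypothetical helper: convert $text from encoding $from to UTF-8.
// Returns the converted string, or false if no converter could handle it.
function to_utf8($text, $from)
{
    // Option 2: use mbstring only if this build lists the encoding by name.
    if (function_exists('mb_list_encodings')) {
        $known = array_map('strtolower', mb_list_encodings());
        if (in_array(strtolower($from), $known, true)) {
            return mb_convert_encoding($text, 'UTF-8', $from);
        }
    }
    // Option 3: fall back to the iconv extension.
    if (function_exists('iconv')) {
        return iconv($from, 'UTF-8', $text);
    }
    return false;
}

// Four Windows-1255 bytes spelling a Hebrew word (sample data).
$utf8 = to_utf8("\xF9\xEC\xE5\xED", 'windows-1255');
echo bin2hex($utf8), "\n";
```

For option 1 nothing is converted at all: the script would send `charset=windows-1255` in its own header and meta tag and echo the bytes untouched.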

Re: UTF 8 Help

Posted: Tue Jul 26, 2011 1:14 pm
by Talon
I didn't understand. Can you please show me examples using the actual code?

Re: UTF 8 Help

Posted: Tue Jul 26, 2011 1:24 pm
by Apollo
Try one of these:

Code:

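$utf8_method1 = iconv( 'windows-1255', 'utf-8', $yourGibberishString );
// Maps only the Hebrew letters (bytes 0xE0-0xFA); a callback is used because the /e modifier was removed in PHP 7.
$utf8_method2 = preg_replace_callback( '/[\xE0-\xFA]/', function ( $m ) { return chr(215) . chr(ord($m[0]) - 80); }, $yourGibberishString );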

Re: UTF 8 Help

Posted: Wed Jul 27, 2011 12:38 am
by Talon
Can you please show me where to add it in my example above?

Thank You

Re: UTF 8 Help

Posted: Wed Jul 27, 2011 8:06 am
by Apollo
Talon wrote: Can you please show me where to add it in my example above.
That seems like a trial-and-error approach.
Read this article on character encoding and then carefully reconsider exactly what you're doing.
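For reference, here is a minimal sketch of where a conversion could sit in the script from the first post: right after the download and before any regex or echo, so that everything downstream sees UTF-8. `extract_title_utf8()` is a hypothetical helper name, the sample markup is made up, and the sketch assumes the page really is Windows-1255 as stated earlier in the thread.

```php
<?php
// Hypothetical helper: take raw Windows-1255 HTML and return the <h1> text
// as UTF-8, or null if conversion fails or no <h1> is found.
function extract_title_utf8($rawHtml)
{
    // Convert first, so every later regex and echo works on UTF-8.
    $html = iconv('windows-1255', 'UTF-8', $rawHtml);
    if ($html === false) {
        return null;
    }
    if (preg_match('/<h1>(.*?)<\/h1>/i', $html, $m)) {
        return $m[1];
    }
    return null;
}

// In the original script $rawHtml would be the return value of
// file_get_contents($filename, false, $context); sample bytes are used here.
$rawHtml = "<h1>\xF9\xEC\xE5\xED</h1>";
echo extract_title_utf8($rawHtml), "\n";
```

The keywords, description, and link patterns would then run on the same converted string, and the UTF-8 header and meta tag already in the script would match what is echoed.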