
decode &#x

Posted: Mon Sep 23, 2013 12:56 pm
by cataIin
I use file_get_contents to fetch the contents of an external address, then strstr, substr and strpos to extract only the part I want from the retrieved text. I also use strip_tags and $final = preg_replace('/\s+/', ' ', $strip_tags);. All good, but I get:

Code: Select all

Bill Clinton, George W. Bush, and Tony Blair. The setting was elegant&#x2014;the.
Now I need to decode characters like

Code: Select all

 
,

Code: Select all

&#x2014;
and so on. So I tried with:

Code: Select all

$decode = htmlspecialchars_decode($final);
Same result. Where is the problem? :?

Re: decode &#x

Posted: Mon Sep 23, 2013 1:50 pm
by requinix
htmlspecialchars_decode() only reverses what htmlspecialchars() could do: less-than, greater-than, ampersand, and quotes. You want html_entity_decode().
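
To make the difference concrete, here is a minimal sketch. The sample string and the variable names are made up for illustration and are not taken from the page being scraped.

```php
<?php
// Hypothetical input mixing the "special" characters with other entities.
$input = '&lt;b&gt; &amp; &quot;x&quot; &#x2014; &eacute;';

// Reverses only what htmlspecialchars() produces:
// &lt; &gt; &amp; &quot; (and &#039; when ENT_QUOTES is set).
$special = htmlspecialchars_decode($input, ENT_QUOTES);

// Decodes named and numeric entities in general, as long as the
// target encoding can represent the resulting characters.
$all = html_entity_decode($input, ENT_QUOTES, 'UTF-8');

var_dump($special); // &#x2014; and &eacute; are still entities here
var_dump($all);     // both are real UTF-8 characters here
```

The first call leaves the numeric and named entities alone, which matches the "same result" symptom described above.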

Re: decode &#x

Posted: Mon Sep 23, 2013 6:50 pm
by cataIin
requinix wrote:htmlspecialchars_decode() only reverses what htmlspecialchars() could do: less-than, greater-than, ampersand, and quotes. You want html_entity_decode().
Thank you for the reply. Unfortunately, that did not solve my problem.
I use:

Code: Select all

$url = @file_get_contents('http://some.address.com');
$start = strstr($url, '<body>');
$end = substr($start, 0, strpos($start, '</body>'));
$remove_tags = strip_tags($end);
$remove_spaces = preg_replace('/\s+/', ' ', $remove_tags);
$text = html_entity_decode($remove_spaces);
var_dump($text);
and what I get (just an example):

Code: Select all

Clinton&#x2019;s presidency Clinton&#x2019;s . &#x201C;The
Where am I going wrong?

Re: decode &#x

Posted: Mon Sep 23, 2013 7:22 pm
by requinix
You need to specify an encoding that can support the characters you're trying to decode. Like UTF-8.

Code: Select all

$text = html_entity_decode($remove_spaces, ENT_QUOTES, "UTF-8");
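
Some background on why the third argument matters: before PHP 5.4 the default encoding for html_entity_decode() was ISO-8859-1, which has no curly quotes, so entities such as &#x2019; were left untouched. From PHP 5.4 on the default is UTF-8, but passing it explicitly works on both. A small sketch using the sample output posted earlier (the variable names are only illustrative):

```php
<?php
// Sample taken from the output posted earlier in the thread.
$raw = 'Clinton&#x2019;s presidency Clinton&#x2019;s . &#x201C;The';

// ISO-8859-1 cannot represent U+2019 or U+201C, so the entities survive.
$latin1 = html_entity_decode($raw, ENT_QUOTES, 'ISO-8859-1');

// UTF-8 can represent them, so they become real curly quotes.
$utf8 = html_entity_decode($raw, ENT_QUOTES, 'UTF-8');

var_dump($latin1 === $raw); // true: nothing was decoded
var_dump($utf8);
```

If the decoded text is echoed into a page, that page also needs to be served as UTF-8, otherwise the quotes will show up as garbage in the browser.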

Re: decode &#x

Posted: Mon Sep 23, 2013 8:27 pm
by cataIin
requinix wrote:You need to specify an encoding that can support the characters you're trying to decode. Like UTF-8.

Code: Select all

$text = html_entity_decode($remove_spaces, ENT_QUOTES, "UTF-8");
Many thanks! :)