decode &#x
Posted: Mon Sep 23, 2013 12:56 pm
by cataIin
I use file_get_contents to get the contents of an external address, then some strstr, substr and strpos calls to keep only what I want from the retrieved text. I also use strip_tags and $final = preg_replace('/\s+/', ' ', $strip_tags);. All good, but I get:
Code: Select all
Bill Clinton, George W. Bush, and Tony Blair. The setting was elegant—the.
Now I need to decode the &#x... character references that remain. So I tried with:
Code: Select all
$decode = htmlspecialchars_decode($final);
Same result. Where is the problem?

Re: decode &#x
Posted: Mon Sep 23, 2013 1:50 pm
by requinix
htmlspecialchars_decode() only reverses what htmlspecialchars() does: ampersand, less-than, greater-than, and quotes. You want html_entity_decode.
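To make the difference concrete, here is a minimal sketch. The sample string is made up for illustration, and the encoding is passed explicitly so the result does not depend on the PHP version's default.

```php
<?php
// Hypothetical input: named entities plus one hexadecimal numeric reference.
$input = 'Tom &amp; Jerry &lt;b&gt; elegant&#x2014;the';

// Only undoes htmlspecialchars(): &amp; &lt; &gt; and quotes.
$special = htmlspecialchars_decode($input, ENT_QUOTES);

// Decodes every entity, including numeric character references.
$all = html_entity_decode($input, ENT_QUOTES, 'UTF-8');

var_dump($special); // the &#x2014; reference is still in the string
var_dump($all);     // the reference has become a UTF-8 em dash
```

The first call leaves the numeric reference alone, which matches the "same result" described in the question.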
Re: decode &#x
Posted: Mon Sep 23, 2013 6:50 pm
by cataIin
Thank you for the reply. Unfortunately, that did not solve my problem.
I use:
Code: Select all
$url = @file_get_contents('http://some.address.com');
$start = strstr($url, '<body>');
$end = substr($start, 0, strpos($start, '</body>'));
$remove_tags = strip_tags($end);
$remove_spaces = preg_replace('/\s+/', ' ', $remove_tags);
$text = html_entity_decode($remove_spaces);
var_dump($text);
and what I get (just an example):
Code: Select all
Clinton’s presidency Clinton’s . “The
Where am I going wrong?
Re: decode &#x
Posted: Mon Sep 23, 2013 7:22 pm
by requinix
You need to specify an encoding that can support the characters you're trying to decode. Like UTF-8.
Code: Select all
$text = html_entity_decode($remove_spaces, ENT_QUOTES, "UTF-8");
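A small sketch of why the encoding argument matters, using a made-up sample string. With a single-byte target such as ISO-8859-1, references to characters outside that charset cannot be represented, so they stay undecoded; with UTF-8 they all decode.

```php
<?php
// Hypothetical input: curly apostrophe and opening quote as hex references.
$input = 'Clinton&#x2019;s presidency. &#x201C;The';

// Latin-1 has no curly quotes, so these references cannot be mapped
// and html_entity_decode() leaves them in place.
$latin1 = html_entity_decode($input, ENT_QUOTES, 'ISO-8859-1');

// UTF-8 can represent any Unicode code point, so everything is decoded.
$utf8 = html_entity_decode($input, ENT_QUOTES, 'UTF-8');

var_dump($latin1);
var_dump($utf8);

// If the result is sent to a browser, declaring the same charset avoids
// the UTF-8 bytes being misread (an extra step, not part of the fix above):
// header('Content-Type: text/html; charset=UTF-8');
```

Passing the encoding explicitly also avoids relying on the default, which has differed between PHP versions.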
Re: decode &#x
Posted: Mon Sep 23, 2013 8:27 pm
by cataIin
Many thanks!
