Encoding problem using CURL to retrieve XML

joebarber
Forum Newbie
Posts: 1
Joined: Thu Aug 14, 2008 2:17 am

Post by joebarber »

Hi all,

I am working with a web service: I send a request to a URL with an XML string as a GET parameter. I have been using CURL to get the XML response back from this URL, decoding it into a suitable format and then parsing it for use in my website with the SimpleXML library, as per the code below:

Code:

$url = "http://www.example.com/cgi-bin/d3web_gzip.ssh?XML=$<TCOML>";
$url .= "<ExampleTags></ExampleTags></TCOML>";
 
$c = curl_init($url);
curl_setopt($c, CURLOPT_MUTE, 1);
curl_setopt($c, CURLOPT_RETURNTRANSFER, 1);
$rawXML = curl_exec($c);
curl_close($c);
 
$fixedupXML = htmlspecialchars($rawXML);
$workableXML = htmlspecialchars_decode($fixedupXML);
$results = simplexml_load_string($workableXML);


This had all been working fine until recently, when the provider of the web service informed me that they would have to switch the encoding of the XML they send back to me from UTF-8 to ISO-8859-1, as they needed their XML to support a wider character set (foreign characters etc.). Unfortunately, this seems to have broken my site's ability to use the XML: after decoding the output and echoing it, it just appears as a string of unreadable characters, as below:

Code:

?/?#k?*??-??X??9& :??bj?<?O*\?d ?S?R?Y{o?%BK????}4?|?n?????_!A?@x\???????`????Q?;????I????h?~R?ub??r?Y??w?L??H?>M?Z?W???Hk+???????e?_?Y?;??(??Q> ?ai(??Pf{??=?;g?,?????!af?]??6?=w?4?n??i.????9?|8?)oY4?m4??~is?pG???~??#??gJ????f??L3??JsA\?i> ??????Vx?x+???Q{}FiW>U~?E?}??

I thought this would be as simple as running one of PHP's built-in UTF-8 encoding functions on the response, but no luck. I have tried seemingly every encoding/decoding function I could find, and although each makes a difference to the output, the result is still unreadable. It looks like an encoding problem to me; perhaps I am missing something elsewhere on my page that is needed to work with this new encoding?

An example of the XML response I am getting from the web service is given below; the only thing that has changed about it is the encoding attribute, which used to be UTF-8.


Code:

<?xml version="1.0" encoding="ISO-8859-1"?>
<TCOML version="3.0" sess="1483650066B.0">
<Availability>
<Line count="1">
<ExampleLine>x</ExampleLine>
<ExampleLine>y</ExampleLine>
<ExampleLine>z</ExampleLine>
</Line>
</Availability>
</TCOML>
 

Many thanks in advance for taking the time to look at this issue and hopefully help me out with finding a solution!

Joe
ghurtado
Forum Contributor
Posts: 334
Joined: Wed Jul 23, 2008 12:19 pm

Re: Encoding problem using CURL to retrieve XML

Post by ghurtado »

I think you have to specify the character set when using htmlspecialchars (http://www.php.net/htmlspecialchars).
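
One way to read this suggestion is sketched below. It is only an illustration, not a confirmed fix for the problem in this thread: the sample string is made up (the byte "\xE9" is "é" in ISO-8859-1), and it assumes the simplexml and mbstring extensions are enabled. It passes the character set as the third argument to htmlspecialchars, then lets SimpleXML read the encoding declaration on its own.

```php
<?php
// Hypothetical ISO-8859-1 response; "\xE9" is the single byte for "é" in that charset.
$rawXML = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>\n"
        . "<TCOML><ExampleLine>caf\xE9</ExampleLine></TCOML>";

// Name the character set explicitly instead of relying on PHP's default.
$escaped  = htmlspecialchars($rawXML, ENT_QUOTES, 'ISO-8859-1');
$restored = htmlspecialchars_decode($escaped, ENT_QUOTES);

// libxml honours the encoding declaration; SimpleXML returns its strings as UTF-8.
$results = simplexml_load_string($restored);
$value   = (string) $results->ExampleLine;

// Explicit conversion, for when the bytes are needed as UTF-8 outside the parser.
$utf8 = mb_convert_encoding("caf\xE9", 'UTF-8', 'ISO-8859-1');
```

Note that escaping and then immediately unescaping returns the original string unchanged, so if the declared encoding matches the bytes actually sent, the raw response could probably be handed to simplexml_load_string directly.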