cURL and XML/XSL
Posted: Wed Aug 15, 2007 11:04 am
by TheMoose
I recently wrote a mini-script to fetch some XML data from a remote site. When I used file_get_contents() and the cURL functions, both returned transformed XHTML (the XML page I'm requesting has an XSL stylesheet associated with it). Right now I'm using a socket to manually send a request and get the raw XML, as I want to use it for some data processing and other things. Is there a way to do this with cURL, or does it process the XSL automatically every time?
On a side note, if I view the actual source code of the page (both FF and IE), it's the XML I want. Why would cURL/file_get_contents() return HTML?
Posted: Wed Aug 15, 2007 11:08 am
by miro_igov
This is not possible. Show the URL which generates the XML.
Posted: Wed Aug 15, 2007 11:09 am
by TheMoose
http://armory.worldofwarcraft.com/guild ... =Flash+Mob
EDIT: Check out this test page.
The code for the above link is exactly:
Code: Select all
<?php
// Fetch the page with PHP's default HTTP stream settings
$output = file_get_contents("http://armory.worldofwarcraft.com/guild-info.xml?r=Eredar&n=Flash+Mob");
?>
<textarea rows="20" cols="100">
<?php echo htmlspecialchars($output); ?>
</textarea>
[offtopic]
Yes, I know, it's WoW. I'm an addict!
[/offtopic]
Posted: Wed Aug 15, 2007 11:16 am
by miro_igov
And can you please post your code? I think it is something to do with the GET parameters and the URL encoding.
Posted: Wed Aug 15, 2007 11:23 am
by TheMoose
What I'm using now to get the raw XML (this works):
Code: Select all
$config['SERVER'] = "Eredar";
$config['GUILD'] = "Flash Mob";
// snip...
$data = ""; // initialize the buffer before appending to it
$fs = fsockopen("armory.worldofwarcraft.com", 80, $errno, $errstr, 15);
if (!$fs) {
    echo "($errno) $errstr";
} else {
    // Build the URL-encoded query string
    $td = "r=" . urlencode($config['SERVER']) . "&n=" . urlencode($config['GUILD']);
    // Send a plain HTTP/1.0 request with a browser-like User-Agent
    $outg = "GET /guild-info.xml?$td HTTP/1.0\r\n";
    $outg .= "Host: armory.worldofwarcraft.com\r\n";
    $outg .= "User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)\r\n";
    $outg .= "Connection: close\r\n\r\n";
    fwrite($fs, $outg);
    // Read the full response (headers and body) until the server closes the connection
    while (!feof($fs)) {
        $data .= fgets($fs, 128);
    }
    fclose($fs);
}
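One thing to keep in mind with the socket approach: $data ends up holding the whole HTTP response, status line and headers included, so the XML still has to be separated from the headers before processing. A minimal sketch, assuming a hypothetical helper name split_http_response() that is not part of the script above:

```php
<?php
// Hypothetical helper (not part of the original script): split a raw HTTP
// response into its header block and its body at the first blank line.
function split_http_response($raw)
{
    // Headers and body are separated by an empty line (CRLF CRLF).
    $parts = explode("\r\n\r\n", $raw, 2);
    $headers = $parts[0];
    // If no blank line was found, treat the body as empty.
    $body = isset($parts[1]) ? $parts[1] : '';
    return array($headers, $body);
}
```

With the script above this would be called as list($headers, $xml) = split_http_response($data); it relies on the request being HTTP/1.0, where the body is not expected to arrive chunked.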
Posted: Wed Aug 15, 2007 11:29 am
by miro_igov
How about your cURL script, the one that does not return the XML?
Posted: Wed Aug 15, 2007 11:52 am
by TheMoose
Updated the test link to include both file_get_contents() and cURL methods. I included the header with cURL just to see the result from the request.
Exact code of the test page:
Code: Select all
<?php
// Method 1: file_get_contents() with default stream settings
$output = file_get_contents("http://armory.worldofwarcraft.com/guild-info.xml?r=Eredar&n=Flash+Mob");
?>
file_get_contents():<br>
<textarea rows="20" cols="100">
<?php echo htmlspecialchars($output); ?>
</textarea>
<?php
// Method 2: cURL, returning the response (with headers) as a string
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, "http://armory.worldofwarcraft.com/guild-info.xml?r=Eredar&n=Flash+Mob");
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_HEADER, 1); // include response headers in the output
$outcurl = curl_exec($curl);
curl_close($curl);
?>
<br>
cURL:<br>
<textarea rows="20" cols="100">
<?php echo htmlspecialchars($outcurl); ?>
</textarea>
Posted: Thu Aug 16, 2007 5:07 pm
by TheMoose
Any ideas? It's not a pressing issue as I have it working with sockets, just would like to know for future reference.
Thanks for taking a look though, Miro.
Posted: Fri Aug 17, 2007 3:30 am
by miro_igov
To me it looks fine, and it should return what you see in "View Source".
Posted: Sun Aug 19, 2007 8:45 am
by volka
You have to send a User-Agent string, or the WoW web server will send the already transformed (XML to HTML) document instead of the raw XML.
Code: Select all
// The 'user_agent' option takes only the header value; PHP adds the "User-Agent:" name itself
$context = stream_context_create(
    array('http' => array(
        'user_agent' => 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
    ))
);
$output = file_get_contents("http://armory.worldofwarcraft.com/guild-info.xml?r=Eredar&n=Flash+Mob", false, $context);
echo $output;
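The same idea should carry over to cURL, which was the original question. As far as I know, PHP's cURL extension sends no User-Agent header unless one is set, and CURLOPT_USERAGENT sets it. A minimal sketch, assuming a hypothetical wrapper name fetch_with_user_agent() (not from this thread) and that the server reacts to the header as described above:

```php
<?php
// Hypothetical wrapper (name is illustrative, not from the thread):
// fetch a URL with cURL while sending a browser-like User-Agent header.
function fetch_with_user_agent($url, $userAgent)
{
    $curl = curl_init($url);
    // Return the response body as a string instead of printing it.
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    // Set the value of the User-Agent request header.
    curl_setopt($curl, CURLOPT_USERAGENT, $userAgent);
    $body = curl_exec($curl);
    if ($body === false) {
        // curl_exec() returns false on failure; report the cURL error text.
        throw new RuntimeException("cURL request failed: " . curl_error($curl));
    }
    return $body;
}
```

Called with the guild-info.xml URL and the "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)" string from the socket script, this would be expected to return the raw XML, though that has not been checked against the live server.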