Problem: How to use cURL to retrieve pages from a site that uses ASP.NET and AJAX. The approach also applies to non-ASP.NET pages, but ASP.NET and AJAX bring some special requirements.
Solution:
1. Install Fiddler2 http://www.fiddler2.com
2. Install FiddlerCap http://www.fiddler2.com/fiddlercap/
- Follow the fiddlercap instructions carefully
3. Start a fiddlercap session and capture the POST data being sent through a normal browser request (not using your cURL script yet). When you've done that, stop the capture and save it. It's best if you have no other browser tabs open just to make it easier.
4. Open the capture in Fiddler2. Go through the list of web sessions while watching the right panel. When you see the POST data appear in the body section on the right, make a note of all the fields that are being sent to the server.
When dealing with ASP.NET and AJAX, there are some special things to watch out for. In particular, you will find several hidden fields whose names start with a double underscore, __EVENTTARGET for example. Two of these hidden fields were especially problematic: 1) __VIEWSTATE and 2) __EVENTVALIDATION. To get these values right, you have to capture the initial page load and turn all the form fields into variables. You will find the code for this below.
To get the __VIEWSTATE and __EVENTVALIDATION values, you have to load the form once, parse those fields out, and then use them in your cURL function. So now we've bypassed all the JavaScript junk and we're submitting just an HTML form.
The code you see below gets me past the initial form and returns data. It's not formatted or pretty, but for my purposes that's fine. I'm going to pull data out of the results and do something else with it, so I don't actually need to write it to the screen. Use at your own risk.
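The parsing step described above can be sketched as follows. This is only an illustration, not the script from this post: the helper name extract_form_fields() and the sample markup are made up, and the fields are keyed by name so the lookup does not depend on where the hidden inputs sit in the form.

```php
<?php
// Hypothetical helper: returns every named <input> of the first form as name => value.
function extract_form_fields($html) {
    $doc = new DOMDocument();
    // Real-world markup is rarely valid, so collect parser warnings instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $fields = array();
    $forms = $doc->getElementsByTagName('form');
    if ($forms->length === 0) {
        return $fields; // no form on the page
    }
    foreach ($forms->item(0)->getElementsByTagName('input') as $input) {
        $name = $input->getAttribute('name');
        if ($name !== '') {
            $fields[$name] = $input->getAttribute('value');
        }
    }
    return $fields;
}

// Made-up sample markup standing in for the real search page.
$sample = '<html><body><form action="Search.aspx" method="post">'
        . '<input type="hidden" name="__EVENTTARGET" value="" />'
        . '<input type="hidden" name="__VIEWSTATE" value="/wEPDwUKLTEx+abc=" />'
        . '<input type="hidden" name="__EVENTVALIDATION" value="/wEWAgK0/xyz==" />'
        . '</form></body></html>';

$fields = extract_form_fields($sample);
echo $fields['__VIEWSTATE'], "\n";
echo $fields['__EVENTVALIDATION'], "\n";
```

With the fields in a name-keyed array, the two problematic values can be read as $fields['__VIEWSTATE'] and $fields['__EVENTVALIDATION'] regardless of their order in the page.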
Code:
//show errors while debugging; set display_errors to '0' for production
error_reporting(6138);
ini_set('display_errors', '1');
// grab HTML form....
$htmldoc = new DOMDocument;
$htmldoc->loadHTMLFile('http://72.240.45.198/Search.aspx');
$forms = $htmldoc->getElementsByTagName("form");
$inputs = $forms->item(0)->getElementsByTagName("input");
//stick the form values in an array, indexed by their order in the form
$variables = array();
foreach ($inputs as $input) {
array_push($variables, $input->getAttribute("value"));
}
//begin curl function
function curl_download($Url, $post_string){
// is cURL installed yet?
if (!function_exists('curl_init')){
die('Sorry cURL is not installed!');
}
// OK cool - then let's create a new cURL resource handle
$ch = curl_init();
// Now set some options (most are optional)
// Set URL to download (the address passed in as $Url)
curl_setopt($ch, CURLOPT_URL, $Url);
// Set POSTFIELDS
curl_setopt($ch, CURLOPT_POSTFIELDS, $post_string);
// User agent
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)");
// Include header in result? (1 = yes, 0 = no)
curl_setopt($ch, CURLOPT_HEADER, 0);
// Should cURL return or print out the data? (true = return, false = print)
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// Timeout in seconds
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
// Download the given URL, and return output
$output = curl_exec($ch);
// Close the cURL resource, and free system resources
curl_close($ch);
return $output;
}
//add variables to the $post_data array
$post_data = array();
$post_data['ScriptManager1'] = "theContentPanel|btnReportQuery";
$post_data['TPDMenu_ClientState'] = "";
$post_data['hfJSChecked']="true";
$post_data['txtAccidentDate_text'] = "03/01/2011";
$post_data['txtAccidentDate'] = "2011-03-01-00-00-00";
$post_data['txtAccidentDateTo_text'] = "03/30/2011";
$post_data['txtAccidentDateTo'] = "2011-03-30-00-00-00";
$post_data['__EVENTTARGET'] = "btnReportQuery";
$post_data['__EVENTARGUMENT'] = "";
//assumes __VIEWSTATE and __EVENTVALIDATION are the first two <input> elements in the form
$post_data['__VIEWSTATE'] = $variables[0];
$post_data['__EVENTVALIDATION'] = $variables[1];
$post_data['txtAccidentDateTo_ClientState'] = "";
$post_data['txtAccidentLocation_text'] = "";
$post_data['txtAccidentLocation'] = "";
$post_data['txtAccidentLocation_ClientState'] = "";
$post_data['txtAccident_Number_text'] = "";
$post_data['txtAccidentNumber'] = "";
$post_data['txt_AccidentNumber_ClientState'] = "";
$post_data['dgSearchResults_ClientState'] = "";
$post_data['__ASYNCPOST'] = "true";
$post_data['RadAJAXControlID'] = "RadAjaxManager1";
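//URL-encode each key/value pair and collect the pairs in an array
//(__VIEWSTATE is base64 and contains +, / and =, which must be encoded)
$post_items = array();
foreach ($post_data as $key => $value) {
$post_items[] = urlencode($key) . '=' . urlencode($value);
}
//implode the array into a string
$post_string = implode('&', $post_items);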
//execute
print curl_download('http://72.240.45.198/Search.aspx', $post_string);
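As an alternative to joining the POST fields by hand, PHP's built-in http_build_query() produces the same kind of string and URL-encodes every key and value along the way. That matters for values such as __VIEWSTATE, where an unencoded + would be read by the server as a space. A minimal sketch, with made-up sample values standing in for the real form data:

```php
<?php
// Sample values only; the base64-like string stands in for a real __VIEWSTATE.
$post_data = array(
    '__EVENTTARGET'        => 'btnReportQuery',
    '__VIEWSTATE'          => '/wEPDwUKLTEx+abc=',
    'txtAccidentDate_text' => '03/01/2011',
);

// http_build_query() URL-encodes each key and value and joins the pairs with '&'.
$post_string = http_build_query($post_data, '', '&');
echo $post_string, "\n";

// Decoding the string again recovers the original values unchanged.
parse_str($post_string, $decoded);
echo $decoded['__VIEWSTATE'], "\n";
```

The resulting $post_string can be handed to CURLOPT_POSTFIELDS in the same way as the hand-built string.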