Screen scraping sportsbook football prices

sirdavethebrave
Forum Newbie
Posts: 2
Joined: Tue Feb 01, 2011 9:47 am

Post by sirdavethebrave »

Hi guys -
Just found this forum today and it looks like the perfect place to ask for some help!

I've been working on a recreational app that logs into Pinnacle Sportsbook and downloads the English Premier League odds. I'd use the publicly available XML feed, but it doesn't contain all their lines, so I'm resorting to scraping. I've been loosely following a tutorial, so I'm not sure I've got this exactly right and, to be honest, I don't fully understand what every part of my script does. I've done some reading, but I think my problem is probably quite specific.

Anyhow - I'm using cURL to log in to the website and, once it lets me in, I want to fetch a page where I know the lines for a Premier League game exist. I think the problem occurs after I log in, when I try to access that page, but I'm not sure exactly where it falls over. Maybe I'm losing my login credentials somewhere along the line.

The output I'm getting to my browser is:

Object Moved
This object may be found here.
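
For reference, "Object Moved" is the body that IIS/ASP typically sends along with a 302 redirect, so seeing it in the browser suggests a redirect was returned but not followed. The sketch below shows one way to check a raw response for that case. The helper name find_redirect and the sample response (including the /Members/Default.aspx path) are made up for illustration; it assumes the response was fetched with CURLOPT_HEADER set to TRUE so the headers are part of the returned string.

```php
<?php
// Hypothetical helper: pull the status code and Location header out of a raw
// HTTP response (headers + body), as returned when CURLOPT_HEADER is TRUE.
function find_redirect($rawResponse)
{
    // The header block ends at the first blank line.
    $parts = preg_split("/\r?\n\r?\n/", $rawResponse, 2);
    $lines = preg_split("/\r?\n/", $parts[0]);

    // The status line looks like "HTTP/1.1 302 Found".
    if (!preg_match('#^HTTP/\S+\s+(\d{3})#', $lines[0], $m)) {
        return null;
    }

    $location = null;
    foreach (array_slice($lines, 1) as $line) {
        if (stripos($line, 'Location:') === 0) {
            $location = trim(substr($line, strlen('Location:')));
            break;
        }
    }

    return array('status' => (int) $m[1], 'location' => $location);
}

// Made-up sample response, not captured from the real site.
$sample = "HTTP/1.1 302 Found\r\n"
        . "Location: /Members/Default.aspx\r\n"
        . "Content-Type: text/html\r\n"
        . "\r\n"
        . "<h2>Object moved</h2>";

$info = find_redirect($sample);
echo $info['status'], ' -> ', $info['location'], "\n";
```

If the status is 301 or 302, either request the Location URL yourself or set CURLOPT_FOLLOWLOCATION on that handle so cURL follows it.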


Here is the PHP file I'm using to connect (with username and password obscured):

PHP Code:

function curl_login($url,$data,$proxy,$proxystatus){
    // Start every login with an empty cookie file.
    $fp = fopen("cookie.txt", "w");
    fclose($fp);

    $login = curl_init();
    curl_setopt($login, CURLOPT_POSTFIELDS, $data);
    // Save cookies to, and read them back from, the same file.
    curl_setopt($login, CURLOPT_COOKIEJAR, "cookie.txt");
    curl_setopt($login, CURLOPT_COOKIEFILE, "cookie.txt");
    curl_setopt($login, CURLOPT_TIMEOUT, 40);
    // Return the response as a string instead of printing it.
    curl_setopt($login, CURLOPT_RETURNTRANSFER, TRUE);
    if ($proxystatus == 'on') {
        curl_setopt($login, CURLOPT_SSL_VERIFYHOST, FALSE);
        curl_setopt($login, CURLOPT_HTTPPROXYTUNNEL, TRUE);
        curl_setopt($login, CURLOPT_PROXY, $proxy);
    }
    curl_setopt($login, CURLOPT_URL, $url);
    // Include the response headers in the returned string.
    curl_setopt($login, CURLOPT_HEADER, TRUE);
    // Pass on the user agent of the browser running this script.
    curl_setopt($login, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
    curl_setopt($login, CURLOPT_FOLLOWLOCATION, TRUE);
    curl_setopt($login, CURLOPT_POST, TRUE);

    $result = curl_exec($login); // execute the curl command

    // Close the handle before returning; cURL writes the cookie jar at this point.
    curl_close($login);

    return $result;
}

function curl_grab_page($site,$proxy,$proxystatus){
    $ch = curl_init();
    // Return the response as a string instead of printing it.
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
    if ($proxystatus == 'on') {
        curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, FALSE);
        curl_setopt($ch, CURLOPT_HTTPPROXYTUNNEL, TRUE);
        curl_setopt($ch, CURLOPT_PROXY, $proxy);
    }
    // Send the cookies saved by curl_login().
    curl_setopt($ch, CURLOPT_COOKIEFILE, "cookie.txt");
    curl_setopt($ch, CURLOPT_URL, $site);

    $result = curl_exec($ch); // execute the curl command
    curl_close($ch);

    return $result;
}



$loginurl = 'https://www.pinnaclesports.com/Secure/LoginPage.aspx?destination=sports';

$data = '__EVENTTARGET=ctl00%24LF%24LB&__EVENTARGUMENT=&__LASTFOCUS=&__VIEWSTATE=%2FwEPDwULLTE1MTczNzI3NjUPZBYCZg9kFgQCAQ9kFgQCBA8WAh4EVGV4dAVUDQo8bWV0YSBuYW1lPSJ2ZXJpZnktdjEiIGNvbnRlbnQ9InJvT3JjOUljZHlZVFgrVDZnWm1PN2VQejMwRWNLR2VXTkJqbmwzWnlaQUk9IiAvPg0KZAIKD2QWAmYPFgIfAAWJATxzdHlsZSB0eXBlPSJ0ZXh0L2NzcyI%2BKi5pIHtiYWNrZ3JvdW5kLWltYWdlOnVybChodHRwczovL2NvbnRlbnQucGlubmFjbGVzcG9ydHMuY29tL3VwbG9hZGVkSW1hZ2VzL0d1ZXN0U2VjdGlvbi9jb21iaW5lZDIzLnBuZyk7fTwvc3R5bGU%2BZAIDDxYEHgZvbmxvYWQFMmphdmFzY3JpcHQ6UDdfaW5pdFBNKDEsOCwwLC0yMCwyKTtQN19pbml0VFAoNCwgMCk7HghvbnVubG9hZAUWamF2YXNjcmlwdDpvblVubG9hZCgpOxYEAgEPZBYEAgMPZBYCAgEPZBYGZg8QZGQWAWZkAgEPEGRkFgECAWQCAg8QDxYCHgdWaXNpYmxlaGRkFgECAmQCBw9kFgQCBQ8WAh8DaGQCBw8WAh8DaGQCAw9kFgJmDxYEHgRNb2RlCyolU3lzdGVtLldlYi5VSS5XZWJDb250cm9scy5MaXRlcmFsTW9kZQEfAAWXFTwhLS0gU3RhcnQgb2YgSW50ZWxsaVRyYWNrZXIgUGFnZSBUYWcgLS0%2BDQo8c2NyaXB0IHR5cGU9InRleHQvamF2YXNjcmlwdCI%2BPCEtLSANCnZhciBwcXJ5PSJMYW5ndWFnZUlEJTNEMCUyNm9kZHNGb3JtYXQlM0RkZWNpbWFsIjsNCnZhciBycXJ5PSJpUkVHUXJ5IjsNCnZhciBzcXJ5PSJpU2FsZSI7DQp2YXIgaXRyTUlkID0gODI1Ow0KdmFyIGl0clJxc3RIID0gInRyYWNrZXIucGlubmFjbGVzcG9ydHMuY29tIjsNCnZhciBkdD13aW5kb3cuZG9jdW1lbnQsbnI9bmF2aWdhdG9yLGluYT1uci5hcHBOYW1lLHNyPSIwJjAiLHB4PTAsc3Y9MTMsamU9MDsNCnZhcglpbmF2PW5yLmFwcFZlcnNpb24saWllPWluYXYuaW5kZXhPZignTVNJRSAnKSxpbnRwPShpbmEuaW5kZXhPZignTmV0c2NhcGUnKT49MCk7DQppZihpaWU%2BMClpbmF2aT1wYXJzZUludChpbmF2LnN1YnN0cmluZyhpaWUrNSkpO2Vsc2UgaW5hdmk9cGFyc2VGbG9hdChpbmF2KTsNCmZ1bmN0aW9uIGlycyhzLGYscil7dmFyIHA9cy5pbmRleE9mKGYpO3doaWxlKHA%2BPTApe3M9cy5zdWJzdHJpbmcoMCxwKStyK3Muc3Vic3RyaW5nKHArZi5sZW5ndGgscy5sZW5ndGgpO3A9cy5pbmRleE9mKGYpfXJldHVybiBzfQ0KZnVuY3Rpb24gY2VzYyhzKXtpZihzLmxlbmd0aD4wKSByZXR1cm4gaXJzKGlycyhpcnMoaXJzKGlycyhzLCcrJywnJTJCJyksJy4nLCclMkUnKSwnLycsJyUyRicpLCc9JywnJTNEJyksJyYnLCclMjYnKSA7IGVsc2UgcmV0dXJuIHM7fQ0KZnVuY3Rpb24gaWVzYyhzKXtyZXR1cm4gY2VzYyhlc2NhcGUocykpO30gDQpmdW5jdGlvbiBncHIoKXsNCnZhciBwcj0nJywgaXB3PXdpbmRvdywgaXByPSd3aW5kb3cnLCBpd0w9JycsIGlwTD0nJzsNCndoaWxlIChpcEw9PWl3TCl7DQppdz1pcHc7IHByPWl3LmRvY3VtZ
W50LnJlZmVycmVyOw0KaWYoaW50cCkgYnJlYWs7aWYoKCcnK2l3LnBhcmVudC5sb2NhdGlvbik9PScnKWJyZWFrOw0KaXdMPShpdy5kb2N1bWVudC5sb2NhdGlvbi5wcm90b2NvbCsnXC9cLycraXcuZG9jdW1lbnQubG9jYXRpb24uaG9zdG5hbWUpLnRvTG93ZXJDYXNlKCk7DQppcEw9cHIuc3Vic3RyaW5nKDAsaXdMLmxlbmd0aCkudG9Mb3dlckNhc2UoKTsNCmlwcj1pcHIrJy5wYXJlbnQnOyBpcHc9ZXZhbChpcHIpOyBpZiAoaXc9PWlwdykgYnJlYWs7fXJldHVybiBwcjt9DQpmdW5jdGlvbiBpdHJjKCl7dmFyIG53PW5ldyBEYXRlKCksY2U9MixpdWw9Jyc7DQppZiAoZHQuY29va2llKSBjZT0xOw0KZWxzZSB7dmFyIGV4PW5ldyBEYXRlKG53LmdldFRpbWUoKSsxMDAwKTsgZHQuY29va2llPSJpdGM9MzsgRVhQSVJFUz0iK2V4LnRvR01UU3RyaW5nKCkrIjsgcGF0aD0vIjtpZiAoZHQuY29va2llKSBjZT0xO30JCQ0KaWYoaW5hdmk%2BPTQpIGl1bD1pZXNjKGlpZT4wJiZuci51c2VyTGFuZ3VhZ2U%2FbnIudXNlckxhbmd1YWdlOm5yLmxhbmd1YWdlKTsNCnZhciB1bj1NYXRoLnJvdW5kKE1hdGgucmFuZG9tKCkqMjEwMDAwMDAwMCk7DQppbD1pc2wrdW4rIiYiK2llc2MoZ3ByKCkpKyIlMjAmIitjZXNjKHBxcnkpKyIlMjAmIitjZXNjKHJxcnkpKyIlMjAmIg0KK2Nlc2Moc3FyeSkrIiUyMCYiK2NlKyImIitzcisiJiIrcHgrIiYiK2plKyImIitzdisiJiIraXVsKyIlMjAmIitudy5nZXRUaW1lem9uZU9mZnNldCgpKyImIitpZXNjKGlkbCkrIiUyMCI7DQppZihpaWU%2BMCAmJiBpbC5sZW5ndGg%2BMjA0NSlpbD1pbC5zdWJzdHJpbmcoMCwyMDQ1KTsNCnZhciBpaW49J2l0cjgyNScsIGl3cmk9dHJ1ZTsNCmlmKGR0LmltYWdlcyl7aWYoIWR0LmltYWdlc1tpaW5dKWR0LndyaXRlKCc8ZGl2IHN0eWxlPSJkaXNwbGF5Om5vbmUiPjxpJysnbWcgbmFtZT0iJytpaW4rJyIgaGVpZ2h0PSIxIiB3aWR0aD0iMSIgYWx0PSJJbnRlbGxpVHJhY2tlciIvPjwvZGl2PicpOw0KaWYoZHQuaW1hZ2VzW2lpbl0pe2R0LmltYWdlc1tpaW5dLnNyYz1pbCsnJjAnO2l3cmk9ZmFsc2U7fX0NCmlmKGl3cmkpZHQud3JpdGUoJzxpJysnbWcgc3InKydjPSInK2lsKycmMCIgaGVpZ2h0PSIxIiB3aWR0aD0iMSI%2BJyk7fQ0KdmFyIGlkbD13aW5kb3cubG9jYXRpb24uaHJlZjt2YXIgaXNsPSJodHRwIisoaWRsLmluZGV4T2YoJ2h0dHBzOicpPT0wPydzJzonJykrIjovL3RyYWNrZXIucGlubmFjbGVzcG9ydHMuY29tL2UvdDMuZGxsPzgyNSYiOw0KaXRyYygpOw0KLy8tLT48L3NjcmlwdD4NCjxzY3JpcHQgdHlwZT0idGV4dC9qYXZhc2NyaXB0Ij48IS0tDQppZihpaWU%2BMClkdC53cml0ZSgiXDxcIVwtXC0iKTsNCi8vLS0%2BPC9zY3JpcHQ%2BDQo8c2NyaXB0IHR5cGU9InRleHQvamF2YXNjcmlwdCIgc3JjPSJodHRwczovL3RyYWNrZXIucGlubmFjbGVzcG9ydHMuY29tL2UvY2xpY2tzLmpzIj48L3NjcmlwdD4NCjxub3NjcmlwdD4NCjxkaXYgc3R5bGU9ImRpc3BsY
Xk6bm9uZSI%2BPGltZyBzcmM9J2h0dHBzOi8vdHJhY2tlci5waW5uYWNsZXNwb3J0cy5jb20vZS90My5kbGw%2FODI1JmFtcDswJmFtcDslMjAmYW1wO0xhbmd1YWdlSUQlM0QwJTI2b2Rkc0Zvcm1hdCUzRGRlY2ltYWwmYW1wO2lSRUdRcnkmYW1wO2lTYWxlJmFtcDswJmFtcDswJmFtcDswJmFtcDswJmFtcDswJmFtcDswJmFtcDslMjAmYW1wOzE1MDAmYW1wOyUyMCZhbXA7MCcgaGVpZ2h0PSIxIiB3aWR0aD0iMSIgYWx0PSJJbnRlbGxpVHJhY2tlciIvPjwvZGl2Pg0KPC9ub3NjcmlwdD48IS0tLy8tLT4NCjwhLS0gRW5kIG9mIEludGVsbGlUcmFja2VyIFBhZ2UgVGFnIC0tPg0KZGQ%3D&__PREVIOUSPAGE=cC6v86rLgpzKauPW5R75VZ6MnouJHAk-4dX5bAgGIAb5u7LxzDThRsxab3Nvnz3jfDBRuieaF3sFEN_3HIHC-n835N01&ctl00%24LDDL=1&ctl00%24PSDDL=decimal&ctl00%24MCPH%24LF%24UserName=***&ctl00%24MCPH%24LF%24Password=***&ctl00%24MCPH%24LF%24LanguageID=0&ctl00%24MCPH%24LF%24PriceStyle=decimal&ctl00%24MCPH%24LF%24LinesTypeView=c&ctl00%24MCPH%24LF%24MemberServer=www39.pinnaclesports.com';



echo curl_login($loginurl,$data,'','off');

echo curl_grab_page('http://www39.pinnaclesports.com/Members/gameselection.asp?sportType=Soccer&sportSubType=Eng.+Premier&descr=1','','off'); 

I followed the information here: http://www.youtube.com/watch?v=XcgQUsorF_8 and the advice to use the Live HTTP Headers add-on in Firefox to get the post data.

Any help with this kind of thing, or some pointers, would be greatly appreciated.

Thanks.
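
A note on the post data captured from the browser: the __VIEWSTATE and __PREVIOUSPAGE values above were issued for one particular page load, and ASP.NET login forms may not accept them indefinitely. A possible alternative, sketched below, is to GET the login page first, read the hidden fields out of the HTML, and build the POST body from them. The helper name extract_hidden_fields, the sample HTML and the placeholder credentials are assumptions for illustration only; the field names are taken from the post data above.

```php
<?php
// Hypothetical helper: collect every <input type="hidden"> name/value pair
// from a page so the values can be posted back unchanged.
function extract_hidden_fields($html)
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; keep parser warnings out of the output.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $fields = array();
    foreach ($doc->getElementsByTagName('input') as $input) {
        $name = $input->getAttribute('name');
        if ($name !== '' && strtolower($input->getAttribute('type')) === 'hidden') {
            $fields[$name] = $input->getAttribute('value');
        }
    }
    return $fields;
}

// Stand-in for the HTML that a GET of the login page would return.
$html = '<form>'
      . '<input type="hidden" name="__VIEWSTATE" value="abc/123+=" />'
      . '<input type="hidden" name="__EVENTTARGET" value="" />'
      . '<input type="text" name="ctl00$MCPH$LF$UserName" />'
      . '</form>';

$fields = extract_hidden_fields($html);

// Add the visible form fields by hand (placeholder credentials).
$fields['ctl00$MCPH$LF$UserName'] = 'my-user';
$fields['ctl00$MCPH$LF$Password'] = 'my-pass';

// http_build_query() URL-encodes every name and value.
$data = http_build_query($fields);
echo $data, "\n";
```

The resulting $data string can be passed to curl_login() in place of the hand-copied one, so the encoding of characters such as $, / and = is handled by PHP rather than by copy and paste.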

Re: Screen scraping sportsbook football prices

Post by sirdavethebrave »

OK - so I realised I was going from https:// to http://

I corrected that and checked that the curl_grab_page destination works in the browser over https://

It does, and now when I use the https:// destination the script displays a blank page. No longer the "Object Moved" message, just nothing at all. Does this sound like a familiar problem to anyone?

Thanks again.
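
One way to narrow down a blank page: curl_exec() returns FALSE when the transfer itself fails, and echoing FALSE prints nothing, so a failed HTTPS request (a certificate verification problem, for example) looks exactly like an empty page. The sketch below reports what actually happened. The names describe_fetch and curl_grab_page_verbose are hypothetical, and the second function is a variant of curl_grab_page() above that also follows redirects; it is not a confirmed fix for this particular site.

```php
<?php
// Hypothetical helper: turn the outcome of a curl_exec() call into a message.
function describe_fetch($body, $error, $httpCode)
{
    if ($body === false) {
        return 'cURL failed: ' . $error;
    }
    if ($body === '') {
        return 'HTTP ' . $httpCode . ' with an empty body';
    }
    return 'HTTP ' . $httpCode . ', ' . strlen($body) . ' bytes';
}

// Variant of curl_grab_page() that returns the body plus a diagnostic message.
function curl_grab_page_verbose($site, $cookieFile)
{
    $ch = curl_init($site);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieFile);
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieFile);

    $body = curl_exec($ch);
    // Read the error text and status code before the handle is closed.
    $message = describe_fetch(
        $body,
        curl_error($ch),
        curl_getinfo($ch, CURLINFO_HTTP_CODE)
    );
    curl_close($ch);

    return array($body, $message);
}

// The message helper can be tried without touching the network:
echo describe_fetch(false, 'SSL certificate problem', 0), "\n";
echo describe_fetch('', '', 302), "\n";
```

Calling list($body, $message) = curl_grab_page_verbose($url, 'cookie.txt') and echoing $message first should show whether the blank page is a cURL error, an empty response, or a redirect.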