Hello,
I am trying to use cURL to scrape data from a page that requires the user to log in. The problem is that cURL always gets the HTML of the login page, even when I am already logged in through the browser.
When I am logged in, the browser seems to store information somewhere, because any time I go back to that link I am still logged in, except when using cURL. If I use any of the buttons on this .aspx page in the browser to search for the data I need, the link does not change but the HTML displayed does. If I then use cURL, it always goes to the login page as if I had never logged in.
So is there any way to get the HTML that the browser is displaying, based on the choices the user made, for an .aspx page? If I just use cURL to go to that link, it always takes me to the login screen as if the user had not done anything yet.
If there is another tool that would be better for scraping this type of data, please let me know.
Curl with .aspx file login
Re: Curl with .aspx file login
To simplify it all: I am having trouble getting cURL to work with the saved state of .aspx pages, whether that state is kept in session variables or however .aspx pages do it. Can anyone help me?
Re: Curl with .aspx file login
The browser identifies you to the site using a cookie which is stored in the browser. A cURL request would need to send the same information in the headers it sends to the server to be identified.
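As a rough sketch of what that looks like in PHP: the cookies the browser holds can be joined into one string and handed to cURL with CURLOPT_COOKIE. The helper names build_cookie_string() and fetch_with_cookies() are made up for illustration. The cookie names in the usage note below are only the usual ASP.NET defaults, so the site in question may well use different ones; copy the real names and values from the browser's cookie viewer.

```php
<?php
// Hypothetical helper: turn name => value pairs into a Cookie header value.
// cURL expects multiple cookies separated by a semicolon and a space.
function build_cookie_string(array $cookies): string
{
    $pairs = [];
    foreach ($cookies as $name => $value) {
        $pairs[] = $name . '=' . $value;
    }
    return implode('; ', $pairs);
}

// Hypothetical helper: request a URL while presenting the given cookies.
// Returns the response body as a string, or false if the request failed.
function fetch_with_cookies(string $url, array $cookies)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the HTML instead of printing it
    curl_setopt($ch, CURLOPT_COOKIE, build_cookie_string($cookies));
    return curl_exec($ch);
}
```

Calling fetch_with_cookies($url, array('ASP.NET_SessionId' => $sessionId, '.ASPXAUTH' => $authTicket)) with values copied from a logged-in browser should then be treated by the server like a request from that browser, for as long as the session stays valid.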
Re: Curl with .aspx file login
Ah, OK. I assume that by headers you mean that whatever is placed between the <head></head> tags is sent as cookies? If not, I am not sure how I am supposed to know what is put into the cookies. Should I look for the cookie files and open them?
Re: Curl with .aspx file login
Headers are not part of the HTML document. They are part of the HTTP request and response, sent before the content, and carry metadata such as cookies and protocol settings.
Check out this comment on the curl_setopt() page in the PHP manual - http://www.php.net/manual/en/function.c ... .php#87112
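To make the difference between headers and HTML concrete, here is a minimal sketch that asks cURL to include the response headers in its output (CURLOPT_HEADER) and then separates them from the page body. The function names split_http_response() and fetch_headers_and_body() are hypothetical, and the split assumes a single header block, so it would need adjusting if redirects are followed.

```php
<?php
// Hypothetical helper: split a raw HTTP response into header lines and body.
// Assumes one header block, terminated by the first blank line.
function split_http_response(string $raw): array
{
    $pos = strpos($raw, "\r\n\r\n");
    if ($pos === false) {
        // No blank line found: treat everything as body.
        return ['headers' => [], 'body' => $raw];
    }
    return [
        'headers' => explode("\r\n", substr($raw, 0, $pos)),
        'body'    => substr($raw, $pos + 4),
    ];
}

// Hypothetical helper: fetch a URL and return headers and HTML separately.
function fetch_headers_and_body(string $url): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HEADER, true); // prepend the response headers to the output
    $raw = curl_exec($ch);
    return split_http_response($raw === false ? '' : $raw);
}
```

Any line in the 'headers' array that starts with Set-Cookie: is a cookie the server is asking the client to store. That is one way to see what is being set without access to the server code.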
Re: Curl with .aspx file login
Ok, is there any way to know what is sent in the header or stored in the cookies without looking at the server code?
Re: Curl with .aspx file login
Not sure what you mean by that. You are the one who is supposed to send the right headers. Did you read the comment in the link I gave you?
Re: Curl with .aspx file login
I did, thanks.
curl_setopt( $curl_handle, CURLOPT_COOKIE, $strCookie );
That allows me to pass in a string that matches a cookie that is normally set in my browser, right? So I have to find out what that cookie is. Can there be more than one? I'm not sure how to find which cookies are being set, since I don't have the server code.
I was also looking at the comment above where someone sets a callback function for their header information. How does it get the second parameter, the string with the header data? I'm not sure I understand how this header part works.
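For what it's worth, here is my understanding of the callback part as a small sketch: with CURLOPT_HEADERFUNCTION, cURL itself calls the callback once for every response header line, passing the handle as the first argument and the raw header line as the second, and the callback has to return the number of bytes it handled. The class CookieCollector and the function fetch_collecting_cookies() are hypothetical names, not anything taken from the linked comment.

```php
<?php
// Hypothetical collector: gathers cookies from Set-Cookie response headers.
class CookieCollector
{
    public $cookies = [];

    // cURL invokes this once per response header line. $line is the raw
    // header text supplied by cURL, including the trailing line break.
    public function onHeader($ch, string $line): int
    {
        if (stripos($line, 'Set-Cookie:') === 0) {
            // Keep only "name=value", dropping attributes such as path or HttpOnly.
            $pair  = trim(explode(';', substr($line, strlen('Set-Cookie:')), 2)[0]);
            $parts = explode('=', $pair, 2);
            if (count($parts) === 2) {
                $this->cookies[$parts[0]] = $parts[1];
            }
        }
        // Returning the full length tells cURL to carry on with the transfer.
        return strlen($line);
    }
}

// Hypothetical helper: fetch a URL and record every cookie the server sets.
function fetch_collecting_cookies(string $url, CookieCollector $collector)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HEADERFUNCTION, [$collector, 'onHeader']);
    return curl_exec($ch);
}
```

After the request, $collector->cookies holds one entry per cookie the server set, so there can indeed be more than one, and the names come from the server's responses rather than from its code.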