curl—does it work the way I think it does?

Posted: Wed Aug 21, 2013 9:15 am
by someguyhere
If I load a page with curl, would it appear the same as if a browser opened it? (Assuming I set a user agent.)

If the answer to that is yes, let's say I load a page with curl, and then extract and load another page from a particular link on that page—would that appear (in server logs or web analytics) the same as if a user did that in their browser? I am trying to run an experiment on something and require the ability to do that.

Posted: Wed Aug 21, 2013 1:08 pm
by requinix
someguyhere wrote: If I load a page with curl, would it appear the same as if a browser opened it? (Assuming I set a user agent.)
If you mimic all the headers (there are many more than just the user agent), then yes, you'll get the same response content your browser would get. Remember that cURL won't automatically retrieve images or CSS, nor will it execute JavaScript, so JavaScript-based analytics won't see the visit at all; only server-side logging will.
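A minimal sketch of what "mimicking the headers" means, written with Python's standard library rather than PHP's cURL bindings. The header values are illustrative assumptions modeled on a typical desktop browser, not an authoritative set; for a faithful match you'd copy the exact headers your own browser sends (visible in its developer tools). `Accept-Encoding` is deliberately left out because `urllib` does not decompress gzip responses for you.

```python
import urllib.request

# Illustrative browser-like headers (assumption): replace these with the exact
# headers your own browser sends if you need a faithful match.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                  "AppleWebKit/537.36 (KHTML, like Gecko) "
                  "Chrome/120.0 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
}


def build_browser_request(url, referer=None):
    """Build (but do not send) a GET request carrying browser-like headers."""
    headers = dict(BROWSER_HEADERS)
    if referer is not None:
        # A browser sends the page you came from as the Referer header.
        headers["Referer"] = referer
    return urllib.request.Request(url, headers=headers, method="GET")
```

Sending it with `urllib.request.urlopen(build_browser_request("http://example.com/")).read()` returns only the HTML document: no images, stylesheets, or scripts are fetched, which matches the caveat above.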
someguyhere wrote: If the answer to that is yes, let's say I load a page with curl, and then extract and load another page from a particular link on that page—would that appear (in server logs or web analytics) the same as if a user did that in their browser?
You also have to handle cookies, but basically yes.
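A rough sketch of the "load a page, extract a link, load that link" flow with cookies carried across both requests, again in Python's standard library instead of PHP cURL. `visit_then_follow`, picking the link by a URL substring, the placeholder User-Agent, and UTF-8 decoding are all hypothetical choices for illustration. The shared cookie jar plays the role of the browser's cookie store, and the `Referer` header is what a real click would send. It still runs no JavaScript.

```python
import http.cookiejar
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against the page's URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Turn relative links like "/next" into absolute URLs.
                self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url):
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.links


def make_session():
    """Return an opener plus the cookie jar it reads from and writes to."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def fetch(opener, url, referer=None):
    # Placeholder User-Agent (assumption): substitute your browser's real headers.
    headers = {"User-Agent": "Mozilla/5.0 (placeholder)"}
    if referer is not None:
        # A browser click sends the originating page as the Referer.
        headers["Referer"] = referer
    request = urllib.request.Request(url, headers=headers)
    with opener.open(request) as response:
        return response.read().decode("utf-8", errors="replace")


def visit_then_follow(start_url, url_fragment):
    """Hypothetical helper: load start_url, then follow the first link whose
    URL contains url_fragment, reusing the same cookie jar."""
    opener, _jar = make_session()
    html = fetch(opener, start_url)
    matches = [u for u in extract_links(html, start_url) if url_fragment in u]
    if not matches:
        raise ValueError(f"no link containing {url_fragment!r} on {start_url}")
    return fetch(opener, matches[0], referer=start_url)
```

Any cookies set by the first response are sent back automatically on the second request, which is the part a bare pair of independent requests would miss.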
someguyhere wrote: I am trying to run an experiment on something and require the ability to do that.
What kind of experiment? There might be an easier way of doing whatever you need to do.

Posted: Wed Aug 21, 2013 1:15 pm
by someguyhere
Short version: I want to test the effects of a certain type of query on Google's behavior.

Posted: Wed Aug 21, 2013 1:20 pm
by someguyhere
If this is the right approach, are there any options besides CURLOPT_HEADER that I need to set?

Posted: Wed Aug 21, 2013 2:54 pm
by requinix
There's the vanilla search (you'd have to parse the HTML for the results) and Google Instant (reverse-engineering its AJAX parameters looks like a nightmare), but I think CSE (Custom Search Engine) might be the best option. Assuming you can set one up to do a normal search without restricting it to particular sites, the last I knew its API was very easy to run queries against. But you'd want to compare its results with those from a completely anonymous search (remember that results are personalized) to make sure they're valid.
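A hedged sketch of running a query through the CSE API, assuming Google's Custom Search JSON API with its `https://www.googleapis.com/customsearch/v1` endpoint, the `key` (API key), `cx` (engine ID), and `q` (query) parameters, and a JSON response whose `items` entries carry a `link` field. Check Google's current documentation before relying on it, since availability, quotas, and whole-web search options may have changed. The function names and placeholder credentials are hypothetical.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Assumed endpoint of Google's Custom Search JSON API.
CSE_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_cse_url(api_key, engine_id, query):
    """Build the query URL: key = API key, cx = search engine ID, q = query."""
    return CSE_ENDPOINT + "?" + urlencode({"key": api_key, "cx": engine_id, "q": query})


def result_links(response_json):
    """Pull the result URLs out of a JSON response body (empty if no 'items')."""
    data = json.loads(response_json)
    return [item["link"] for item in data.get("items", [])]


def run_query(api_key, engine_id, query):
    """Hypothetical helper: fetch one page of results and return their URLs."""
    with urllib.request.urlopen(build_cse_url(api_key, engine_id, query)) as resp:
        return result_links(resp.read().decode("utf-8"))
```

The list returned by `run_query` is what you'd line up against the URLs from an anonymous (logged-out, cookie-free) search to check that the two agree before trusting the API's results.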