
Regular Expression Simple Help

Posted: Thu Sep 30, 2010 9:08 pm
by J0kerz
Hi guys,

I am trying to parse the results from a Google page.

After analysing the source code from the Google page, I found out that the URLs are located within this tag:
<li class=g><h3 class="r"><a href="URL HERE"

Code:

	
// Google search
$ch = curl_init();
$user_agent = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; fr; rv:1.9.0.19) Gecko/2010031422 Firefox/3.0.19';
curl_setopt($ch, CURLOPT_URL, 'http://www.google.com/search?q=apple+pie&num=100&hl=en&lr=&as_qdr=all&prmd=ivn&ei=o8GbTKAFhJyWB6ul-csK&start=0&sa=N');
curl_setopt($ch, CURLOPT_POST, 0);
curl_setopt($ch, CURLOPT_USERAGENT, $user_agent);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_COOKIEJAR, "my_cookies.txt");
curl_setopt($ch, CURLOPT_COOKIEFILE, "my_cookies.txt");
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
$source = curl_exec($ch);

// Extract results from the search page
preg_match_all('/<li class=g><h3 class="r"><a href="(.*)"/', $source, $result_array, PREG_SET_ORDER);

// Show the first 10 results
for ($x = 0; $x < 10; $x++) {
    echo $result_array[$x][1] . '<br>';
}

The problem is that each match captures the rest of the HTML page instead of only the URL. :banghead:

What is wrong with my regular expression?

Thanks guys!
:wink:

Re: Regular Expression Simple Help

Posted: Thu Sep 30, 2010 9:51 pm
by requinix
If you're doing a search then use the Google AJAX API instead.
(Don't let the "AJAX" fool you. You can do it easily in PHP too.)


On a separate note, don't PM people for help. If they want to help you, they will. If they don't, PMing them probably won't change their mind.
Personally, had I received your PM before replying, I wouldn't have posted until I waited a few hours.

Re: Regular Expression Simple Help

Posted: Thu Sep 30, 2010 9:55 pm
by J0kerz
Thanks for the info, but I would rather not use the Google AJAX API since I only want to perform a simple search. Nothing more than that.

What is wrong with my regular expression if I want to extract the URL? Why is (.*) not working?
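
For reference, the symptom described here is what a greedy (.*) usually produces: the quantifier grabs as much as it can, so the capture runs to the last double quote on the line rather than the first one after the URL. The sketch below compares the posted pattern with a negated character class, [^"]*, which stops at the first closing quote. The $source string is a made-up sample, not real Google output, and the actual markup may differ.

```php
<?php
// Hypothetical sample standing in for the fetched page (assumption, not real Google HTML).
$source = '<li class=g><h3 class="r"><a href="http://example.com/one" class=l>One</a></h3>'
        . '<li class=g><h3 class="r"><a href="http://example.com/two" class=l>Two</a></h3>';

// Posted pattern: (.*) is greedy, so the capture extends to the LAST double quote on the line.
preg_match_all('/<li class=g><h3 class="r"><a href="(.*)"/', $source, $greedy, PREG_SET_ORDER);

// Alternative: [^"]* cannot cross a double quote, so each capture ends where the href value ends.
preg_match_all('/<li class=g><h3 class="r"><a href="([^"]*)"/', $source, $urls, PREG_SET_ORDER);

// Print one URL per match.
foreach ($urls as $match) {
    echo $match[1], "\n";
}
```

A lazy quantifier, (.*?), would also stop at the first quote. The negated class is used here only because it makes the stopping condition explicit.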

Re: Regular Expression Simple Help

Posted: Thu Sep 30, 2010 10:04 pm
by requinix
Didn't even bother looking, huh?

Quickly dumbing-down something I have,

Code:

$response = file_get_contents("http://ajax.googleapis.com/ajax/services/search/web?q=apple+pie&v=1.0");
print_r(json_decode($response, true));
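
To get from that print_r dump to a plain list of URLs, something like the following might work. It is only a sketch: the key names responseData, results and unescapedUrl are assumptions about the response layout, so check them against the actual print_r output before relying on them. The JSON string is a hand-written sample used in place of a live request.

```php
<?php
// Hand-written sample shaped like the decoded response (key names are an assumption).
$response = '{"responseData":{"results":['
          . '{"unescapedUrl":"http://example.com/one","titleNoFormatting":"One"},'
          . '{"unescapedUrl":"http://example.com/two","titleNoFormatting":"Two"}'
          . ']},"responseStatus":200}';

// Decode to nested associative arrays, as in the snippet above.
$data = json_decode($response, true);

// Collect the URL of each result, skipping anything that does not have the expected shape.
$urls = array();
if (is_array($data) && isset($data['responseData']['results'])) {
    foreach ($data['responseData']['results'] as $result) {
        if (isset($result['unescapedUrl'])) {
            $urls[] = $result['unescapedUrl'];
        }
    }
}

// Print one URL per line.
foreach ($urls as $url) {
    echo $url, "\n";
}
```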