
file_get_contents or Curl - which one to take?

Posted: Tue May 24, 2011 2:55 pm
by lin
I am currently writing a small parser and harvester that collects the data from this website:

http://www.aktive-buergerschaft.de/buer ... ungsfinder

I want to collect all the foundations listed on this page (see the examples below). I think I need to choose between file_get_contents and cURL to fetch the data.
I also need some kind of parser, but I do not know which one to use here. Can you give me some hints?

First, here is my FETCHING part, with cURL:

Well, I've never needed to use cURL myself, but the obvious resource, php.net, gives this example:

<?php
// create a new cURL resource
$ch = curl_init();

// set URL and other appropriate options
curl_setopt($ch, CURLOPT_URL, "http://www.example.com/");
curl_setopt($ch, CURLOPT_HEADER, 0);

// grab URL and pass it to the browser
$data = curl_exec($ch);

// close cURL resource, and free up system resources
curl_close($ch);


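// Without CURLOPT_RETURNTRANSFER, curl_exec() prints the page and returns only true/false,
// so $data does not hold the HTML for parsing yet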
?>

To be frank: if cURL is not available, the slower function file_get_contents() will work too! I think it is only about 1-2 seconds slower, but the call is much simpler:

<?php
$html = file_get_contents('http://www.example.com');

// now all the HTML is in $html (it is false if the request failed)
?>
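The two options can also be combined. Below is a minimal sketch, not code from the thread: the helper name `fetch_page()` and the 20-second timeout are illustrative assumptions. It uses cURL when the extension is loaded and otherwise falls back to `file_get_contents()`, which only works on http:// URLs when `allow_url_fopen` is enabled in php.ini.

```php
<?php
// Hypothetical helper: fetch a URL with cURL if available, else file_get_contents().
// Returns the response body as a string, or null on failure.
function fetch_page(string $url, int $timeout = 20): ?string
{
    if (function_exists('curl_init')) {
        $ch = curl_init($url);
        curl_setopt_array($ch, [
            CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
            CURLOPT_FOLLOWLOCATION => true,  // follow HTTP redirects
            CURLOPT_CONNECTTIMEOUT => $timeout,
            CURLOPT_TIMEOUT        => $timeout,
        ]);
        $body   = curl_exec($ch);
        $status = curl_getinfo($ch, CURLINFO_RESPONSE_CODE); // 0 for non-HTTP URLs
        curl_close($ch);
        return ($body === false || $status >= 400) ? null : $body;
    }

    // Fallback: needs allow_url_fopen = On for http:// and https:// URLs.
    $context = stream_context_create(['http' => ['timeout' => $timeout]]);
    $body = @file_get_contents($url, false, $context);
    return $body === false ? null : $body;
}
```

Callers should check for `null` before parsing, e.g. `$html = fetch_page('http://www.example.com/');`.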
Anyway, I think the much more interesting part is the parsing.


I have to parse the page in order to get the following data (see the site for examples: http://www.aktive-buergerschaft.de/buer ... ungsfinder):

Bürgerstiftung Lebensraum Aachen
rechtsfähige Stiftung des bürgerlichen Rechts
Ansprechpartner: Hubert Schramm
Alexanderstr. 69/ 71
52062 Aachen
Telefon: 0241 - 4500130
Telefax: 0241 - 4500131
Email: info@buergerstiftung-aachen.de
http://www.buergerstiftung-aachen.de
>> Weitere Details zu dieser Stiftung

Bürgerstiftung Achim
rechtsfähige Stiftung des bürgerlichen Rechts
Ansprechpartner: Helga Kühn
Rotkehlchenstr. 72
28832 Achim
Telefon: 04202-84981
Telefax: 04202-955210
Email: info@buergerstiftung-achim.de
http://www.buergerstiftung-achim.de
>> Weitere Details zu dieser Stiftung

BürgerStiftung Region Ahrensburg
rechtsfähige Stiftung des bürgerlichen Rechts
Ansprechpartner: Dr. Michael Eckstein
An der Reitbahn 3
22926 Ahrensburg
Telefon: 04102 - 67 84 89
Telefax: 04102 - 82 34 56
Email: info@buergerstiftung-ahrensburg.de
http://www.buergerstiftung-region-ahrensburg.de
>> Weitere Details zu dieser Stiftung
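One way to turn records like these into structured data is sketched below. It assumes the page text has already been reduced to plain lines exactly as shown above (the real HTML layout is not known here) and that every record ends with the `>> Weitere Details zu dieser Stiftung` line. The function name and the array keys are hypothetical.

```php
<?php
// Hypothetical parser for the plain-text layout shown in the post.
// Assumes each record ends with ">> Weitere Details zu dieser Stiftung" and starts
// with the name followed by the legal form; the real page may differ.
function parse_foundations(string $text): array
{
    $labels  = ['Ansprechpartner' => 'contact', 'Telefon' => 'phone',
                'Telefax' => 'fax', 'Email' => 'email'];
    $records = [];
    foreach (explode('>> Weitere Details zu dieser Stiftung', $text) as $block) {
        // Keep non-empty, trimmed lines only.
        $lines = array_values(array_filter(array_map('trim', preg_split('/\R/u', $block)), 'strlen'));
        if (count($lines) < 2) {
            continue; // e.g. trailing whitespace after the last record
        }
        $record = ['name' => $lines[0], 'legal_form' => $lines[1]];
        foreach (array_slice($lines, 2) as $line) {
            if (preg_match('/^(Ansprechpartner|Telefon|Telefax|Email):\s*(.+)$/u', $line, $m)) {
                $record[$labels[$m[1]]] = $m[2];
            } elseif (preg_match('/^(\d{5})\s+(.+)$/u', $line, $m)) {
                $record['postcode'] = $m[1];   // German postcodes have five digits
                $record['city']     = $m[2];
            } elseif (preg_match('#^https?://#i', $line)) {
                $record['website'] = $line;
            } else {
                $record['street'] = $line;     // the only unlabeled line left
            }
        }
        $records[] = $record;
    }
    return $records;
}
```

Matching on the German labels (`Telefon:`, `Email:`, ...) keeps the labeled fields independent of their order, but a field will silently go missing if the site renames its label.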

Note the link ">> Weitere Details zu dieser Stiftung" ("further details about this foundation"): I also need to grab the data that is "behind" this link!
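To reach the pages behind these links, their `href` values have to be collected from the listing first. A sketch using PHP's built-in DOMDocument and DOMXPath, assuming the details link is an ordinary `<a>` element whose text contains "Weitere Details zu dieser Stiftung". The function name is hypothetical, and the relative-URL handling is simplified (no `../` or protocol-relative links).

```php
<?php
// Hypothetical helper: collect the URLs of all "Weitere Details zu dieser Stiftung"
// links. Assumes they are plain <a href="..."> elements; the real markup may differ.
function find_detail_links(string $html, string $baseUrl): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);                  // real-world HTML is rarely valid
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html); // hint: treat the input as UTF-8
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $parts = parse_url($baseUrl);
    $root  = $parts['scheme'] . '://' . $parts['host'];
    $dir   = preg_replace('#/[^/]*$#', '/', $parts['path'] ?? '/'); // directory of the listing

    $links = [];
    foreach ($xpath->query('//a[contains(., "Weitere Details zu dieser Stiftung")]') as $a) {
        $href = $a->getAttribute('href');
        if ($href === '') {
            continue;
        }
        if (!preg_match('#^https?://#i', $href)) {
            // Simplified resolution: root-relative, or relative to the listing's directory.
            $href = ($href[0] === '/' ? $root : $root . $dir) . $href;
        }
        $links[] = $href;
    }
    return array_values(array_unique($links));
}
```

Each returned URL can then be fetched and parsed in the same way as the listing page.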

Looking forward to hearing from you.

Re: file_get_contents or Curl - which one to take?

Posted: Sat Jun 11, 2011 2:30 pm
by getmizanur
> file_get_contents or Curl - which one to take?
Use cURL, if you have it installed.

Re: file_get_contents or Curl - which one to take?

Posted: Sun Jun 12, 2011 6:29 am
by lin
Hi there, dear friend,

Can you help me with the cURL approach? That would be really, really great!

Looking forward to hearing from you.

lin

Re: file_get_contents or Curl - which one to take?

Posted: Sun Jun 12, 2011 12:45 pm
by getmizanur
You are heading in the right direction with your code snippet. I have added one extra line, CURLOPT_RETURNTRANSFER, which makes curl_exec() return the page as a string instead of printing it. Use this and then filter the data using regular expressions. Give it a go; if you run into trouble with regular expressions, you know where to come.

<?php
// create a new cURL resource
$ch = curl_init();

// set URL and other appropriate options
curl_setopt($ch, CURLOPT_URL, "http://www.example.com/");
curl_setopt($ch, CURLOPT_HEADER, 0);
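curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // new line: return the page instead of printing it

// grab URL; with CURLOPT_RETURNTRANSFER the HTML is returned as a string
$data = curl_exec($ch);
if ($data === false) {
    echo 'cURL error: ' . curl_error($ch);
}

// close cURL resource, and free up system resources
curl_close($ch);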
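As a minimal illustration of filtering the data with a regular expression, the sketch below pulls e-mail addresses out of fetched HTML such as `$data`. The function name is hypothetical and the pattern is a pragmatic approximation, not a full RFC 5322 validator. Tags are replaced by spaces before matching so that text from adjacent elements is not glued together.

```php
<?php
// Sketch: filter e-mail addresses out of fetched HTML with a regular expression.
// The pattern is a pragmatic approximation, not a full RFC 5322 validator.
function extract_emails(string $html): array
{
    // Replace tags with spaces so text from neighboring elements stays separate,
    // then decode entities such as &#64; (an encoded "@").
    $text = html_entity_decode(preg_replace('/<[^>]*>/', ' ', $html), ENT_QUOTES | ENT_HTML5, 'UTF-8');
    preg_match_all('/[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}/i', $text, $matches);
    // Normalize case and drop duplicates, keeping first-seen order.
    return array_values(array_unique(array_map('strtolower', $matches[0])));
}
```

Regular expressions on raw HTML break easily when the markup changes; for anything beyond simple patterns like this one, a DOM parser such as DOMDocument is usually more robust.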

Re: file_get_contents or Curl - which one to take?

Posted: Mon Jun 13, 2011 4:53 pm
by Eric!
FYI -- cURL will be faster (by 10% or more) if you're doing a lot of data fetching.
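One factor that matters when fetching many pages: if a single cURL handle is reused for every request, libcurl can keep the connection to the same host open instead of reconnecting each time. A sketch, assuming a list of detail-page URLs is already available; the function name and the default one-second pause between requests (to go easy on the server) are assumptions, not part of the thread.

```php
<?php
// Sketch: fetch many pages with ONE cURL handle so libcurl can reuse open
// connections to the same host. Returns [url => body, or null on failure].
function fetch_many(array $urls, int $delayMs = 1000): array
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_TIMEOUT        => 20,
    ]);
    $results = [];
    foreach (array_values($urls) as $i => $url) {
        if ($i > 0 && $delayMs > 0) {
            usleep($delayMs * 1000); // pause between requests
        }
        curl_setopt($ch, CURLOPT_URL, $url); // only the URL changes per request
        $body = curl_exec($ch);
        $results[$url] = $body === false ? null : $body;
    }
    curl_close($ch);
    return $results;
}
```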