file_get_contents: & parsing - review

Posted: Sat May 21, 2011 3:27 pm
by lin
Hello dear community, good evening!

I want to scrape a dataset of more than 2,700 records on foundations in Switzerland.
You can see it here: http://www.edi.admin.ch/esv/00475/00698 ... ml?lang=de

Code:

<?php
// Original PHP code by Chirp Internet: http://www.chirp.com.au
// Please acknowledge use of this code by including this header.

$url = "http://www.edi.admin.ch/esv/00475/00698/index.html?lang=de";

$input = @file_get_contents($url) or die("Could not access file: $url");
$regexp = "<a\s[^>]*href=(\"??)([^\" >]*?)\\1[^>]*>(.*)<\/a>";
if (preg_match_all("/$regexp/siU", $input, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $match) {
        // $match[2] = link address (href): the data I want to collect
        // $match[3] = link text that I need to collect, see a detail page
    }
}
?>

To be frank, I am not sure what is wrong: my console gives back some bad errors...

Can you please help me with this issue? I would love to hear from you.

lin :)


Btw, see a detail page: http://www.edi.admin.ch/esv/00475/00698 ... sp?Id=3221



with the following information:
Name: "baiji.org" Foundation
Schlüsselwort: BAIJI
Adresse: Seefeldstr. 94
8008 Zürich
Mail: august@baiji.com
Zweck:


Btw, here is a translation:

Name: -> name
Schlüsselwort: -> keyword
Adresse: -> address
Mail: -> mail
Zweck: -> purpose
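
As a rough illustration of how such an entry could be turned into a record, here is a minimal sketch. It assumes the detail page has already been reduced to plain "Label: value" lines in UTF-8, exactly as pasted above; the real page markup is not shown in this thread and may well differ. The function name parseDetail and the English array keys are made up for this example.

```php
<?php
/**
 * Hypothetical helper: turn "Label: value" lines into an associative array.
 * Assumes plain UTF-8 text shaped like the sample entry in the post.
 */
function parseDetail(string $text): array
{
    // German label => English key, following the translation in the post.
    $labels = [
        'Name'          => 'name',
        'Schlüsselwort' => 'keyword',
        'Adresse'       => 'address',
        'Mail'          => 'mail',
        'Zweck'         => 'purpose',
    ];
    $record  = array_fill_keys(array_values($labels), '');
    $current = null;

    foreach (preg_split('/\r\n|\r|\n/', trim($text)) as $line) {
        $line = trim($line);
        if ($line === '') {
            continue;
        }
        $parts = explode(':', $line, 2);
        if (count($parts) === 2 && isset($labels[$parts[0]])) {
            // A known label starts a new field.
            $current = $labels[$parts[0]];
            $record[$current] = trim($parts[1]);
        } elseif ($current !== null) {
            // A line without a known label continues the previous field,
            // e.g. the postcode and city after the street.
            $record[$current] = $record[$current] === ''
                ? $line
                : $record[$current] . ', ' . $line;
        }
    }
    return $record;
}

$sample = <<<'TXT'
Name: "baiji.org" Foundation
Schlüsselwort: BAIJI
Adresse: Seefeldstr. 94
8008 Zürich
Mail: august@baiji.com
Zweck:
TXT;

print_r(parseDetail($sample));
```

Keeping one array per entry, with the same five keys every time, should make it easier to hand the records to a database later on.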

Re: file_get_contents: & parsing - review

Posted: Tue May 24, 2011 1:35 pm
by Jade
Change this line: $input = @file_get_contents($url) or die("Could not access file: $url");

To this: $input = file_get_contents($url) or die("Could not access file: $url");

And post the error you're getting.
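
The reason for dropping the @ is that it suppresses the warning PHP would otherwise print, so the actual cause stays hidden. A minimal sketch of how the error text could be made visible follows; the helper name fetchPage is hypothetical, and turning on display_errors is only meant for debugging.

```php
<?php
// Show every warning and notice while debugging instead of hiding them.
error_reporting(E_ALL);
ini_set('display_errors', '1');

/**
 * Hypothetical helper (the name fetchPage is not from the thread).
 * Returns the page body, or throws with the warning text PHP recorded.
 */
function fetchPage(string $url): string
{
    // No @ in front of the call, so PHP prints its own warning as well.
    $input = file_get_contents($url);
    if ($input === false) {
        $error  = error_get_last();
        $reason = $error['message'] ?? 'no further details available';
        throw new RuntimeException("Could not access file: $url ($reason)");
    }
    return $input;
}
```

Calling fetchPage($url) with the URL from the first post and echoing the exception message should produce the text worth posting here. One common cause to check: if allow_url_fopen is disabled in php.ini, file_get_contents() cannot open http:// URLs at all.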

Re: file_get_contents: & parsing - review

Posted: Tue May 24, 2011 3:51 pm
by lin
Hello Jade, many thanks for the help!

Well, I changed the code, but I do not know what I did wrong: it does not give back any results!?

Code:

<?php
// Original PHP code by Chirp Internet: http://www.chirp.com.au
// Please acknowledge use of this code by including this header.

$url = "http://www.edi.admin.ch/esv/00475/00698/index.html?lang=de";

//$input = @file_get_contents($url) or die("Could not access file: $url");

$input = file_get_contents($url) or die("Could not access file: $url");

$regexp = "<a\s[^>]*href=(\"??)([^\" >]*?)\\1[^>]*>(.*)<\/a>";
if (preg_match_all("/$regexp/siU", $input, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $match) {
        // $match[2] = all the data I want to collect...
        // $match[3] = text that I need to collect - see a detail page
    }
}
?>
Well, it goes a bit over my head why it does not give back any results.


Jade, I look forward to hearing from you!

Regards

Re: file_get_contents: & parsing - review

Posted: Thu May 26, 2011 12:16 pm
by McInfo
lin wrote: my console gives back some bad errors...
Are they so bad you can't repeat them in public?

The code does not output any of the matches it might find. There are no echo or print statements. Is that the problem?
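
To make that concrete, here is a minimal sketch that prints what the regular expression finds. It runs against a small stand-in HTML string rather than the real page, whose markup is not shown in this thread; the hrefs in it and the function name extractLinks are made up for the example.

```php
<?php
/**
 * Hypothetical helper: collect href and link text for every <a> element,
 * using the same regular expression as the posted code.
 */
function extractLinks(string $html): array
{
    $regexp = "<a\s[^>]*href=(\"??)([^\" >]*?)\\1[^>]*>(.*)<\/a>";
    $links  = [];
    if (preg_match_all("/$regexp/siU", $html, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $match) {
            $links[] = [
                'href' => $match[2],                    // link address
                'text' => trim(strip_tags($match[3])),  // link text without inner tags
            ];
        }
    }
    return $links;
}

// Stand-in markup so the sketch runs without network access.
$input = '<ul>'
       . '<li><a href="/example/detail?Id=1">First foundation</a></li>'
       . '<li><a class="x" href="/example/detail?Id=2">Second <b>foundation</b></a></li>'
       . '</ul>';

foreach (extractLinks($input) as $link) {
    echo $link['href'], ' => ', $link['text'], "\n";
}
```

With the real page, $input would come from file_get_contents($url) instead. If the loop then prints nothing, either the download failed or the page contains no links that fit the pattern.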

Re: file_get_contents: & parsing - review

Posted: Thu May 26, 2011 2:36 pm
by lin
Hello McInfo,

Many thanks for the quick reply, great to hear from you.
McInfo wrote:
lin wrote: my console gives back some bad errors...
Are they so bad you can't repeat them in public?

The code does not output any of the matches it might find. There are no echo or print statements. Is that the problem?
Yup, you are right, McInfo. I want to parse all the data:

Btw, see the link: http://www.edi.admin.ch/esv/00475/00698 ... ml?lang=de

See an example of an entry!

All the data should be collected and stored in a MySQL database.
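
For the storage step, a minimal sketch using PDO with a prepared statement might look like the following. Everything specific in it is an assumption rather than something from this thread: the table name foundations, its columns (taken from the five fields of the example entry), the helper name saveFoundations, and the connection details in the usage comment.

```php
<?php
/**
 * Hypothetical helper: insert parsed records into a table named `foundations`.
 * Assumed MySQL schema (not from the thread):
 *   CREATE TABLE foundations (
 *     id      INT AUTO_INCREMENT PRIMARY KEY,
 *     name    VARCHAR(255),
 *     keyword VARCHAR(255),
 *     address VARCHAR(255),
 *     mail    VARCHAR(255),
 *     purpose TEXT
 *   ) DEFAULT CHARSET=utf8;
 * Returns the number of rows that were inserted.
 */
function saveFoundations(PDO $pdo, array $records): int
{
    // Prepared once, executed per record; values are never concatenated into SQL.
    $stmt = $pdo->prepare(
        'INSERT INTO foundations (name, keyword, address, mail, purpose)
         VALUES (:name, :keyword, :address, :mail, :purpose)'
    );
    $saved = 0;
    foreach ($records as $record) {
        $ok = $stmt->execute([
            ':name'    => $record['name'] ?? '',
            ':keyword' => $record['keyword'] ?? '',
            ':address' => $record['address'] ?? '',
            ':mail'    => $record['mail'] ?? '',
            ':purpose' => $record['purpose'] ?? '',
        ]);
        if ($ok) {
            $saved++;
        }
    }
    return $saved;
}

// Usage sketch; host, database name and credentials are placeholders:
// $pdo = new PDO('mysql:host=localhost;dbname=scraper;charset=utf8', 'user', 'secret');
// $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
// echo saveFoundations($pdo, $records), " rows saved\n";
```

A prepared statement is used here because names such as "baiji.org" Foundation contain quotes, which would break a query built by string concatenation.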