file_get_contents & parsing - review
Posted: Sat May 21, 2011 3:27 pm
Hello dear community, good evening!
I want to scrape a dataset of more than 2700 records on foundations in Switzerland.
You can see it here: http://www.edi.admin.ch/esv/00475/00698 ... ml?lang=de
To be frank, I am not sure my approach is right: when I run the script below, my console gives back some bad errors.
Code:
<?php
// Original PHP code by Chirp Internet: http://www.chirp.com.au
// Please acknowledge use of this code by including this header.
$url = "http://www.edi.admin.ch/esv/00475/00698/index.html?lang=de";

// Fetch the overview page; stop with a message if the request fails.
$input = @file_get_contents($url);
if ($input === false) {
    die("Could not access file: $url");
}

// Match every <a> element: group 2 is the href value, group 3 the link content.
$regexp = "<a\s[^>]*href=(\"??)([^\" >]*?)\\1[^>]*>(.*)<\/a>";
if (preg_match_all("/$regexp/siU", $input, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $match) {
        // $match[2] = link address (the data I want to collect, i.e. the URL of a detail page)
        // $match[3] = link text (the text I need to collect; see the detail page)
    }
}
?>
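One possible alternative to the regular expression is PHP's bundled DOM extension, which tolerates messy markup better. The following is only a sketch: the helper name `extractLinks` and the sample markup are made up for illustration, and the structure of the real page is an assumption.

```php
<?php
// Hypothetical helper (not part of the original script): collect the href
// and the link text of every <a> element in an HTML string.
function extractLinks(string $html): array
{
    $doc = new DOMDocument();
    // Real-world pages are rarely valid HTML, so keep parser warnings quiet.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        if (!$anchor->hasAttribute('href')) {
            continue; // skip named anchors that have no target
        }
        $links[] = [
            'href' => $anchor->getAttribute('href'), // roughly $match[2] in the regex version
            'text' => trim($anchor->textContent),    // roughly $match[3], with inner tags stripped
        ];
    }
    return $links;
}

// Made-up sample markup, only to show the shape of the result.
$sample = '<ul><li><a href="/example/detail?Id=1">Example Foundation</a></li>'
        . '<li><a name="top">no link</a></li></ul>';
$links = extractLinks($sample);
print_r($links);
```

Each href collected this way could then be fetched with file_get_contents in a second step to read the detail page. Relative addresses would first have to be resolved against the address of the overview page.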
Can you please help me with this issue? I would love to hear from you.
lin
BTW, see a detail page: http://www.edi.admin.ch/esv/00475/00698 ... sp?Id=3221
It contains the following information:
Name: "baiji.org" Foundation
Schlüsselwort: BAIJI
Adresse: Seefeldstr. 94
8008 Zürich
Mail: august@baiji.com
Zweck:
BTW, here is a translation of the field labels:
Name: -> name
Schlüsselwort: -> keyword
Adresse: -> address
Mail: -> mail
Zweck: -> purpose
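The label translation above could be applied in code roughly like this. It is a minimal sketch that assumes the detail page has already been reduced to plain "Label: value" lines as shown in this post; the helper name `parseDetailText` is hypothetical, and lines without a known label (such as the postcode line) are assumed to continue the previous field.

```php
<?php
// Hypothetical helper: turn "Label: value" lines into an array with English keys.
function parseDetailText(string $text): array
{
    // Mapping taken from the translation list in the post.
    $fieldMap = [
        'Name'          => 'name',
        'Schlüsselwort' => 'keyword',
        'Adresse'       => 'address',
        'Mail'          => 'mail',
        'Zweck'         => 'purpose',
    ];
    $record  = array_fill_keys(array_values($fieldMap), '');
    $current = null; // key of the field that was filled last

    foreach (preg_split('/\R/u', $text) as $line) {
        $line = trim($line);
        if ($line === '') {
            continue;
        }
        $pos   = strpos($line, ':');
        $label = $pos === false ? null : substr($line, 0, $pos);
        if ($label !== null && array_key_exists($label, $fieldMap)) {
            // Known label: start a new field.
            $current = $fieldMap[$label];
            $record[$current] = trim(substr($line, $pos + 1));
        } elseif ($current !== null) {
            // No known label: append to the previous field, comma-separated.
            $record[$current] = trim($record[$current] . ', ' . $line, ', ');
        }
    }
    return $record;
}

// The sample record from the post.
$sample = <<<'TXT'
Name: "baiji.org" Foundation
Schlüsselwort: BAIJI
Adresse: Seefeldstr. 94
8008 Zürich
Mail: august@baiji.com
Zweck:
TXT;

$record = parseDetailText($sample);
print_r($record);
```

An array per foundation keeps the 2700 records uniform, so they could later be written out with fputcsv or inserted into a database table with one column per key.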