Hello dear folks, good evening dear community.
I need a starting point. There is a German database that collects data on all German foundations.
see: http://www.suche.stiftungen.org/index.p ... baseID=129
It lists all foundations in Germany, 8074 different ones. You get the full result set if you enter % as a wildcard in the search field.
How can this be done with PHP? I think we have to use cURL or file_get_contents; those seem to be the best methods for this. What do you think? I am curious to hear your ideas, so please let me know.
BTW, XPath and the DOM technique can probably be used too, I guess.
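As a starting point, here is a minimal sketch of the cURL plus DOM/XPath idea. It assumes PHP with the curl and dom extensions enabled. The function names fetchPage and queryText, the user-agent string, and the XPath expression in the usage line are made up for illustration and are not taken from the site's real markup. DOMDocument::loadHTML() parses HTML directly, so XPath can be run on the fetched page as it is.

```php
<?php
// Fetch a page over HTTP with cURL and return its body; throws on transport errors.
function fetchPage(string $url): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow HTTP redirects
        CURLOPT_TIMEOUT        => 30,    // seconds
        CURLOPT_USERAGENT      => 'foundation-harvester/0.1', // hypothetical name
    ]);
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('cURL error: ' . curl_error($ch));
    }
    return $body;
}

// Run an XPath expression against an HTML string and return the trimmed
// text content of every matching node.
function queryText(string $html, string $xpath): array
{
    if (trim($html) === '') {
        return [];
    }
    // Real-world HTML is rarely valid, so collect parser warnings silently.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $nodes = (new DOMXPath($doc))->query($xpath);
    if ($nodes === false) {
        throw new InvalidArgumentException("Invalid XPath expression: $xpath");
    }
    $texts = [];
    foreach ($nodes as $node) {
        $texts[] = trim($node->textContent);
    }
    return $texts;
}
```

Something like queryText(fetchPage($url), '//td') would then return the text of every table cell. The right expression depends on the actual markup of the result page.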
On a side note: if you run that wildcard search, you hit some kind of overflow. 350 results is the limit; the site cannot show more. So the question is: how can we create a spider that runs across the site and queries it step by step, so that we get all 8074 results?
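One possible way around the 350-result cap is to split the wildcard search into narrower queries (a%, b%, ...) and to split again (aa%, ab%, ...) whenever a query still hits the cap. The sketch below assumes that the search form accepts such prefix patterns, which is not verified here. harvest and the $search callback are hypothetical names; $search stands for whatever code sends one query to the site and returns the names it lists.

```php
<?php
/**
 * Collect all names behind a search that returns at most $limit hits per query.
 *
 * $search   function(string $pattern): string[]  (hypothetical callback)
 * $limit    the result cap of the site (350 in this thread)
 * $alphabet characters used to extend the prefix
 */
function harvest(callable $search, int $limit, array $alphabet, string $prefix = ''): array
{
    $hits = $search($prefix . '%');
    if (count($hits) < $limit) {
        return $hits; // below the cap, so this result set is complete
    }

    // Cap reached: the list may be truncated. Keep what we got and
    // repeat the search with every one-character-longer prefix.
    $all = $hits;
    foreach ($alphabet as $char) {
        $all = array_merge($all, harvest($search, $limit, $alphabet, $prefix . $char));
    }
    return array_values(array_unique($all));
}
```

Names that start with a character outside $alphabet (digits or umlauts, for example) would be missed, so the alphabet needs to cover those as well.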
The second question: for each foundation we get the following dataset:
Name: Allers'sche Tagelöhnerstiftung Landesstube des alten Landes Wursten
Street: Westerbüttel 13
Postal-code and town: 27632 Dorum
additional infos: Fördernd: Ja
additional infos: Operativ: Ja
webpage: http://www.sglandwursten.de
main area of work: Aufgabengebiete: Mildtätigkeit Kinder-/Jugendhilfe
regional-base: Regionale Einschränkungen: PLZ 27632, 27637, 27638, 27607, Mitgliedsgemeinden im Bereich der Samtgemeinde Land Wursten, Nordholz, Imsum, verschiedene Gemeinden im Bereich der Samtgemeinde, Land Wursten, Gemeinde Nadholz
Target-group: Zielgruppen: Feste Destinatäre: Bewohner DRK-Alten- und Pflegeheim. Kinder, Jugendliche, Landarbeiter
All the datasets are similar; they all seem to look exactly like this.
The question is: can this be stored directly in a MySQL DB?
Note: some descriptions are very long. I guess an Excel sheet could be overloaded by this.
What do you think, is this doable?
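Storing this in MySQL should be doable. A minimal sketch using PDO follows; the table name foundations, the column names, and the helper functions are assumptions made for illustration. The long fields go into TEXT columns, which hold up to 65,535 bytes each in MySQL.

```php
<?php
// Columns of the hypothetical "foundations" table, one per scraped field.
const FOUNDATION_FIELDS = [
    'name', 'street', 'postal_town', 'website',
    'work_areas', 'regions', 'target_group',
];

// Hypothetical MySQL schema; run once, e.g. $pdo->exec(createTableSql()).
function createTableSql(): string
{
    return 'CREATE TABLE IF NOT EXISTS foundations (
        id           INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
        name         VARCHAR(255) NOT NULL,
        street       VARCHAR(255),
        postal_town  VARCHAR(255),
        website      VARCHAR(255),
        work_areas   TEXT,
        regions      TEXT,
        target_group TEXT
    ) DEFAULT CHARSET=utf8mb4';
}

// Turn one scraped record into bound parameters; missing or empty fields become NULL.
function recordToParams(array $record): array
{
    $params = [];
    foreach (FOUNDATION_FIELDS as $field) {
        $value = isset($record[$field]) ? trim($record[$field]) : '';
        $params[':' . $field] = ($value === '') ? null : $value;
    }
    return $params;
}

// Insert one record with a prepared statement, so quotes in names
// such as "Allers'sche" cannot break the SQL.
function saveFoundation(PDO $pdo, array $record): void
{
    $columns      = implode(', ', FOUNDATION_FIELDS);
    $placeholders = ':' . implode(', :', FOUNDATION_FIELDS);
    $stmt = $pdo->prepare("INSERT INTO foundations ($columns) VALUES ($placeholders)");
    $stmt->execute(recordToParams($record));
}
```

The connection itself would be opened with something like new PDO('mysql:host=localhost;dbname=...;charset=utf8mb4', $user, $password), with credentials of your own.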
Love to hear from you - best regards
starting-point: parser that runs Curl and DOM (with Xpath)
emelianenko
Re: starting-point: parser that runs Curl and DOM (with Xpat
Have you ruled out open source web crawlers such as dataparksearch or wget? Using wget, I had downloaded nearly the whole Amazon site before I decided to stop it. Once I had the download, I wrote a Perl script, and that was it.
As for file_get_contents, what are you going to do with it? All you can do is point it to a file and pass the result to a variable. And as for XPath, you need to convert the page to XML. Sorry if I sound a bit ignorant, but am I understanding correctly that what you actually want is to grab all the contents of their database? Do you have access to the DB?
Updated:
Reading another posting, I found that if you use file_get_contents
Code:
$input = @file_get_contents($url) or die("Could not access file: $url");
you then use
Code:
int preg_match_all ( string $pattern , string $subject , array &$matches [, int $flags = PREG_PATTERN_ORDER [, int $offset = 0 ]] )
and you would have to write a regular expression. Even so, it would be no match for using dataparksearch, as I indicated above.
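To make the two calls above concrete, here is a small sketch that pulls all link targets out of an HTML string with preg_match_all. The function name extractLinks and the pattern are illustrative assumptions. A regular expression like this is fragile against unusual markup, for example single-quoted or unquoted attributes.

```php
<?php
// Collect the href values of all <a> tags in an HTML string.
function extractLinks(string $html): array
{
    // "~" as delimiter so that "/" in URLs needs no escaping;
    // the "i" flag matches <A HREF=...> as well as <a href=...>.
    $pattern = '~<a\s[^>]*?href\s*=\s*"([^"]*)"~i';
    preg_match_all($pattern, $html, $matches);
    // With the default PREG_PATTERN_ORDER, $matches[1] holds
    // capture group 1 of every match.
    return $matches[1];
}
```

With the $input from the file_get_contents line above, extractLinks($input) returns the list of href values found in the page.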
Kind regards
Re: starting-point: parser that runs Curl and DOM (with Xpat
Good day, dear emelianenko, and good morning.
Many thanks for the answer. (I can answer in a longer message later, but at the moment I am a bit short of time.)
Well, I would love to do this parser-harvester job in PHP (with cURL); on the other hand, I could do it with Perl and WWW::Mechanize.
Mechanize is a very powerful module, but the technique goes a bit over my head.
I will answer later. Have a great day
and a nice Sunday!
Regards, lin