XPath help: how do I apply XPath expressions with PHP DOMDocument?
Posted: Sun May 29, 2011 8:00 am
Hello dear community, good day!
I want to parse a site with PHP's DOMDocument. I am told it is faster and easier to use, and some of you have convinced me. One question, since I am a PHP newbie:
can I apply XPath expressions there, and how?
Example: http://buergerstiftungen.de/cps/rde/.../hs.xsl/db.htm
Goal: to fetch the results (approx. 213 different records) and parse them in order to get a database dump that I can store in a local MySQL database.
By the way, see two result pages:
http://buergerstiftungen.de/cps/rde/...l/db_20302.htm http://buergerstiftungen.de/cps/rde/...l/db_20289.htm
You see there is a lot of information stored...
I tried to write a scraper in Perl, but I had no luck; Perl is very hard for newbies. Afterwards I tried to write a parser in PHP, which is a bit easier. But the pages (see the detail result pages) are a bit complex. How do I parse them in order to get the dataset into a local MySQL database? That would give me more options for retrieval. I want to have the data locally (on my openSUSE Linux 11.3 system) in a MySQL database.
So I have three parts:
1. fetching
2. parsing
3. storing (in MySQL, that is, creating a MySQL dump)
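For the fetching step, a minimal sketch of how the links to the detail pages might be collected from the overview page. The function name `extractDetailLinks` is made up for illustration, and the assumption that detail links end in `db_<number>.htm` is taken only from the two result-page URLs quoted in this post. The sample markup is invented so the snippet runs without network access; for the real page the HTML string would come from something like `file_get_contents()`.

```php
<?php
// Hypothetical helper: collect links to detail pages from an overview page.
// Assumption: detail links end in "db_<number>.htm", as in the URLs above.
function extractDetailLinks(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; buffer libxml warnings instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $links = [];
    // Select the href attribute of every anchor whose target mentions "db_".
    foreach ($xpath->query("//a[contains(@href, 'db_')]/@href") as $attr) {
        if (preg_match('/db_\d+\.htm$/', $attr->nodeValue)) {
            $links[] = $attr->nodeValue;
        }
    }
    // Drop duplicates and reindex from 0.
    return array_values(array_unique($links));
}

// Invented sample markup standing in for the overview page.
$sample = '<html><body><ul>
  <li><a href="db_20302.htm">A</a></li>
  <li><a href="db_20289.htm">B</a></li>
  <li><a href="db_20289.htm">B again</a></li>
  <li><a href="imprint.htm">Imprint</a></li>
</ul></body></html>';

$links = extractDetailLinks($sample);
print_r($links);
```

Each collected link could then be fetched and handed to the parsing step.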
Since I have very little experience with XPath, I use the XPather tool in my Mozilla browser. But I am not sure how to apply the expressions it gives me; see the data I gathered below. Perhaps some of you can help me and show me how to apply them in parser code.
I would love to hear from you.
Here are some details for the results (from the approx. 213 different records). From the two result pages I gathered these XPath expressions:
Example: Bürgerstiftung Wiesloch http://buergerstiftungen.de/cps/rde/...l/db_20289.htm
/html/body/div[@id='main']/div[@id='wrapper']/div[@id='inner']/div[@id='marginalblock']/div[1]/p
1. Gründungsgeschichte /html/body/div[@id='main']/div[@id='wrapper']/div[@id='inner']/div[@id='contentblock']/div/p[1]/strong
2. Kurzvorstellung/Ziele /html/body/div[@id='main']/div[@id='wrapper']/div[@id='inner']/div[@id='contentblock']/div/p[2]/span[2]/span/b
3. Projekte /html/body/div[@id='main']/div[@id='wrapper']/div[@id='inner']/div[@id='contentblock']/div/p[3]/span[2]/span/strong
Kontakt: /html/body/div[@id='main']/div[@id='wrapper']/div[@id='inner']/div[@id='marginalblock']/div[1]/h6
So how do I apply them with libxml (DOMDocument) in order to get the parser part up and running?
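One way the expressions listed above might be applied, as a sketch: load the page into a `DOMDocument`, wrap it in a `DOMXPath`, and run each expression with `query()`. The function name `extractFields`, the array keys, and the sample markup are hypothetical; the markup only imitates the nesting implied by the XPaths and is not the real page. The sample is kept ASCII-only because `loadHTML()` may misread non-ASCII text when the page declares no charset.

```php
<?php
// Hypothetical helper: run a map of name => XPath expression against an
// HTML string and return the trimmed text of the first match (or null).
function extractFields(string $html, array $expressions): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // buffer warnings from sloppy HTML
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $record = [];
    foreach ($expressions as $name => $expr) {
        $nodes = $xpath->query($expr);  // DOMNodeList, or false on a bad expression
        $record[$name] = ($nodes !== false && $nodes->length > 0)
            ? trim($nodes->item(0)->textContent)
            : null;
    }
    return $record;
}

// Three of the expressions from the post, used unchanged; the keys are made up.
$base = "/html/body/div[@id='main']/div[@id='wrapper']/div[@id='inner']";
$expressions = [
    'contact_heading' => $base . "/div[@id='marginalblock']/div[1]/h6",
    'contact_text'    => $base . "/div[@id='marginalblock']/div[1]/p",
    'section_1'       => $base . "/div[@id='contentblock']/div/p[1]/strong",
];

// Invented sample markup that mirrors the nesting the XPaths expect.
$sample = '<html><body><div id="main"><div id="wrapper"><div id="inner">
  <div id="contentblock"><div><p><strong>1. Section heading</strong></p></div></div>
  <div id="marginalblock"><div><h6>Kontakt:</h6><p>Sample address</p></div></div>
</div></div></div></body></html>';

$record = extractFields($sample, $expressions);
print_r($record);
```

Absolute paths like these break as soon as the page layout shifts, so shorter expressions anchored on an id (for example `//div[@id='marginalblock']/div[1]/h6`) may prove more robust. The resulting array has one entry per field, which maps naturally onto one row for the storing step.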