Tailoring a regex: please review the already nicely running code

Posted: Sat Dec 11, 2010 4:26 pm
by lin
Good evening dear friends,


I am currently ironing out some parsers. How can we apply this piece of code:

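<?php

// Fetch the school page as a raw HTML string.
$content = file_get_contents("http://schulnetz.nibis.de/db/schulen/schule.php?schulnr=94468&lschb");
if ($content === false) {
    die("Could not fetch the page.\n");
}

// Debug output: dump the raw HTML.
var_dump($content);

// Capture the contents of every <td>...</td> cell
// (s: dot also matches newlines, i: case-insensitive).
// Note: this matches only a bare <td>, not a <td> that carries attributes.
$pattern = '/<td>(.*?)<\/td>/si';
preg_match_all($pattern, $content, $matches);

foreach ($matches[1] as $match) {
    // Drop nested markup and surrounding whitespace before printing.
    $match = strip_tags($match);
    $match = trim($match);
    var_dump($match);
}

?>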

to this new target URL (Target-URL2):

http://www.schulministerium.nrw.de/BP/S ... pDO=154763

Note: if I had a parser for the sites mentioned above, I would be able to solve some important
tasks, so I would love to get this issue solved.

Note that the second example contains some invalid HTML.

By the way: because of the invalid HTML in Target-URL2 mentioned above,
I am considering a solution that uses DOMDocument::getElementsByTagName.
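
A minimal sketch of that DOMDocument idea, assuming the goal is still the text of every table cell. The helper name extractCells() and the inline sample markup are made up for illustration; they are not taken from either target page. libxml's HTML parser tries to recover from malformed markup, so this may cope with pages where the regex gives up, but that is not verified against Target-URL2.

```php
<?php
// Sketch: extract table-cell text with DOMDocument instead of a regex.
// extractCells() is a hypothetical helper name, not part of the original code.

function extractCells(string $html): array
{
    // Collect parser warnings internally instead of printing them;
    // invalid markup would otherwise produce a lot of warning output.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $cells = [];
    foreach ($doc->getElementsByTagName('td') as $td) {
        // textContent holds only the text, so strip_tags() is not needed.
        $text = trim($td->textContent);
        if ($text !== '') {
            $cells[] = $text;
        }
    }
    return $cells;
}

// Invented sample: one cell has an attribute and no closing tag.
$sample = '<table><tr><td class="a">Schule <b>Nord</b><td>12345 Musterstadt</td></tr></table>';
print_r(extractCells($sample));
```

For a live page, the string returned by file_get_contents() would be passed to extractCells() in place of $sample. One caveat: loadHTML() assumes ISO-8859-1 when the page declares no charset, so umlauts may need extra handling.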

Love to hear from you!

Posted: Sat Dec 11, 2010 4:33 pm
by Jonah Bron
Are you just saying you want to scrape those pages?

Posted: Sat Dec 11, 2010 6:08 pm
by lin
Hi Jonah Bron, good evening!

Thanks for answering.
Jonah Bron wrote: Are you just saying you want to scrape those pages?
I want to parse the data for a little educational project. I am a teacher and I work in the field of ...
Nothing bad, and not to put the data back online.
I want to read the online data (which is not very handy) in a local spreadsheet.

Nothing harmful, only a better way to retrieve, read, and find the data.
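
For the spreadsheet part, a rough sketch of how parsed values could be written to a CSV file that a spreadsheet program can open. The helper name writeCsv(), the file name, and the rows are placeholders for illustration, not real scraped data.

```php
<?php
// Sketch: write rows of cell values to a CSV file for use in a spreadsheet.
// writeCsv() and the file name below are hypothetical.

function writeCsv(string $path, array $rows): void
{
    $handle = fopen($path, 'w');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path for writing");
    }
    foreach ($rows as $row) {
        // fputcsv() adds quoting where a field needs it.
        fputcsv($handle, $row, ',', '"', '\\');
    }
    fclose($handle);
}

// Placeholder rows; in practice these would come from the parser.
$rows = [
    ['field', 'value'],
    ['name', 'Beispielschule'],
];
writeCsv(sys_get_temp_dir() . '/schools.csv', $rows);
```

Each inner array becomes one line of the file, so one parsed school per row is a natural layout.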

I wonder why the regex fails on the second site. I guess the HTML is simply too broken and invalid.

Any idea how to solve this problem with the invalid HTML?
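
One possible cause, shown as a small sketch: the pattern above matches only a bare <td>, so cells whose opening tag carries attributes are skipped. The sample markup here is invented for illustration and is not taken from the second site, so this is only a guess at what might be going wrong there.

```php
<?php
// Sketch: a strict <td> pattern compared with one that tolerates attributes.
// The sample markup is invented; it is NOT copied from the target page.

$sample = '<TABLE><TR><TD class="label">Name</TD><TD width="50%">Beispielschule</TD></TR></TABLE>';

$strict   = '/<td>(.*?)<\/td>/si';        // original pattern: bare <td> only
$tolerant = '/<td\b[^>]*>(.*?)<\/td>/si'; // also accepts attributes in the tag

$strictCount   = preg_match_all($strict, $sample, $strictMatches);
$tolerantCount = preg_match_all($tolerant, $sample, $tolerantMatches);

var_dump($strictCount);        // int(0): no cell has a bare <td>
var_dump($tolerantMatches[1]); // the two cell contents: Name, Beispielschule
```

Even the tolerant pattern still depends on every cell having a closing tag, which is one reason a real HTML parser can be the more robust choice for invalid pages.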

Looking forward to hearing from you.

regards
lin

Update: here is some background that may show the objectives of the parsing:

My ideas and objectives: the idea behind the parsing project is to get to know the schools that work in the same field as I do, and to learn about their profiles via their websites and the things the schools publish there.
The long-term goal is to become aware that we are not alone, and to know that others work in the same field and have therefore had similar experiences and encountered similar problems. Finally, it would
be great to exchange ideas on best practice and to talk about experiences and projects.

Background: I work as a teacher at a school for disabled and handicapped people. We work with different profiles; our school profile covers needs from pre-school through K-12. The most descriptive profile focus is on pupils who are
    - socially disadvantaged
    - from deprived backgrounds
Therefore we are a special educational needs school (general education and learning section); we support ...
    * disability support in education
    * disadvantaged pupils
I want to get an overview of the different profiles in the larger field of education.

I hope that clears this up a bit!

Jonah, I would be glad if you could lend a helping hand and give me some hints for a good starting point in PHP.
Thanks in advance!