scraping data
Posted: Wed Feb 17, 2010 7:01 am
by qadeer_ahmad
Hi all developers,
Most of our projects involve scraping data. We commonly use cURL with preg_match, but everything depends on the HTML source, so if someone changes the HTML structure, the code stops working.
Is there any perfect solution to this issue?
Thanks
Re: scraping data
Posted: Wed Feb 17, 2010 7:01 am
by Darhazer
Yes, and it's called Human

Re: scraping data
Posted: Wed Feb 17, 2010 9:15 am
by qadeer_ahmad
Re: scraping data
Posted: Wed Feb 17, 2010 3:34 pm
by josh
Hand-craft a resilient regex, or use highly "exclusive" pattern matching instead of parsing the tags (for example, a phone number anywhere is a phone number, no matter what HTML tags it is wrapped in).
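A minimal Python sketch of the "exclusive" pattern idea: the regex targets the shape of the data itself, so it keeps matching whichever tags surround it. The pattern (which assumes North American formats such as 555-123-4567 or (555) 123-4567) and the name `find_phone_numbers` are hypothetical; in PHP the same pattern would go through `preg_match_all`.

```python
import re
from typing import List

# Hypothetical pattern for numbers like "555-123-4567", "(555) 123-4567"
# or "555.123.4567"; adjust it to the formats your target sites use.
PHONE_RE = re.compile(r"\(?\b(\d{3})\)?[-.\s]?(\d{3})[-.\s](\d{4})\b")


def find_phone_numbers(html: str) -> List[str]:
    """Scan the raw page, ignoring markup, and normalise hits to NNN-NNN-NNNN."""
    return [f"{a}-{b}-{c}" for a, b, c in PHONE_RE.findall(html)]


if __name__ == "__main__":
    old_layout = '<td class="phone">555-123-4567</td>'
    new_layout = '<div><span itemprop="telephone">(555) 123-4567</span></div>'
    print(find_phone_numbers(old_layout))  # ['555-123-4567']
    print(find_phone_numbers(new_layout))  # ['555-123-4567']
```

The trade-off is false positives: any digit run in that shape counts, including ones inside attributes or scripts, so the pattern should be as specific as the data allows.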
Hint: come up with example pages that illustrate the scenarios you are worried will break your code. Then test your code against those example pages until it proves robust enough not to break on them anymore. Then test some more.
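One way to keep those example pages around is a small fixture table that every change to the scraper is checked against. This is a sketch under assumptions: `extract_price`, its `$19.99` pattern, and the fixture pages are all hypothetical stand-ins for whatever your scraper actually extracts.

```python
import re
from typing import Callable, Dict, List, Optional

# Hypothetical extractor under test: the first price such as "$19.99"
# anywhere in the page, regardless of the surrounding markup.
PRICE_RE = re.compile(r"\$\s*(\d+(?:\.\d{2})?)")


def extract_price(html: str) -> Optional[str]:
    match = PRICE_RE.search(html)
    return match.group(1) if match else None


# Example pages covering layout changes we worry about, each mapped to the
# value the extractor must still return. Add an entry whenever a real site
# change breaks the scraper.
FIXTURES: Dict[str, Optional[str]] = {
    '<td class="price">$19.99</td>': "19.99",
    '<div><span class="amount">$ 19.99</span></div>': "19.99",
    "<p>Now only <b>$19.99</b>!</p>": "19.99",
    "<p>Out of stock</p>": None,
}


def check_fixtures(extract: Callable[[str], Optional[str]],
                   fixtures: Dict[str, Optional[str]]) -> List[str]:
    """Run the extractor over every fixture and describe each mismatch."""
    failures = []
    for page, expected in fixtures.items():
        got = extract(page)
        if got != expected:
            failures.append(f"{page!r}: expected {expected!r}, got {got!r}")
    return failures


if __name__ == "__main__":
    problems = check_fixtures(extract_price, FIXTURES)
    print("\n".join(problems) or "all fixtures pass")
```

Returning a list of mismatches rather than stopping at the first one shows every layout that broke in a single run.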
Re: scraping data
Posted: Thu Feb 18, 2010 8:09 am
by qadeer_ahmad
Yes, that could work: we could walk the [man]DOM[/man] to navigate to a specific point.
We could create a separate function for each tag, with each function handling every possible condition. That could become a library for handling such things.
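A rough Python sketch of such a library, using only the standard `html.parser` module. Instead of one function per tag, it swaps in a slightly different technique: it records every element with its attributes and text, then tries a list of fallback locators in order, so a new layout only needs a new locator entry. The class and function names and the `(tag, attribute, value)` locator format are assumptions for illustration; in PHP, `DOMDocument::loadHTML()` together with `DOMXPath` would be the natural building blocks.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional, Tuple

# Elements that never have a closing tag, so they are never "open".
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class ElementTextCollector(HTMLParser):
    """Records (tag, attributes, text pieces) for every element in the page."""

    def __init__(self) -> None:
        super().__init__()
        self.elements: List[Tuple[str, Dict[str, str], List[str]]] = []
        self._open: List[int] = []  # indexes of currently open elements

    def handle_starttag(self, tag, attrs):
        self.elements.append((tag, {k: v or "" for k, v in attrs}, []))
        if tag not in VOID_TAGS:
            self._open.append(len(self.elements) - 1)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        for depth in range(len(self._open) - 1, -1, -1):
            if self.elements[self._open[depth]][0] == tag:
                del self._open[depth:]
                break

    def handle_data(self, data):
        # Text belongs to every element that is still open around it.
        for index in self._open:
            self.elements[index][2].append(data)


# A locator is (tag, attribute, value); a tag of None matches any tag.
Locator = Tuple[Optional[str], str, str]


def find_text(html: str, locators: List[Locator]) -> Optional[str]:
    """Try each locator in order; return the first match's whitespace-normalised text."""
    parser = ElementTextCollector()
    parser.feed(html)
    parser.close()
    for tag, attr, value in locators:
        for el_tag, attrs, text in parser.elements:
            if (tag is None or el_tag == tag) and value in attrs.get(attr, "").split():
                return " ".join("".join(text).split())
    return None


if __name__ == "__main__":
    # Hypothetical locators: the current layout's id first, then an older class.
    title_locators = [("h1", "id", "product-title"), (None, "class", "title")]
    print(find_text('<div class="box title">Blue <b>Widget</b></div>', title_locators))
    print(find_text('<h1 id="product-title">Blue Widget</h1>', title_locators))
```

Both calls print `Blue Widget`. Ordering the locators from most to least specific means a redesign usually only requires adding one tuple rather than rewriting the extraction code.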