Hi all developers,
Most of our projects involve scraping data. We usually do this with cURL and preg_match, but both approaches depend on the HTML source, so if someone changes the HTML structure, the code stops working.
Is there a reliable solution to this problem?
Thanks
scraping data
Re: scraping data
Yes, and it's called a human.
Re: scraping data
Hand-craft a resilient regex, or use highly "exclusive" pattern matching instead of parsing the tags (for example, a phone number anywhere is a phone number, no matter what kind of HTML tags it is wrapped in).
Hint: Come up with example pages that illustrate the scenarios you are worried might break your code. Then test your code against those example pages until it proves robust enough not to break on them anymore. Then test some more.
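To make the two suggestions above concrete, here is a rough sketch in Python (the same idea should carry over to preg_match_all in PHP). The phone number pattern, the sample pages and the function names are all made up for illustration; a real scraper would need a pattern tuned to the data it actually expects.

```python
import re
from html import unescape

# Assumed format for illustration: 555-123-4567, 555.123.4567 or (555) 123-4567.
PHONE_RE = re.compile(r"\(?\b\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}\b")


def extract_phones(html):
    """Return phone-like strings found in the page text, ignoring the markup."""
    # Replace every tag with a space so the surrounding structure is irrelevant,
    # then decode entities such as &nbsp; before matching.
    text = unescape(re.sub(r"<[^>]*>", " ", html))
    return PHONE_RE.findall(text)


# Hypothetical example pages: the same kind of data wrapped in different layouts.
SAMPLE_PAGES = {
    "table layout": "<table><tr><td>Phone</td><td>555-123-4567</td></tr></table>",
    "div layout": '<div class="contact"><span>Call us: (555) 123-4567</span></div>',
    "plain paragraph": "<p>Phone&nbsp;555.123.4567</p>",
}


def check_samples():
    """Return the names of the sample pages where extraction did not find exactly one number."""
    return [
        name
        for name, html in SAMPLE_PAGES.items()
        if len(extract_phones(html)) != 1
    ]


if __name__ == "__main__":
    failed = check_samples()
    print("all sample pages passed" if not failed else "failed on: %s" % failed)
```

Whenever the target site changes its layout, save a copy of the new page into the sample set and rerun the check. Note that this sketch will still miss a number that is split across several tags.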
qadeer_ahmad (Forum Newbie, Posts: 11, Joined: Thu Aug 14, 2008 3:17 am)
Re: scraping data
Yes, that could work: we can use the [man]DOM[/man] to navigate to a specific point.
We can create a separate function for each tag, with each function handling all the conditions it might encounter. Together these could form a library for this kind of task.
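A small sketch of what such a helper library might look like, written in Python with the standard html.parser module (in PHP the DOM extension would play the same role). The class and function names here are invented for illustration, and the sketch only covers simple cases: text inside nested elements of the same tag is credited to the innermost one that was opened last.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they cannot contain text.
VOID_TAGS = {"img", "br", "hr", "input", "meta", "link"}


class TagCollector(HTMLParser):
    """Collects the attributes and text of every element with a given tag name."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag.lower()
        self.items = []   # one dict per matching element
        self._depth = 0   # greater than 0 while inside a matching element

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.items.append({"attrs": dict(attrs), "text": ""})
            if tag not in VOID_TAGS:
                self._depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self._depth:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth:
            self.items[-1]["text"] += data


def elements(html, tag):
    """Generic helper: every <tag> element as {'attrs': ..., 'text': ...}."""
    parser = TagCollector(tag)
    parser.feed(html)
    parser.close()
    return parser.items


def links(html):
    """Tag-specific helper: (link text, href) pairs; href is None when missing."""
    return [(e["text"].strip(), e["attrs"].get("href")) for e in elements(html, "a")]


def first_text(html, tag, default=None):
    """Text of the first <tag> element, or a default when the tag is absent."""
    found = elements(html, tag)
    return found[0]["text"].strip() if found else default
```

Each helper returns a default instead of failing when the expected tag is missing, so the calling code can decide how to react when a page changes.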