Howdy, I have been tasked with my first HTML scrape and am a bit foggy about how to do it.
In short, I need to extract one of three tables, plus part of the content within another table.
Here is what I'm working with; I need to keep everything below the white 'Canada...' headline:
http://64.246.64.33/merge/tsnform.aspx? ... index.aspx
I tried and failed at using strpos to grab the table.
How would YOU do this?
As always, I am humbled by the wit and skill of DevNet.
Scraping HTML, yuk
- John Cartwright
- Site Admin
- Posts: 11470
- Joined: Tue Dec 23, 2003 2:10 am
- Location: Toronto
Re: Scraping HTML, yuk
I would not, because it is against their Terms of Use.
A quick excerpt of the relevant terms of use:
You may not transmit or send messages, inquiries, scripts, "spiders," automated query programs, web crawlers, robotic programs, robots, or other similar devices to the Website or its associated server, or otherwise use or access, electronically or manually, this Website or its associated server, along or with others, in any manner which: (i) "scrapes," copies, collects, stores, transmits or reproduces any Materials or data displayed on the Website;
Re: Scraping HTML, yuk
Hmm... well, I know we're partners and scrape other feeds. But I wouldn't want anyone to get in trouble for assisting me with this one.
Thanks anyway.
- John Cartwright
Re: Scraping HTML, yuk
$var wrote:
Hmm... well, I know we're partners and scrape other feeds. But I wouldn't want anyone to get in trouble for assisting me with this one.
Thanks anyway.
If that is true... why don't they give you an RSS feed or something designed for this kind of thing?
Re: Scraping HTML, yuk
My only guess is that the provider is far behind technologically and doesn't offer simple feeds for this kind of syndication.
Problem solved anyhow, no scraping involved.