Scraping HTML, yuk

Posted: Mon Feb 08, 2010 4:36 pm
by $var
Howdy, I have been tasked with my first HTML scrape and am a bit foggy about how to do it.
In short, I need to pull out one of three tables, plus part of the content inside a table.

Here is what I'm working with; I need to keep everything below the white 'Canada...' headline:
http://64.246.64.33/merge/tsnform.aspx? ... index.aspx

I tried using strpos to grab the table, but couldn't get it to work.

How would YOU do this?

As always, I am humbled by the wit and skill of DevNet.

Re: Scraping HTML, yuk

Posted: Mon Feb 08, 2010 8:59 pm
by John Cartwright
I would not, because it is against their Terms of Use.

A quick excerpt of the relevant terms of use:
You may not transmit or send messages, inquiries, scripts, "spiders," automated query programs, web crawlers, robotic programs, robots, or other similar devices to the Website or its associated server, or otherwise use or access, electronically or manually, this Website or its associated server, alone or with others, in any manner which: (i) "scrapes," copies, collects, stores, transmits or reproduces any Materials or data displayed on the Website;

Re: Scraping HTML, yuk

Posted: Tue Feb 09, 2010 9:18 am
by $var
Hmm... well, I know we're partners and scrape other feeds. But I wouldn't want anyone to get in trouble for assisting me with this one.
Thanks anyway.

Re: Scraping HTML, yuk

Posted: Tue Feb 09, 2010 1:22 pm
by John Cartwright
$var wrote: Hmm... well, I know we're partners and scrape other feeds. But I wouldn't want anyone to get in trouble for assisting me with this one.
Thanks anyway.
If that is true, why don't they give you an RSS feed or something else designed for this kind of thing?

Re: Scraping HTML, yuk

Posted: Tue Feb 09, 2010 2:37 pm
by $var
My only guess is that the provider is very far behind technologically and doesn't have simple feeds available for this kind of syndication.
Problem solved anyhow; no scraping involved.