how to test parsing data from remote webpage

arjan.top
Forum Contributor
Posts: 305
Joined: Sun Oct 14, 2007 4:36 am
Location: Hoče, Slovenia

how to test parsing data from remote webpage

Post by arjan.top »

I do not have much experience with unit testing :oops:

So I have two classes:
1. gets data (HTML) from a URL; the URL is static (it can't be set)
2. parses data from that HTML

So how can I test them?

Thanks for any help :wink:
Chris Corbyn
Breakbeat Nuttzer
Posts: 13098
Joined: Wed Mar 24, 2004 7:57 am
Location: Melbourne, Australia

Re: how to test parsing data from remote webpage

Post by Chris Corbyn »

If it were me I'd:

a) Test parsing the content with a unit test case
b) Skip unit-testing the download itself, but write an acceptance test for it anyway

Acceptance tests are something I just include alongside my unit tests. I like my unit tests to be focused and not to rely on any external dependencies (code ones or service-based ones). My acceptance tests are more like end-to-end tests that fill in the grey areas the unit tests can't quite reach. I use SimpleTest to write my acceptance tests, which probably isn't the ideal tool, but it works for me :)
arjan.top
Forum Contributor
Posts: 305
Joined: Sun Oct 14, 2007 4:36 am
Location: Hoče, Slovenia

Re: how to test parsing data from remote webpage

Post by arjan.top »

Thanks for the answer

a.) but how am I supposed to get HTML to parse?
b.) so an acceptance test is just a set of operations tested at once?
Chris Corbyn
Breakbeat Nuttzer
Posts: 13098
Joined: Wed Mar 24, 2004 7:57 am
Location: Melbourne, Australia

Re: how to test parsing data from remote webpage

Post by Chris Corbyn »

arjan.top wrote: a.) but how am I supposed to get HTML to parse?
Write your own ;) It probably helps to read RFCs for stuff like this and test it the way it's documented in the RFC. But basically just set up your tests with some HTML in them:

Code:

public function testHyperlinksAreParsed() {
  $html = 'This is HTML with <a href="http://foo.bar/path/?q=v">link one</a> and ' .
    '<a class="test" href="http://www.test.com/">link two</a> in it.';
  $parser = new HtmlParser();
  $parser->parse($html);
  //make assertions here
  $this->assertEqual(array('http://foo.bar/path/?q=v', 'http://www.test.com/'),
    $parser->getHyperlinks());
}
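
For that test to pass, something has to implement `parse()` and `getHyperlinks()`. The thread never shows the parser itself, so the following is only a minimal sketch of one possible `HtmlParser`. It assumes a deliberately naive regular expression that handles double-quoted `href` attributes only; a sturdier version would likely use a real HTML parser such as PHP's DOM extension.

```php
<?php
// Hypothetical implementation: the class body is not shown anywhere in the thread.
class HtmlParser {
  private $_hyperlinks = array();

  // Collects the href value of every <a> tag found in $html.
  public function parse($html) {
    // Naive pattern (an assumption): double-quoted href attributes only.
    preg_match_all('/<a\b[^>]*\bhref\s*=\s*"([^"]*)"/i', $html, $matches);
    $this->_hyperlinks = $matches[1];
  }

  // Returns the links found by the most recent parse() call.
  public function getHyperlinks() {
    return $this->_hyperlinks;
  }
}

$parser = new HtmlParser();
$parser->parse('Text with <a href="http://foo.bar/path/?q=v">link one</a> and ' .
  '<a class="test" href="http://www.test.com/">link two</a> in it.');
$links = $parser->getHyperlinks();
```

Because the test hands the HTML in as a plain string, nothing in it depends on the network.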
I'd be tempted to offer a Document interface you could mock:

Code:

interface Document {
  public function getHtml();
}
Then the code that worries about downloading content can be kept as small as possible.

Code:

class RemoteDocument implements Document {
  private $_url;
  public function __construct($url) {
    $this->_url = $url;
  }
  public function getHtml() {
    return file_get_contents($this->_url);
  }
}
The advantage with that is that you can mock the interface and focus your parser on parsing HTML, not on downloading content. It's a parser, not a downloader, right?
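
The same idea works without any mocking library. As a rough sketch, a hand-written test double can implement `Document`, return canned HTML and count how often it is asked for it. `FakeDocument`, the constructor-injected `HtmlParser` and its naive regex are assumptions made for illustration, not code from this thread.

```php
<?php
interface Document {
  public function getHtml();
}

// Hypothetical hand-rolled test double: serves canned HTML and counts calls.
class FakeDocument implements Document {
  public $calls = 0;
  private $_html;

  public function __construct($html) {
    $this->_html = $html;
  }

  public function getHtml() {
    $this->calls++;
    return $this->_html;
  }
}

// Hypothetical parser variant that reads its input from a Document.
class HtmlParser {
  private $_document;

  public function __construct(Document $document) {
    $this->_document = $document;
  }

  public function getHyperlinks() {
    // Naive pattern (an assumption): double-quoted href attributes only.
    preg_match_all('/<a\b[^>]*\bhref\s*=\s*"([^"]*)"/i',
      $this->_document->getHtml(), $matches);
    return $matches[1];
  }
}

$document = new FakeDocument('<a href="http://www.test.com/">link</a>');
$parser = new HtmlParser($document);
$links = $parser->getHyperlinks();
```

The parser only ever sees the interface, so swapping `FakeDocument` for `RemoteDocument` in production needs no change to the parser.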

Using SimpleTest to mock that interface:

Code:

Mock::generate('Document', 'MockDocument');
 
class HtmlParserTest extends UnitTestCase {
  public function testParsingHyperlinks() {
    $html = 'This is HTML with <a href="http://foo.bar/path/?q=v">link one</a> and ' .
      '<a class="test" href="http://www.test.com/">link two</a> in it.';
    $document = new MockDocument();
    $document->setReturnValue('getHtml', $html);
    $parser = new HtmlParser($document);
    //make assertions here
    $this->assertEqual(array('http://foo.bar/path/?q=v', 'http://www.test.com/'),
      $parser->getHyperlinks());
  }
}
I'm also writing a little mock object library called Yay! Mock, but it's very much in its infancy. You'd do the same as above like this:

Code:

class HtmlParserTest extends UnitTestCase {
  public function testParsingHyperlinks() {
    $context = new Yay_Mockery();
    $html = 'This is HTML with <a href="http://foo.bar/path/?q=v">link one</a> and ' .
      '<a class="test" href="http://www.test.com/">link two</a> in it.';
    $document = $context->mock('Document');
    $context->checking(Yay_Expectations::create()
      -> atLeast(1)->of($document)->getHtml() -> returns($html)
      );
    $parser = new HtmlParser($document);
    //make assertions here
    $this->assertEqual(array('http://foo.bar/path/?q=v', 'http://www.test.com/'),
      $parser->getHyperlinks());
    $context->assertIsSatisfied();
  }
}
arjan.top wrote: b.) so an acceptance test is just a set of operations tested at once?
Roughly speaking yes. I still try to focus them as much as possible, but I don't worry so much about external dependencies.
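
As an illustration of that kind of test, here is a small sketch that exercises the real download path of `RemoteDocument` with no mocks involved. The helper `checkDocumentIsFetched()` is hypothetical, and a temporary local file stands in for the remote page so the sketch runs offline; a real acceptance test would pass the actual URL instead.

```php
<?php
interface Document {
  public function getHtml();
}

class RemoteDocument implements Document {
  private $_url;

  public function __construct($url) {
    $this->_url = $url;
  }

  public function getHtml() {
    return file_get_contents($this->_url);
  }
}

// Hypothetical acceptance check: fetches for real and looks for known content.
function checkDocumentIsFetched($url, $expectedFragment) {
  $document = new RemoteDocument($url);
  $html = $document->getHtml();
  return is_string($html) && strpos($html, $expectedFragment) !== false;
}

// Stand-in for the remote page (an assumption made to keep the sketch offline).
$path = tempnam(sys_get_temp_dir(), 'doc');
file_put_contents($path,
  '<html><body><a href="http://www.test.com/">link</a></body></html>');
$fetched = checkDocumentIsFetched($path, 'http://www.test.com/');
unlink($path);
```

A check like this is slower and can fail for reasons outside your code, which is why it sits alongside the unit tests rather than among them.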
arjan.top
Forum Contributor
Posts: 305
Joined: Sun Oct 14, 2007 4:36 am
Location: Hoče, Slovenia

Re: how to test parsing data from remote webpage

Post by arjan.top »

Thanks, very useful post :drunk:

I read a lot about mocks (mostly posts by you on this forum) but never knew the purpose of them really :)

I guess I will dive into mocking :P
arjan.top
Forum Contributor
Posts: 305
Joined: Sun Oct 14, 2007 4:36 am
Location: Hoče, Slovenia

Re: how to test parsing data from remote webpage

Post by arjan.top »

Ok it works great now :D

But what if a class is instantiated inside a method? Is there any way to "fake" the return values of that object?
Chris Corbyn
Breakbeat Nuttzer
Posts: 13098
Joined: Wed Mar 24, 2004 7:57 am
Location: Melbourne, Australia

Re: how to test parsing data from remote webpage

Post by Chris Corbyn »

arjan.top wrote:Ok it works great now :D

But what if a class is instantiated inside a method? Is there any way to "fake" the return values of that object?
That's what we call "tight coupling", and it hinders unit testing. Using a registry can help, since you can mock the registry to return a mock object. Other solutions are to pass in a factory object (which can be mocked) or to use dependency injection extensively. I admit, this is one of the places I let myself slide sometimes when I'm in a hurry to get something written.
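
To make the factory option concrete, here is a rough sketch in which the class asks an injected factory for its `Document` rather than calling `new RemoteDocument($url)` inside the method. Every name here (`DocumentFactory`, `CannedDocument`, `CannedDocumentFactory`, `PageSummary`) is hypothetical and exists only for illustration.

```php
<?php
interface Document {
  public function getHtml();
}

// Hypothetical factory: the one place that decides which Document to create.
interface DocumentFactory {
  public function createDocument($url);
}

// Test double that returns fixed HTML.
class CannedDocument implements Document {
  private $_html;

  public function __construct($html) {
    $this->_html = $html;
  }

  public function getHtml() {
    return $this->_html;
  }
}

// Test factory: hands out canned documents and records the URLs requested.
class CannedDocumentFactory implements DocumentFactory {
  public $requestedUrls = array();
  private $_html;

  public function __construct($html) {
    $this->_html = $html;
  }

  public function createDocument($url) {
    $this->requestedUrls[] = $url;
    return new CannedDocument($this->_html);
  }
}

// Creates documents inside a method, but only through the injected factory.
class PageSummary {
  private $_factory;

  public function __construct(DocumentFactory $factory) {
    $this->_factory = $factory;
  }

  public function lengthOf($url) {
    return strlen($this->_factory->createDocument($url)->getHtml());
  }
}

$factory = new CannedDocumentFactory('<p>hello</p>');
$summary = new PageSummary($factory);
$length = $summary->lengthOf('http://www.test.com/');
```

In production the same class would receive a factory that returns `RemoteDocument` instances, so the object is still created inside the method but the test stays in control of what comes back.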
arjan.top
Forum Contributor
Posts: 305
Joined: Sun Oct 14, 2007 4:36 am
Location: Hoče, Slovenia

Re: how to test parsing data from remote webpage

Post by arjan.top »

Did it with dependency injection, thanks for the help :wink: