Unit testing HTML without oodles of regexps

Ambush Commander
DevNet Master
Posts: 3698
Joined: Mon Oct 25, 2004 9:29 pm
Location: New Jersey, US

Post by Ambush Commander »

So I have an HTMLPrinter that I'd like to unit test, but the problem is that the output it gives is tremendously complex, and any character-for-character equality test would be an extremely fragile test case. Of course, you can try regexps, but that doesn't seem... well... precise enough. Is there any alternative, or am I stuck with, to paraphrase, "oodles of regexps"?
dbevfat
Forum Contributor
Posts: 126
Joined: Tue Jun 28, 2005 2:47 pm
Location: Ljubljana, Slovenia

Post by dbevfat »

What exactly does your HTMLPrinter do and what are you trying to test?
Ambush Commander
DevNet Master
Posts: 3698
Joined: Mon Oct 25, 2004 9:29 pm
Location: New Jersey, US

Post by Ambush Commander »

It, as the name suggests, takes a story object and returns an HTML string that would be put on the main page. There's quite a bit of presentation logic going on in it.
Christopher
Site Administrator
Posts: 13596
Joined: Wed Aug 25, 2004 7:54 pm
Location: New York, NY, US

Post by Christopher »

Did you ever find an alternative to "oodles of regexps"? Could you give an example so I can see more clearly how you are trying to test the output of your HTMLPrinter?
(#10850)
Ambush Commander
DevNet Master
Posts: 3698
Joined: Mon Oct 25, 2004 9:29 pm
Location: New Jersey, US

Post by Ambush Commander »

Nope, never thought of a solution.

However, it occurs to me that the biggest problem is disregarding non-meaningful whitespace. You could run a few "trimming" functions to make the two strings "essentially" the same, and then compare them as usual. In a related manner, you could do some very minor processing to turn the expectation into a regexp as painlessly as possible. I don't see a way out, though.
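
A minimal sketch of the two ideas above: trimming both strings before comparing them, and turning a literal expectation into a whitespace-tolerant regexp. Python is used only for illustration; the function names `normalize_html` and `expectation_to_regex` are hypothetical, and the sketch assumes whitespace between tags carries no meaning, which is not true inside `<pre>` or between inline elements.

```python
import re


def normalize_html(html: str) -> str:
    """Collapse whitespace that is assumed to be insignificant."""
    # Collapse every run of whitespace (spaces, tabs, newlines) to one space.
    collapsed = re.sub(r"\s+", " ", html)
    # Drop the single space left between adjacent tags: "> <" becomes "><".
    collapsed = re.sub(r">\s+<", "><", collapsed)
    return collapsed.strip()


def expectation_to_regex(expected: str) -> re.Pattern:
    """Turn a literal expectation into a regexp that tolerates whitespace changes."""
    # Escape each whitespace-separated chunk so it matches literally,
    # then allow any non-empty whitespace run between chunks.
    chunks = [re.escape(chunk) for chunk in expected.split()]
    return re.compile(r"\s*" + r"\s+".join(chunks) + r"\s*")
```

With this, a test can compare `normalize_html(actual)` against `normalize_html(expected)`, or call `expectation_to_regex(expected).fullmatch(actual)`. The regexp variant still requires whitespace wherever the expectation has it, so it is stricter than the normalizer about spaces between tags.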
Christopher
Site Administrator
Posts: 13596
Joined: Wed Aug 25, 2004 7:54 pm
Location: New York, NY, US

Post by Christopher »

I don't think unit testers are the best way to validate the HTML. Perhaps you could run the HTML output of the Printer class through an HTML validator and then use the unit tester to examine the output of the validator.
Ambush Commander
DevNet Master
Posts: 3698
Joined: Mon Oct 25, 2004 9:29 pm
Location: New Jersey, US

Post by Ambush Commander »

Rare are Printer outputs valid HTML documents. ;-) Did you mean HTML parser?
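
One way to read the parser suggestion, sketched in Python with the standard library's `html.parser`: feed both the expected and the actual fragment to a parser, record the tags, attributes, and text it reports, and compare those records instead of the raw strings. The class `StructureCollector` and the function `html_structure` are hypothetical names for this illustration. The parser accepts fragments, so the output does not need to be a complete, valid document.

```python
from html.parser import HTMLParser


class StructureCollector(HTMLParser):
    """Record tags, attributes, and text, ignoring layout details."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.events = []

    def handle_starttag(self, tag, attrs):
        # Sort attributes by name so their order in the markup does not matter.
        ordered = tuple(sorted(attrs, key=lambda pair: pair[0]))
        self.events.append(("start", tag, ordered))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        # Collapse whitespace inside text and skip whitespace-only runs.
        text = " ".join(data.split())
        if text:
            self.events.append(("text", text))


def html_structure(fragment):
    """Return the list of parse events for an HTML fragment."""
    collector = StructureCollector()
    collector.feed(fragment)
    collector.close()
    return collector.events
```

A test would then assert `html_structure(actual) == html_structure(expected)`. Differences in indentation, line breaks, and attribute order no longer cause failures, while a changed tag, attribute value, or text still does.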
JPlush76
Forum Regular
Posts: 819
Joined: Thu Aug 01, 2002 5:42 pm
Location: Los Angeles, CA

Post by JPlush76 »

Personally, I don't see the need to actually validate the HTML you're outputting with unit tests because, as you say, it's very fragile. The goal of a unit test is to make sure the code functions properly with both proper and improper input.

So, with that said, what I would do is make sure that my HTML printer object validates any parameters passed to it and responds properly to bad parameters, and also mock it out to test that the methods are being called the appropriate number of times. I would save the actual data validation for a web acceptance test using SimpleTest's web tester functionality.
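
A rough sketch of that approach, in Python with `unittest.mock` for illustration. The `HTMLPrinter` below is a hypothetical stand-in, since the real class never appears in this thread; the dict-shaped story and the `formatter.escape` collaborator are assumptions made only to have something to mock.

```python
from unittest.mock import Mock


class HTMLPrinter:
    """Hypothetical stand-in for the printer discussed in this thread."""

    def __init__(self, formatter):
        # Collaborator assumed to escape text for HTML output.
        self.formatter = formatter

    def render(self, story):
        # Reject bad parameters up front instead of emitting broken markup.
        if not isinstance(story, dict) or "title" not in story:
            raise ValueError("story must be a dict with a 'title' key")
        title = self.formatter.escape(story["title"])
        return "<h1>" + title + "</h1>"


def render_with_mock(story):
    """Render a story with a mocked formatter and return (html, mock)."""
    formatter = Mock()
    formatter.escape.return_value = "ESCAPED"
    html = HTMLPrinter(formatter).render(story)
    return html, formatter
```

The unit tests then check behaviour rather than markup details: that bad input raises `ValueError`, and that `formatter.escape` was called exactly once with the story's title (for example via `formatter.escape.assert_called_once_with(...)`).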

I haven't really seen your code, but if you have things like function addTitle() or function addFooter(), those pieces that are constant should be able to be validated.

good luck
Christopher
Site Administrator
Posts: 13596
Joined: Wed Aug 25, 2004 7:54 pm
Location: New York, NY, US

Post by Christopher »

Ambush Commander wrote: Rare are Printer outputs valid HTML documents. ;-) Did you mean HTML parser?
I don't know if I said it well, but my point was that there may be other programs that would be better at analyzing your complex output. If the HTML lexer for the web tester you are using is not up to the job, then use a more sophisticated tool and check its output. Like JPlush76, I would need to see what the code and output look like.