I don't claim to be an experienced tester, but for what it's worth, here are my thoughts, for better or worse.
Neither am I. Thanks for such a comprehensive comment!
Excellent! I must admit I don't always follow TDD perfectly (write a failing test, then write the code), but it is a very nice way to work.
This is the first time I'm actually using it, but I've read about it and it does sound very cool.
Test-first helps stop you tying the tests too closely to the implementation; they should instead be minimal constraints that any valid solution has to meet. If you've just written a class, your mind can be too full of the solution you chose, and you start writing tests for that. I find that's an easy trap to fall into. It's not all bad: such tests at least verify the code you just wrote, but, crucially, they don't provide a proper critique for later refactorings. Another solution could be perfectly valid yet still fail tests that are implementation-specific.
See, the funny thing is I did have an HTML parser that I used (it was stack-based), but it was really simplistic and didn't renest tags, so I decided to wrap the logic in a class rather than just a function. So I do have some idea of how I want to implement it. Hopefully my tests are NPOV.
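The point about implementation-specific tests can be made concrete with a small sketch. It's in Python rather than PHP, and both functions are hypothetical, written only for illustration: two tag strippers take different approaches, and a behavioural check that only constrains input and output accepts both. A test that instead asserted how the work was done (say, that a regex was used) would reject the second one even though it gives the same answers here.

```python
import re

def strip_tags_regex(text: str) -> str:
    """Remove anything that looks like <...> using a regular expression."""
    return re.sub(r"<[^>]*>", "", text)

def strip_tags_scan(text: str) -> str:
    """Remove <...> spans with a character-by-character scan instead."""
    out, inside = [], False
    for ch in text:
        if ch == "<":
            inside = True
        elif ch == ">" and inside:
            inside = False
        elif not inside:
            out.append(ch)
    return "".join(out)

def check_strips_tags(strip) -> None:
    """Behavioural test: constrains only what goes in and what comes out."""
    assert strip("<b>bold</b> text") == "bold text"
    assert strip("no tags here") == "no tags here"
    assert strip('<a href="x">link</a>') == "link"

# Both implementations satisfy the same behavioural constraints.
for implementation in (strip_tags_regex, strip_tags_scan):
    check_strips_tags(implementation)
```

Either function could later be swapped for the other without touching the test, which is the kind of freedom for refactoring that test-first is meant to preserve.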
Interface - how about:
The simpler, the better.
That probably would work. Good idea.
If unclosed tags are automatically closed, perhaps testSanitise should assert that the unclosed <html> tag is closed? In general I'd be inclined just to discard any unclosed tags since you don't really know where the author of the text meant to close them - except for tags like html and body. A new test: testDiscardUnclosedTagsExceptHtmlAndBody?
Actually, that test was there to eliminate a tag that I didn't want to keep. Hmm... I should expand it. I whitelist certain tags and allow only those to be used.
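A whitelist filter of the kind described might look roughly like this. It's a minimal Python sketch, assuming a hypothetical `ALLOWED_TAGS` set (the real list isn't given) and that a disallowed tag is removed while the text between its tags is kept. For elements such as script or style you'd probably want to drop the contents as well, which this sketch does not do.

```python
import re

# Hypothetical whitelist; the actual allowed tags aren't listed in the thread.
ALLOWED_TAGS = frozenset({"p", "b", "i", "em", "strong", "a"})

# Matches an opening or closing tag and captures its name.
TAG_RE = re.compile(r"</?([a-zA-Z][a-zA-Z0-9]*)[^>]*>")

def filter_tags(html: str, allowed=ALLOWED_TAGS) -> str:
    """Drop any tag whose name is not whitelisted, keeping the text between tags."""
    def keep_or_drop(match):
        return match.group(0) if match.group(1).lower() in allowed else ""
    return TAG_RE.sub(keep_or_drop, html)

print(filter_tags('<p><font color="red">Hi</font> <b>there</b></p>'))
# prints <p>Hi <b>there</b></p>
```

A test for this only needs to state which tags survive and which disappear, so it stays valid however the filtering is done internally.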
If you stick with closing all unclosed tags, will you need: testWithMultipleUnClosedTags?
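The alternative suggested earlier, discarding unclosed tags except html and body, could be sketched like this in Python. `discard_unclosed` is a hypothetical name, and the sketch assumes input that is otherwise properly nested. Empty elements written without a slash (e.g. `<br>` rather than `<br/>`) would be treated as unclosed and dropped; a real version would need a list of empty elements. Several unclosed tags at once are handled innermost first, which is the case a testWithMultipleUnClosedTags would cover.

```python
import re

TOKEN_RE = re.compile(r"(<[^>]+>)")
TAG_RE = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>")
# Assumption from the suggestion: only these are closed rather than discarded.
AUTO_CLOSE = {"html", "body"}

def discard_unclosed(html: str) -> str:
    """Drop opening tags that are never closed, except html/body, which get
    closing tags appended. Stray closing tags are dropped too."""
    tokens = [t for t in TOKEN_RE.split(html) if t]
    drop = set()   # indexes of tokens to leave out of the output
    stack = []     # (tag name, token index) for each currently open tag
    for i, tok in enumerate(tokens):
        m = TAG_RE.fullmatch(tok)
        if not m or tok.endswith("/>"):
            continue  # text, comments and self-closing tags pass through
        closing, name = m.group(1) == "/", m.group(2).lower()
        if not closing:
            stack.append((name, i))
        elif any(open_name == name for open_name, _ in stack):
            # Anything opened after the matching tag was never closed: drop it.
            while stack[-1][0] != name:
                drop.add(stack.pop()[1])
            stack.pop()
        else:
            drop.add(i)  # closing tag with no matching opener
    closers = []
    for name, i in reversed(stack):  # still open at end of input, innermost first
        if name in AUTO_CLOSE:
            closers.append(f"</{name}>")
        else:
            drop.add(i)
    kept = "".join(t for i, t in enumerate(tokens) if i not in drop)
    return kept + "".join(closers)
```

For example, `discard_unclosed("<html><body><p>Hi <b>there</p>")` gives `<html><body><p>Hi there</p></body></html>`: the unclosed `<b>` is dropped, while body and html are closed.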
"testNesting" asserts that overlapping tags <b><i></b></i> will be resolved into <b><i></i></b>. Perhaps they should all just be discarded? You wouldn't want <span><div></div></span> either. Instead of discarding, you could get clever with rules that resolve these issues tag by tag, but that would add a lot of complexity under the bonnet (although not for the user).
Actually, that was part of the whole idea. You can't nest block-level elements in P tags, and TABLE tags can't contain B tags unless they're inside TR and TD tags. I've been studying the HTML 4.01 specification recently, and I think the bulk of the programming will be the planned checkContext($tag,$tag_tree), which makes sure a tag being opened is allowed in that context (the same goes for text).
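A context check along those lines might start out something like this. It's a Python sketch of the idea, not the planned PHP method: `check_context`, the element sets and the `"#text"` pseudo-tag for character data are all assumptions, and the rules are a tiny, simplified subset of the HTML 4.01 content models. `tag_tree` is taken to list the open tags from outermost to innermost, and only the innermost one is consulted.

```python
# Hypothetical, heavily simplified rules loosely based on HTML 4.01; a real
# version would be derived from the DTD. "#text" stands for character data.
BLOCK = {"p", "div", "table", "ul", "ol", "h1", "h2", "blockquote", "pre"}
INLINE = {"b", "i", "em", "strong", "span", "a"}

# Elements whose children are restricted to a fixed set. TR is allowed
# directly in TABLE because TBODY's start tag is optional in HTML 4.01.
ONLY_CHILDREN = {
    "table": {"caption", "thead", "tbody", "tfoot", "tr"},
    "thead": {"tr"},
    "tbody": {"tr"},
    "tfoot": {"tr"},
    "tr": {"td", "th"},
    "ul": {"li"},
    "ol": {"li"},
}

def check_context(tag: str, tag_tree: list) -> bool:
    """Return True if `tag` may be opened inside the currently open tags."""
    parent = tag_tree[-1] if tag_tree else None
    if parent in ONLY_CHILDREN:
        return tag in ONLY_CHILDREN[parent]
    if tag in BLOCK and (parent == "p" or parent in INLINE):
        return False  # no block-level elements inside P or inline elements
    return True
```

With these rules, `check_context("b", ["body", "table"])` is False but `check_context("b", ["body", "table", "tr", "td"])` is True, matching the TABLE/TR/TD example, and text directly inside a TR is rejected.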
Tidy, from what I've seen, tries to be smart about misnested tags. I don't want this to be smart, so I'm just going to have it give opening tags high priority and then kill off or add closing tags as necessary.
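One way to read "high priority to opening tags" is sketched below in Python, under that interpretation only (`renest` is a hypothetical name). Every opening tag is kept. A closing tag closes everything opened after its partner by adding closing tags. A closing tag with no open partner is killed. Anything still open at the end is closed. This reproduces the testNesting expectation.

```python
import re

TOKEN_RE = re.compile(r"(<[^>]+>)")
TAG_RE = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>")

def renest(html: str) -> str:
    """Resolve overlapping tags by always honouring opening tags."""
    out, stack = [], []
    for tok in TOKEN_RE.split(html):
        if not tok:
            continue
        m = TAG_RE.fullmatch(tok)
        if not m or tok.endswith("/>"):
            out.append(tok)  # text or self-closing tag passes through
            continue
        closing, name = m.group(1) == "/", m.group(2).lower()
        if not closing:
            stack.append(name)
            out.append(tok)
        elif name in stack:
            # Add closing tags for everything opened after `name`, then `name`.
            while True:
                top = stack.pop()
                out.append(f"</{top}>")
                if top == name:
                    break
        # Otherwise the closing tag has no open partner and is killed (dropped).
    # Add closing tags for anything still open at the end of the input.
    out.extend(f"</{name}>" for name in reversed(stack))
    return "".join(out)
```

One consequence of not being smart: in `<b><i>x</b>y</i>` the `y` loses its italics, because `</b>` also closes the `<i>` and the later `</i>` is dropped, giving `<b><i>x</i></b>y`.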
I noticed you've got setup options (multiline etc.) planned for the class. Instantiating the object in setUp (maybe give it a more explanatory name like "validator" rather than $this->object) means the options are fixed for all tests, so you can't explore different options in different test methods. Maybe that's what you want.
Never thought of it that way: I was just using the options to narrow down what it does, since other options could easily be added later... yeah, I'm going to have to document and test those.
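Here is a sketch of building the object inside each test instead of in setUp, using Python's unittest as a stand-in for the PHP harness. `HtmlSanitiser` and the behaviour of its `multiline` option (keep or collapse newlines) are invented purely for illustration.

```python
import unittest

class HtmlSanitiser:
    """Hypothetical stand-in for the class under test. The `multiline`
    option's behaviour here (keep or collapse newlines) is invented."""
    def __init__(self, multiline: bool = False):
        self.multiline = multiline

    def clean(self, text: str) -> str:
        return text if self.multiline else text.replace("\n", " ")

class SanitiserOptionsTest(unittest.TestCase):
    def make_sanitiser(self, **options) -> HtmlSanitiser:
        # Build a fresh object in each test instead of in setUp, so every
        # test method can choose its own options.
        return HtmlSanitiser(**options)

    def test_newlines_collapsed_by_default(self):
        self.assertEqual(self.make_sanitiser().clean("a\nb"), "a b")

    def test_multiline_keeps_newlines(self):
        sanitiser = self.make_sanitiser(multiline=True)
        self.assertEqual(sanitiser.clean("a\nb"), "a\nb")

# Run with: python -m unittest <module name>
```

A small factory method like `make_sanitiser` keeps the construction in one place, as setUp would, while still letting each test name the options it depends on.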
TWP_Parser_Html doesn't immediately tell me what the class does. Good names are hard to find but very important, and a thesaurus can be handy. I wouldn't claim that I always succeed, but I spend a lot of time trying to find good names. I'll tolerate unwieldy length much more than obscurity.
What can I say? That's a very good idea.
You probably don't need to assert type when comparing strings unless you really need to differentiate between, say, a null and an empty string. At the same time, I think you do have to pay a bit more attention to type than normal in tests. In the testing framework I use (SimpleTest), assertTrue doesn't compare type; there's an assertIdentical for that. There are other type-insensitive methods too. I'm assuming PHPUnit is similar, but it may not be. Type doesn't make a difference here (one side of the comparison is always a string), but generally you don't want to restrict the solution any more than you have to, so you shouldn't assert type unless you really need to. Another example (again in SimpleTest): if you assertEqual two hashes, the key order doesn't matter. With assertIdentical, if two arrays have the same key=>value pairs in a different order, the comparison fails. Numerical arrays always have to be in the same order for SimpleTest's assertEqual to pass (I hacked together an assertIdenticalArrayValues method for that).
Err... assertTrue simply asserts that the expression is true (no comparison). But I think I get the gist of what you're saying.
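The three kinds of comparison described above can be imitated in Python to show the difference. This is only an approximation of PHP's array semantics, which the thread's assertEqual/assertIdentical behaviour corresponds to: PHP's type juggling isn't modelled, and `same_values_any_order` is a hypothetical helper in the spirit of an assertIdenticalArrayValues method, not its actual code.

```python
from collections import Counter

def loosely_equal(a: dict, b: dict) -> bool:
    """Roughly PHP's == on arrays: same key/value pairs, in any order.
    PHP's type juggling (e.g. 1 == "1") is not modelled."""
    return a == b  # Python dict equality already ignores insertion order

def identical(a: dict, b: dict) -> bool:
    """Roughly PHP's === on arrays: same pairs, in the same order,
    with values of the same type."""
    return list(a) == list(b) and all(
        type(a[k]) is type(b[k]) and a[k] == b[k] for k in a
    )

def same_values_any_order(a: list, b: list) -> bool:
    """Hypothetical helper: same values with the same counts, order ignored.
    Values must be hashable."""
    return Counter(a) == Counter(b)
```

Python lists, like PHP's numerically indexed arrays, compare position by position, so `[1, 2, 3] == [3, 2, 1]` is false; that's the gap a helper like `same_values_any_order` fills.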
SimpleTest has a web tester which can enter values in form fields and submit the form. With this, you might be able to submit a few carefully chosen examples to an online XHTML validator and check whether your tidied HTML validates, if that would be useful. I can't remember if you can submit strings for validation as well as URLs. At the least, with finished pages, you could take the grunge out of validating an entire website each time you change the design. Out of courtesy, though, you wouldn't want to clog up their bandwidth with thousands of tests every hour, which would be quite possible with automated tests.
XHTML validation. Good idea. (You know, I've always wondered why they never released the validator as open code that could be run locally.)
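A cheap local check can catch many problems before anything is sent to a remote validator. The Python sketch below only tests XML well-formedness using the standard library's ElementTree. It does not validate against the XHTML DTD, and named entities such as `&nbsp;`, which XML doesn't predefine, are reported as errors. The fragment is wrapped in a `<div>` (an assumption of this sketch) so that several top-level elements are accepted.

```python
import xml.etree.ElementTree as ET

def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as XML inside a wrapper element.
    This checks well-formedness only, not validity against an XHTML DTD."""
    try:
        ET.fromstring(f"<div>{fragment}</div>")
    except ET.ParseError:
        return False
    return True
```

Running tidied output through a check like this in every test is free, which leaves the remote validator for the occasional full check.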
One final ad for SimpleTest: AFAIK it's the only testing framework with mock objects. This lets you use testing more as a design tool. Applications can be written top down, mocking out neighbouring objects as you go; the mocks define interfaces which you can come back and fill in later.
I'll take a look at SimpleTest. I've visited http://www.lastcraft.com before, but when it came time to find a harness I defaulted to PEAR.
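To illustrate designing top down with mocks, here is a sketch using Python's unittest.mock. SimpleTest's own mock API is different and isn't shown. `TagTreeBuilder` and the `allowed(tag, tag_tree)` method are hypothetical. The builder is written first, and the mock stands in for a context checker that doesn't exist yet, so the mock effectively defines the interface that checker will need.

```python
from unittest.mock import Mock

class TagTreeBuilder:
    """Hypothetical class written top-down: it delegates the "is this tag
    allowed here?" decision to a context checker that doesn't exist yet."""
    def __init__(self, checker):
        self.checker = checker
        self.tree = []

    def open(self, tag: str) -> bool:
        # Pass a copy of the tree: Mock records argument objects by reference,
        # so later appends would otherwise change what the mock recorded.
        if self.checker.allowed(tag, list(self.tree)):
            self.tree.append(tag)
            return True
        return False

# The mock stands in for the checker and, in doing so, defines its interface:
# an allowed(tag, tag_tree) method returning a bool.
checker = Mock()
checker.allowed.side_effect = lambda tag, tag_tree: tag != "div"

builder = TagTreeBuilder(checker)
builder.open("p")
builder.open("div")
checker.allowed.assert_called_with("div", ["p"])
```

When the real checker is written later, it only has to provide `allowed(tag, tag_tree)` to slot in where the mock was.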
I'll try to think about everything you've said and incorporate it into my tests. It's become apparent to me that my tests are way too vague.

Thanks for the response!