
Re: Testing against production data

Posted: Fri Oct 03, 2008 11:30 am
by Eran
To keep your test database up to date, use schema deltas instead of dumping the entire database each time. Something like dbdeploy.

Even better, you should run your tests as part of a build script (Phing or Ant) that takes care of the database update as well. Read this article for more details.
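A minimal sketch of the delta idea, with hypothetical file names and an in-memory SQLite database purely for illustration (dbdeploy does its bookkeeping with a changelog table in much the same way):

```php
<?php
// Sketch (not dbdeploy's actual code) of the delta approach:
// numbered change scripts are applied in order, and each one is
// recorded in a changelog table so it is never applied twice.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE changelog (delta INTEGER PRIMARY KEY)');

// In real life these would be files like deltas/001-create-users.sql
$deltas = array(
    1 => 'CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)',
    2 => 'ALTER TABLE users ADD COLUMN email TEXT',
);

foreach ($deltas as $number => $sql) {
    $applied = $db->query(
        "SELECT COUNT(*) FROM changelog WHERE delta = $number"
    )->fetchColumn();
    if (!$applied) {
        $db->exec($sql);
        $db->exec("INSERT INTO changelog (delta) VALUES ($number)");
    }
}
```

Running the script twice is safe: already-recorded deltas are skipped, which is what makes it practical to keep a long-lived test database in step with production.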

Re: Testing against production data

Posted: Sun Oct 05, 2008 12:25 am
by Jenk
McGruff wrote:
I feel like I need to test against the production database...or at least a clone of it. Mocking and stubbing doesn't catch these things.
Yes. Mocking a database connection is definitely a bad idea. A simple string comparison is not an adequate way to specify SQL logic: different forms of the same query will produce false negatives, and the tests are very brittle; even a single extra whitespace character will cause a failure.

Also, a perfectly valid sql query may pass a test but not actually work with the target database schema. It's all a bit self-referential, as you mentioned.
I'm sorry, I had to laugh at that link. That's highlighting a problem with the mock framework, rather than highlighting an issue with mocking connections. There are also significant reasons why *not* mocking connections is bad:

1. You are directly manipulating data, so there will *always* be a risk of changing production (client) data.
2. Tests can take much longer to run.
3. Your test is no longer entirely encapsulated.

No matter how much you automate tests, you will still need to kick the tyres yourself at some point.

Re: Testing against production data

Posted: Sun Oct 05, 2008 9:12 am
by McGruff
That's highlighting a problem with the mock framework, rather than highlighting an issue with mocking connections.
Yes... but it's a problem that all mock frameworks have. In practice it is impossible to write the required SqlExpectation class. You're stuck with string comparisons: can you debug the failing test...?
1. You are directly manipulating data, so there will *always* be a risk of changing production (client) data.
Not if you create a database sandbox, as outlined in the link.

Without a sandbox, there is a slight risk that you could name a test database the same as an existing database: it would be dropped at tearDown, or its tables altered or deleted, and so on. Malfunctioning code which dynamically generates SQL could do the same (and would be much harder to detect).

That's probably "good enough" for a lot of things but in certain situations, e.g. business or financial records, it is not acceptable to take any risks whatsoever. Data must be preserved right down to the last bit. Personally, I always use a sandbox and I've never lost any data.
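To make that concrete, a sandbox is just a throwaway database that the test case itself creates in setUp and destroys in tearDown. A sketch, using a temporary SQLite file and a hypothetical one-table schema; with MySQL you would CREATE and DROP a uniquely named test database instead:

```php
<?php
// Sketch of a database sandbox: every run gets a private,
// uniquely named database, so there is no database anyone
// cares about that could possibly be dropped or altered.
class SandboxedQueryTest
{
    public $db;
    public $path;

    public function setUp() {
        // Unique name per run: cannot collide with an existing db.
        $this->path = sys_get_temp_dir() . '/sandbox_' . uniqid() . '.db';
        $this->db = new PDO('sqlite:' . $this->path);
        $this->db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
        // Load the real schema so queries are checked against it.
        $this->db->exec('CREATE TABLE files (fid INTEGER PRIMARY KEY, path TEXT)');
    }

    public function tearDown() {
        $this->db = null;      // close the connection
        unlink($this->path);   // drop the sandbox entirely
    }
}
```

Every test method runs against a fresh copy of the schema, and nothing outside the temp directory is ever touched.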

A failure to adequately explore application behaviour through testing - or by using bad tests (mock connections) - is far more likely to lead to data loss. Even if you ignore the problem of specifying sql as a simple string, a query which passes all your mock-tests may not work with the target schema.
2. Tests can take much longer to run.
Of course a test suite will run quicker if you don't write all necessary tests... In the last app I worked on (admittedly something fairly simple) all unit tests, including data-access tests, take just a few seconds to run. "Mock real resources for faster tests" is standard textbook advice but in the real world it may not be a problem. It depends just how big the test suite is.

Even if it is very large, I don't see that you have a choice. Mocks just don't get the job done here.
3. Your test is no longer entirely encapsulated.
With a mocked connection, the SQL is simultaneously being tested too much (the annoyingly brittle string comparison) and not enough (does it actually work with the real db schema?). A big chunk of (SQL) code, the whole raison d'être of the test, has in effect escaped from the test harness. You haven't encapsulated anything at all. You'll never find out what effect the SQL commands really have until they collide with a real database.

I think the key point is that you're at an application boundary. The rules have changed slightly. It's OK to test with the real resource.

Re: Testing against production data

Posted: Mon Oct 06, 2008 7:31 am
by Jenk
I don't agree with that, sorry. It seems you're falling into the "everything must have an automated test" hole. It doesn't. A functional test, by a person, at a terminal, just as a normal user would interact with the system, is the best test you can have for this example (and others like it).

There's nothing wrong with mocking a DB connection and testing against an SQL string. It's SQL: MySQL, MSSQL, PostgreSQL, and the others all undergo strenuous testing to ensure they adhere to their chosen SQL standard, and PDO tests its library too. If your concern is with string comparison, then write a beautifier to normalise the extra whitespace and so forth; even a simple one will do: compare case-insensitively, replace all whitespace with special markers, or just trim and collapse it to single spaces.

It takes no time at all to correct that sort of thing anyway, even without one. In the example on the link (Expected: "A", Got: "a"), a diff tool would highlight the problem almost instantly.

If your schema changes, your functional test will pick that up. The purpose of an automated test is to be run quickly and often (with T/BDD, tests are run every few seconds while developing), so even a sandbox will add demand on system (and/or server) resources if you are connecting to the production environment. That could still be fine on small or non-critical apps, but if you have to wait more than two or three seconds every time you test, you'll soon get very frustrated.

However, this also highlights the problem faced when writing your own SQL anyway: the fickleness of SQL strings. One solution here would be to use a more programmatic method of interacting with the database, which would be much more robust and less prone to typos and so forth, e.g. PEAR::DB's method of statement construction rather than just writing the string out by hand. (I'm not recommending PEAR::DB itself, it's pants; I'm just recommending the statement construction.)
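Roughly what that looks like (a hand-rolled sketch, not PEAR::DB's actual API): the query is assembled from structured parts, with values bound as placeholders, so the fiddly string plumbing is written once instead of once per query:

```php
<?php
// Sketch of programmatic statement construction: the query is
// assembled from structured parts rather than typed as one long
// string, so typos in the plumbing (commas, quoting, AND glue)
// are eliminated. Hand-rolled for illustration only.
function buildSelect($table, array $columns, array $conditions) {
    $where = array();
    foreach (array_keys($conditions) as $column) {
        $where[] = "$column = ?";   // values bound later, never inlined
    }
    $sql = 'SELECT ' . implode(', ', $columns) . " FROM $table";
    if ($where) {
        $sql .= ' WHERE ' . implode(' AND ', $where);
    }
    return array($sql, array_values($conditions));
}

list($sql, $params) = buildSelect(
    'users',
    array('id', 'name'),
    array('country' => 'UK', 'active' => 1)
);
// $sql    => "SELECT id, name FROM users WHERE country = ? AND active = ?"
// $params => array('UK', 1)
```

The pair of $sql and $params would then go straight to a prepared statement, which also sidesteps the quoting and injection problems of hand-built strings.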

Re: Testing against production data

Posted: Wed Oct 08, 2008 7:27 am
by webaddict
I'm a novice when it comes to automated testing, I'll have to admit. I can however tell you that this particular problem could've been solved using a simple cascading delete. Not sure if this post really helps, but it might :)
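For anyone unfamiliar with it, a cascading delete is declared on the foreign key, and the database then removes child rows whenever their parent goes. A quick illustration with a hypothetical two-table schema (note that SQLite only enforces it with the foreign_keys pragma switched on):

```php
<?php
// ON DELETE CASCADE: deleting a parent row automatically deletes
// its children, so tests (and tearDown) need no manual cleanup.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('PRAGMA foreign_keys = ON');
$db->exec('CREATE TABLE users (id INTEGER PRIMARY KEY)');
$db->exec('CREATE TABLE posts (
    id INTEGER PRIMARY KEY,
    user_id INTEGER REFERENCES users(id) ON DELETE CASCADE
)');
$db->exec('INSERT INTO users VALUES (1)');
$db->exec('INSERT INTO posts VALUES (100, 1), (101, 1)');

$db->exec('DELETE FROM users WHERE id = 1');

// Both posts went with the user: no orphaned rows left behind.
$remaining = $db->query('SELECT COUNT(*) FROM posts')->fetchColumn();
```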

Re: Testing against production data

Posted: Wed Oct 08, 2008 9:11 am
by McGruff
It seems you're falling into the "everything must have an automated test" hole. It doesn't.

It's a fundamental tenet of test-driven design that code is only ever written in response to a failing test and so, if you're doing it right, there won't be anything which isn't covered by a test (except for trivially simple things like getters or setters). Even if you write tests after the fact rather than test-first, everything still needs to be tested.
A functional test, by a person, at a terminal, just as a normal user would interact with the system, is the best test you can have for this example (and others like it).

How many buttons can you click per minute? How long does it take to examine the results of a button click? How many mistakes will you make? Will you miss something important at the end of a long day when you're tired? How many tests can you remember? How many will you have forgotten six months from now?

There's really no comparison between automated and manual testing. The computer will never forget anything, never get tired, never make a mistake, and it will chew through the tests in a fraction of the time you can. How could manual testing possibly be better? I can guarantee that somewhere down the line you won't even be able to remember what you're supposed to be checking.

That, incidentally, is the main task of TDD: gathering requirements and formalising them as automated tests. An executable specification defines the target to be hit and makes sure it stays hit. Acceptance (functional) tests express customer requirements, and the unit tests then represent hours of research into the problem domain and possible routes through it.

If you held a gun to a test-infected programmer's head and forced them to choose between the tests and the code, they'd choose the tests every time. With these as a guide it's relatively easy to rattle out more code. Recreating all the tests would be much, much harder.
If your concern is with string comparison, then write a beautifier to normalise the extra whitespace and so forth; even a simple one will do: compare case-insensitively, replace all whitespace with special markers, or just trim and collapse it to single spaces.
That's just the start of it. Aliases might be named differently. Predicates might have a different order. Subselects versus joins. All different strings, but the same query. SQL is code and, like all code, it's the logical statements which need to be tested, not the string phrasing, and that means throwing the SQL at a real database. Would you unit test a PHP class by comparing an expected class definition with the actual class read from a file? That would also be a very brittle test which doesn't really prove anything.
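To illustrate (queries invented for the example, not taken from any real app): the two statements below return exactly the same rows, yet no whitespace normalisation can ever make the strings compare equal, because the differences are structural:

```php
<?php
// Two semantically identical queries: same result set, different
// strings (alias names, join vs. subselect). Built against an
// in-memory SQLite database with a made-up schema.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec("CREATE TABLE customers (id INTEGER PRIMARY KEY, country TEXT)");
$db->exec("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER)");
$db->exec("INSERT INTO customers VALUES (1, 'UK'), (2, 'FR')");
$db->exec("INSERT INTO orders VALUES (10, 1), (11, 2), (12, 1)");

$join = "SELECT o.id FROM orders AS o
         INNER JOIN customers AS c ON c.id = o.customer_id
         WHERE c.country = 'UK' ORDER BY o.id";

$sub  = "SELECT ord.id FROM orders AS ord
         WHERE ord.customer_id IN
           (SELECT id FROM customers WHERE country = 'UK')
         ORDER BY ord.id";

$rowsJoin = $db->query($join)->fetchAll(PDO::FETCH_COLUMN);
$rowsSub  = $db->query($sub)->fetchAll(PDO::FETCH_COLUMN);

// Same rows from the database, but a string-comparing mock
// sees two completely unrelated queries.
```

Only a real parser and executor can tell that $join and $sub mean the same thing; a normalised string comparison cannot.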

In practice a mocked connection plays out something like this.

(1) With trivially simple SQL like "select foo from bar", and a database schema which never changes, none of my objections would be significant. A mocked connection would be OK. However, I've never written an app like that.

(2) A mocked connection doesn't tie the code down to a real database. If schema changes get out of step with the tests, the tests will pass when they should fail. Schemas - everything, really - are more volatile in the early stages of development, but even mature apps may need to change from time to time.

(3) Say you've got a complex query to write - as in the example below. Could you write this straight off the top of your head or will you need to run it past a database for verification?

Code: Select all

select path as path, group_concat(offset) as offsets from files 
inner join (select c.name, c.fid, m.name_offset as offset 
from classes as c 
inner join methods as m using(cid)
where m.name='changeMe' and c.hid in (
    select distinct c.hid from classes as c
    inner join classes_interfaces as ci using(cid)
    inner join (
        select candidates.iid, candidates.name from (
            select distinct i.iid, i.name from interfaces as i
            inner join classes_interfaces as ci using(iid)
            inner join classes as c using(cid)
            where find_in_set(c.name, (
                select group_concat(name) as members from classes as c
                group by hid 
                having find_in_set('Foo', members)))) 
         as candidates
         inner join methods as m using(iid)
         where m.name='changeMe') 
    as links using(iid)) 
 
union 
 
select ifo.name, ifo.fid, m.name_offset as offset from (
    select distinct i.iid, i.name, i.fid from interfaces as i 
    inner join classes_interfaces as ci using(iid)
    inner join (
        select c.cid, c.name from classes as c where c.hid in (
            select distinct c.hid from classes as c
            inner join classes_interfaces as ci using(cid)
            inner join (
                select candidates.iid, candidates.name from (
                    select distinct i.iid, i.name from interfaces
                    inner join classes_interfaces as ci using(iid)
                    inner join classes as c using(cid)
                    where find_in_set(c.name, (
                        select group_concat(name) as members from classes as c
                        group by hid 
                        having find_in_set('Foo', members)))) 
                as candidates
                inner join methods as m using(iid)
                where m.name='changeMe') 
            as links using(iid))) 
    as clh on (clh.cid=ci.cid)) 
as ifo
inner join methods as m using(iid)
where m.name='changeMe') 
as items using(fid)
group by fid
order by path
I defy anyone to write non-trivial sql like this reliably. It's just not possible (this one was built up phrase by phrase until the final query, and the main test, suddenly started to pass). When I said that I thought mocking connections and specifying sql string expectations was a bad idea I was really just being polite: I think it's insane.

Still, let's say I soldier on. I will make mistakes; no doubt about it. Can you spot the deliberate mistake (a different one this time) in the above query? Of course you can't - you need the dbms parser.

So, we have a malfunctioning app. What happens next? Nothing. Since the class is written to conform to specifications defined in the tests it will share the same error. Expected sql matches actual sql received and so the tests are passing when they should fail.

OK, a manual test at the UI level would alert you to the problem, but will you remember to carry it out, and will you remember the exact procedure which exercises this part of the app?

Perhaps you will but even so a failing acceptance test doesn't tell you the reason for failure. It could be anything anywhere in the app. Normally the unit tests would provide detailed information when something goes wrong but in this case the test isn't failing - a false positive. You certainly do need (automated) functional tests but they are at the wrong level to test such fine-grained details.

Re: Testing against production data

Posted: Wed Oct 08, 2008 9:39 am
by Jenk
I didn't say acceptance test, I said functional. A person who is testing, and so will know to note errors reliably (or else they are not worthy of being a tester), will record the error and the reproduction sequence (i.e. which buttons etc. caused it).

Selenium (although I haven't used it) can do this for you as well, repeatably and reliably. JMeter is another product that can do the same.

It's also a fundamental tenet of test-driven development to encapsulate all behaviour within the test and not rely on external resources, and relying on an external resource is exactly what this approach is doing.

Re: Testing against production data

Posted: Wed Oct 08, 2008 10:39 am
by McGruff
Functional tests and acceptance tests are one and the same, i.e. top-level tests which exercise the UI: http://www.extremeprogramming.org/rules ... tests.html.
It's also a fundamental tenet of test driven development to encapsulate all behaviour within the test, and not rely on external resources
You're at an application boundary. The rules have changed slightly.

A mock defines a contract. In your own code, you can meet that contract by writing classes with the same behaviour as the mocks but, at a boundary, it's a promise you can't fulfil. The external resource is not under your control. Mocks may dogmatically assert specific resource behaviours but we really don't know how well this matches up to real life.

You haven't encapsulated anything with a mock. The sql, the thing you are trying to test, has in effect escaped from the test harness because it isn't being tested properly. Tests which should pass will fail and tests which should fail will pass. False negatives are a nuisance but false positives are a very serious problem: you've got a bug but nothing to alert you to that until a user complains.

Re: Testing against production data

Posted: Tue Oct 14, 2008 5:30 am
by Jenk
Actually, it's just as important to not change the rules at an application boundary as it is anywhere else. For integration testing, sure, there is no alternative but to talk directly to the database, but when using TDD it is ultra important to have everything under your control, in a controlled environment - i.e. all within a test case, and not to have it rely on anything outside of that test case.

Re: Testing against production data

Posted: Sun Nov 02, 2008 9:06 am
by josh
If you've abstracted your model's data access from the model itself, you can test its corresponding data-access components in one contained environment, to make sure you're seeding your models with good data from the data source layer, and test the actual logic of the models separately. That has nothing to do with integration tests as I understand them: an integration test would only be checking that the data source objects talk to the model objects correctly, ensuring the signatures of the data being passed match up.

Pointing your models at a real, live data source as part of some test (unit or integration) is pretty essential, though. Testing the actual generated SQL (white-box testing) would be one way to test your data source layer, but it shouldn't be part of testing the models. A better way of testing the generated SQL is to test the data returned from the mappers (black-box testing).

If I have multiple data sources that contain the same data representation, I should have one test and one interface for getting at that data. If all my data source objects adhere to that common interface, I should be able to use the same test cases polymorphically, without worrying about generated SQL, or about whether the mapper is accessing data stored via INI or XML.

Re: Testing against production data

Posted: Tue Nov 04, 2008 1:55 pm
by allspiritseve
Jenk wrote:when using TDD it is ultra important to have everything under your control, in a controlled environment - i.e. all within a test case, and not to have it rely on anything outside of that test case.
You don't have the SQL you write under control, though. You can either attempt to rewrite the entire DB engine in order to verify your SQL is written correctly (mocking), or you can just use the DB and let it verify your SQL for you. If you think removing whitespace and comparing SQL strings is sufficient, you're in for a surprise.
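Letting the database do that verification can be as cheap as preparing the statement: the parser rejects malformed SQL and unknown columns immediately. A sketch against an in-memory SQLite database loaded with a hypothetical schema:

```php
<?php
// Letting the real parser do the verification: preparing a
// statement against a database loaded with the actual schema
// rejects SQL that a string-comparing mock would wave through.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE files (fid INTEGER PRIMARY KEY, path TEXT)');

function sqlIsValid(PDO $db, $sql) {
    try {
        $db->prepare($sql);   // parsed against the live schema
        return true;
    } catch (PDOException $e) {
        return false;
    }
}

$ok  = sqlIsValid($db, 'SELECT path FROM files');   // valid
$bad = sqlIsValid($db, 'SELECT pth FROM files');    // typo caught by the parser
```

This only proves the SQL parses against the schema, of course; proving it returns the right rows still needs real data in the sandbox.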

Re: Testing against production data

Posted: Wed Nov 05, 2008 6:53 am
by Jenk
Or you can just use a DAO library, or use intelligent string comparison (split the string, trim each word, etc.), to repeat myself once again.

You guys have missed the one fundamental point of TDD: it is highlighting a major problem for you, but instead of fixing the problem, you are breaking TDD rules to work around it. This SQL string comparison is an issue. It always has been and always will be an issue, so for god's sake find or create something to fix it. Breaking a massive mantra of TDD is not fixing it.

Here is something that will fix the issue shown in the "complex" SQL string link..

Code: Select all

function sqlStringCompare($stringOne, $stringTwo) {
  // Collapse every run of whitespace to a single space before comparing.
  $stringOne = implode(' ', array_filter(preg_split('/\s+/', $stringOne)));
  $stringTwo = implode(' ', array_filter(preg_split('/\s+/', $stringTwo)));
  return ($stringOne == $stringTwo);
}

Re: Testing against production data

Posted: Wed Nov 05, 2008 7:56 am
by inghamn
Jenk wrote: Or you can just use a DAO library, or use intelligent string comparison.. split the string and trim each word, etc.. to repeat myself once again.
The code we're talking about...what we're trying to test...*is* the DAO library. We're talking about testing that the ActiveRecord object did in fact update or delete all the stuff in the database that it was supposed to. Testing your DAO with simple string comparison is not going to find any problems.
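In other words, the test in question looks something like this (a sketch with a hypothetical minimal mapper; the assertion is about what actually happened in the database, not about the SQL string the mapper emitted):

```php
<?php
// The DAO *is* the code under test, so the assertion has to be
// about the database's state after the operation, not about the
// SQL string. Hypothetical minimal mapper, in-memory SQLite.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)');

class UserMapper
{
    private $db;
    public function __construct(PDO $db) { $this->db = $db; }

    public function delete($id) {
        $stmt = $this->db->prepare('DELETE FROM users WHERE id = ?');
        $stmt->execute(array($id));
    }
}

$db->exec("INSERT INTO users VALUES (1, 'Ann'), (2, 'Bob')");
$mapper = new UserMapper($db);
$mapper->delete(1);

// Test the effect, not the string: did the right row really go?
$names = $db->query('SELECT name FROM users')->fetchAll(PDO::FETCH_COLUMN);
```

A string-comparing mock would pass this test even if the DELETE had the wrong WHERE clause; checking the remaining rows cannot be fooled that way.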

Re: Testing against production data

Posted: Wed Nov 05, 2008 8:35 am
by allspiritseve
Jenk wrote:Breaking a massive mantra of TDD is not fixing it.
TDD, especially in PHP, is still relatively new. I don't know where you get this idea that it has some rigid structure you have to follow in order to bring about results.

Re: Testing against production data

Posted: Wed Nov 05, 2008 9:18 am
by Jenk
TDD is not new. It is 30+ years old.

The mantra of "Everything must be controlled within a test case, and have no external dependencies" is being broken.