Testing against production data
Moderator: General Moderators
- inghamn
- Forum Contributor
- Posts: 174
- Joined: Mon Apr 16, 2007 10:33 am
- Location: Bloomington, IN, USA
Testing against production data
Last week I got called when I was on vacation...one of the recent applications I'd written that was in production was throwing foreign key errors. New features had involved changes to the database schema that hadn't been accounted for in the delete function of one of my models.
I feel like I need to test against the production database...or at least a clone of it. Mocking and stubbing doesn't catch these things.
Is there a recommended way to set up PHPUnit to clone a production database to do tests against the clone?
I feel like I need to test against the production database...or at least a clone of it. Mocking and stubbing doesn't catch these things.
Is there a recommended way to set up PHPUnit to clone a production database to do tests against the clone?
Re: Testing against production data
Change the spec/test of your delete function to accommodate the change. Use a test database (clone) in a functional test.
Unit tests should not touch databases at all (by rule of thumb) let alone production!
Unit tests should not touch databases at all (by rule of thumb) let alone production!
- inghamn
- Forum Contributor
- Posts: 174
- Joined: Mon Apr 16, 2007 10:33 am
- Location: Bloomington, IN, USA
Re: Testing against production data
The code that I'm dealing with is the database code. To test it, I need to verify that the information got into the database successfully - or was removed successfully.
Code like:
Code like:
Code: Select all
class Appointer extends ActiveRecord
{
private $id;
private $name;
/**
* This will load all fields in the table as properties of this class.
* You may want to replace this with, or add your own extra, custom loading
*/
public function __construct($id=null)
{
if ($id)
{
$PDO = Database::getConnection();
$query = $PDO->prepare('select * from appointers where id=?');
$query->execute(array($id));
$result = $query->fetchAll(PDO::FETCH_ASSOC);
if (!count($result)) { throw new Exception('appointers/unknownAppointer'); }
foreach($result[0] as $field=>$value) { if ($value) $this->$field = $value; }
}
else
{
# This is where the code goes to generate a new, empty instance.
# Set any default values for properties that need it here
}
}
/**
* Throws an exception if anything's wrong
* @throws Exception $e
*/
public function validate()
{
# Check for required fields here. Throw an exception if anything is missing.
if (!$this->name) { throw new Exception('missingName'); }
}
/**
* This generates generic SQL that should work right away.
* You can replace this $fields code with your own custom SQL
* for each property of this class,
*/
public function save()
{
$this->validate();
$fields = array();
$fields['name'] = $this->name;
# Split the fields up into a preparedFields array and a values array.
# PDO->execute cannot take an associative array for values, so we have
# to strip out the keys from $fields
$preparedFields = array();
foreach($fields as $key=>$value)
{
$preparedFields[] = "$key=?";
$values[] = $value;
}
$preparedFields = implode(",",$preparedFields);
if ($this->id) { $this->update($values,$preparedFields); }
else { $this->insert($values,$preparedFields); }
}
private function update($values,$preparedFields)
{
$PDO = Database::getConnection();
$sql = "update appointers set $preparedFields where id={$this->id}";
$query = $PDO->prepare($sql);
$query->execute($values);
}
private function insert($values,$preparedFields)
{
$PDO = Database::getConnection();
$sql = "insert appointers set $preparedFields";
$query = $PDO->prepare($sql);
$query->execute($values);
$this->id = $PDO->lastInsertID();
}
public function delete()
{
if ($this->id)
{
$PDO = Database::getConnection();
$query = $PDO->prepare('update seats set appointer_id=null where appointer_id=?');
$query->execute(array($this->id));
$query = $PDO->prepare('delete from appointers where id=?');
$query->execute(array($this->id));
}
}
#----------------------------------------------------------------
# Generic Getters
#----------------------------------------------------------------
public function getId() { return $this->id; }
public function getName() { return $this->name; }
#----------------------------------------------------------------
# Generic Setters
#----------------------------------------------------------------
public function setName($string) { $this->name = trim($string); }
#----------------------------------------------------------------
# Custom Functions
# We recommend adding all your custom code down here at the bottom
#----------------------------------------------------------------
public function __toString() { return $this->name; }
}
Re: Testing against production data
No, you're testing that your module generates the correct SQL, or builds PDO queries correctly - you are not actually testing the removal of data. 
- inghamn
- Forum Contributor
- Posts: 174
- Joined: Mon Apr 16, 2007 10:33 am
- Location: Bloomington, IN, USA
Re: Testing against production data
What is the correct SQL? The SQL that I think it's supposed to be according to some test I write, that comparese the string? Or the SQL that actually runs agains a database and would error out?
If I just test against what I "think" the SQL should be...that doesn't keep the application from breaking when the schema changes over time. The problem came when the schema changed but the model was not updated to take into account the new foreign key.
If I just test against what I "think" the SQL should be...that doesn't keep the application from breaking when the schema changes over time. The problem came when the schema changed but the model was not updated to take into account the new foreign key.
Re: Testing against production data
Which is where your functional test comes into it..
- inghamn
- Forum Contributor
- Posts: 174
- Joined: Mon Apr 16, 2007 10:33 am
- Location: Bloomington, IN, USA
Re: Testing against production data
Okay, so, by functional testing...we mean bringing in something like Selenium?
Re: Testing against production data
No, I mean non-automated testing.
It still all boils down to someone needing to check if the version/location/ip/name/whatever of the database is correct - which cannot be *guaranteed* by any automated test. Nor can manipulation of data, even if it's cloning it might lock the tables, and fail to unlock if there is an error or whatever. Just test that your DataLayer generates the expected SQL, then functionally test the actual data manipulation with every release, or change to the data structure/layer.
It still all boils down to someone needing to check if the version/location/ip/name/whatever of the database is correct - which cannot be *guaranteed* by any automated test. Nor can manipulation of data, even if it's cloning it might lock the tables, and fail to unlock if there is an error or whatever. Just test that your DataLayer generates the expected SQL, then functionally test the actual data manipulation with every release, or change to the data structure/layer.
- inghamn
- Forum Contributor
- Posts: 174
- Joined: Mon Apr 16, 2007 10:33 am
- Location: Bloomington, IN, USA
Re: Testing against production data
We may just be having a misunderstanding. Non-automated testing of every last feature of the application every time we add something new sounds insane to me. If it's non-automated, it's realistically never going to happen.
New features or changes to existing features of the web applications I build all involve database schema changes of some kind. New tables, new fields to existing tables, and definitely new foreign keys are added over time as the application grows.
Foreign keys are really good things to have, but they cause problems if they're not accounted for in the model layer. Adding a new table to the system, creating some new foreign key, is going to break existing code, until you go into the affected older models and add some code.
It's this kind of breakage that I have. Now don't get me wrong, I don't have these breakages very often. And when I get an error report, it's easy to fix. But I'm trying avoid having these breakages in the first place. That way I don't get calls when I'm out on vacation and have to talk someone through fixing it over the phone.
Without actually running the SQL calls from the models against a database, these errors are never going to be caught. Maybe it's not unit testing or functional testing, it really doesn't matter what it's called. But there has to be an automated way of running these database tests.
It seems like PHPUnit has support for the database setup and teardown. My first attempts have be to blow away the database, and load a testData.sql into the database, giving me a good known database state to run tests against. But again, if the testData doesn't have the new table-foreign key-whatever, the tests won't catch it.
So it seems what I'm looking for is a way to safely test against real data - production data. My inclination is to dump production data and load it into a test database, then make the necessary changes to the data to accommodate the tests, then run the tests. It seems like a lot of setup with quite a bit of manual work dumping databases, loading databases. But how else do you test that your ORM layer actully works correctly with your data?
New features or changes to existing features of the web applications I build all involve database schema changes of some kind. New tables, new fields to existing tables, and definitely new foreign keys are added over time as the application grows.
Foreign keys are really good things to have, but they cause problems if they're not accounted for in the model layer. Adding a new table to the system, creating some new foreign key, is going to break existing code, until you go into the affected older models and add some code.
It's this kind of breakage that I have. Now don't get me wrong, I don't have these breakages very often. And when I get an error report, it's easy to fix. But I'm trying avoid having these breakages in the first place. That way I don't get calls when I'm out on vacation and have to talk someone through fixing it over the phone.
Without actually running the SQL calls from the models against a database, these errors are never going to be caught. Maybe it's not unit testing or functional testing, it really doesn't matter what it's called. But there has to be an automated way of running these database tests.
It seems like PHPUnit has support for the database setup and teardown. My first attempts have be to blow away the database, and load a testData.sql into the database, giving me a good known database state to run tests against. But again, if the testData doesn't have the new table-foreign key-whatever, the tests won't catch it.
So it seems what I'm looking for is a way to safely test against real data - production data. My inclination is to dump production data and load it into a test database, then make the necessary changes to the data to accommodate the tests, then run the tests. It seems like a lot of setup with quite a bit of manual work dumping databases, loading databases. But how else do you test that your ORM layer actully works correctly with your data?
Re: Testing against production data
There needs to be a degree of separation. You're using PDO, PDO is tested. As long as you are programmatically using it correctly, it'll work. So your level of automated testing only needs to stretch as far as that interface. You just need to test that your application "talks" to PDO correctly, usually accomplished by mocking the PDO object.
The functional test does not need to be every single feature, only a few to establish that the connection is active, and that PDO is able to communicate with the DB server.
The functional test does not need to be every single feature, only a few to establish that the connection is active, and that PDO is able to communicate with the DB server.
- inghamn
- Forum Contributor
- Posts: 174
- Joined: Mon Apr 16, 2007 10:33 am
- Location: Bloomington, IN, USA
Re: Testing against production data
Here's the real-life example that prompted this discussion. Here's the old function from the Task class. I've left out all the other functions just to save space:
If you ran all you tests without actually hitting the database, you never got an error. Everything looks okay - the SQL is properly formatted and there's no syntax errors. It looks like it does what it's supposed to do.
But - put this into production and nobody can delete a task. Turns out theres one more foreign key for a table that needs to be cleaned out before deleting the task. The error only happens when you actually hit the database - then PDO hands it back as an exception.
What we needed to do was start by cleaning out the taskIndex first:
In the end, the only way to know your application works is to run it against actual data. I don't really care what you call the tests: functional tests, integration tests, acceptance tests, who cares. But you have to run a set of tests against a real life database.
I can vouch that 100% of the time, when a user reports an error on one of our applications, it comes from some new field or table that's been added with a foreign key that wasn't being handled by the previous code. The database throws errors when you try to do something, and older features stop working.
Because we're not doing Rails-style active record, where the application dynamically looks up all the database schemas every request, we're left with static code to work with the data. This static code is going to get out of date over time. We need some sort of tests to make sure that the code that's written is actually in line with real data.
Code: Select all
class Task extends ActiveRecord
{
public function delete()
{
if ($this->id)
{
$PDO = Database::getConnection();
$query = $PDO->prepare('delete from task_notes where task_id=?');
$query->execute(array($this->id));
$query = $PDO->prepare('delete from task_tags where task_id=?');
$query->execute(array($this->id));
$query = $PDO->prepare('delete from task_users where task_id=?');
$query->execute(array($this->id));
$query = $PDO->prepare('delete from task_departments where task_id=?');
$query->execute(array($this->id));
$query = $PDO->prepare('update tasks set parent_id=? where parent_id=?');
$query->execute(array($this->parent_id,$this->id));
$query = $PDO->prepare('delete from tasks where id=?');
$query->execute(array($this->id));
$search = new Search();
$search->remove($this);
self::updateTaskIndex();
}
}
}
But - put this into production and nobody can delete a task. Turns out theres one more foreign key for a table that needs to be cleaned out before deleting the task. The error only happens when you actually hit the database - then PDO hands it back as an exception.
What we needed to do was start by cleaning out the taskIndex first:
Code: Select all
class Task extends ActiveRecord
{
public function delete()
{
if ($this->id)
{
$PDO = Database::getConnection();
$query = $PDO->prepare('delete from taskIndex');
$query->execute();
$query = $PDO->prepare('delete from task_notes where task_id=?');
$query->execute(array($this->id));
$query = $PDO->prepare('delete from task_tags where task_id=?');
$query->execute(array($this->id));
$query = $PDO->prepare('delete from task_users where task_id=?');
$query->execute(array($this->id));
$query = $PDO->prepare('delete from task_departments where task_id=?');
$query->execute(array($this->id));
$query = $PDO->prepare('update tasks set parent_id=? where parent_id=?');
$query->execute(array($this->parent_id,$this->id));
$query = $PDO->prepare('delete from tasks where id=?');
$query->execute(array($this->id));
$search = new Search();
$search->remove($this);
self::updateTaskIndex();
}
}
}
I can vouch that 100% of the time, when a user reports an error on one of our applications, it comes from some new field or table that's been added with a foreign key that wasn't being handled by the previous code. The database throws errors when you try to do something, and older features stop working.
Because we're not doing Rails-style active record, where the application dynamically looks up all the database schemas every request, we're left with static code to work with the data. This static code is going to get out of date over time. We need some sort of tests to make sure that the code that's written is actually in line with real data.
Re: Testing against production data
Yes. Mocking a database connection is definitely a bad idea. A simple string comparison is not an adequate way to specify sql logic. Different forms of the same query will produce false negatives. Tests are very brittle: even a single extra whitespace will cause a fail.I feel like I need to test against the production database...or at least a clone of it. Mocking and stubbing doesn't catch these things.
Also, a perfectly valid sql query may pass a test but not actually work with the target database schema. It's all a bit self-referential, as you mentioned.
What I usually do is keep schemas in version control and setUp/tearDown the database for each test. That guarantees that the code works with a given database structure but doesn't guarantee that the production db matches this. However, schemas don't tend to change that often.
When they do, I'd probably write an update script, apply that on a staging server (which can be a sandboxed area of the production box, virtual machine, etc) and then run the full test suite. That will prove the app can be successfully updated with the new schema. All that's left is to apply the script to the production site and possibly run a few acceptance tests as a basic sanity check.
- inghamn
- Forum Contributor
- Posts: 174
- Joined: Mon Apr 16, 2007 10:33 am
- Location: Bloomington, IN, USA
Re: Testing against production data
I'm planning on using a mysqldump of the production data to do the tests against. Doing a database reload during setUp/tearDown can get quite slow, I think. But since they're tests it's okay to just live with the slowness?McGruff wrote: What I usually do is keep schemas in version control and setUp/tearDown the database for each test.
One of the things I'm trying to do is apply TDD to this process. During development, especially initial development, the schema is changing after each new feature. Am I pushing it too far? Should TDD just apply to non-database tests?McGruff wrote: That guarantees that the code works with a given database structure but doesn't guarantee that the production db matches this. However, schemas don't tend to change that often.
Re: Testing against production data
It depends how many tests you have but in general I don't think that will be a problem. You will notice that these tests are slower but perhaps only seconds rather than microseconds. Try it and see (and let me know how you get on).inghamn wrote:Doing a database reload during setUp/tearDown can get quite slow, I think. But since they're tests it's okay to just live with the slowness?
If tests are slow they tend not to get run so often, which is bad. You could try keeping the schema and just emptying all the tables of data between each test. I'm not sure if that will gain anything though.
No I don't think so. The query in the "Testing Data Access Classes" link was TDD'd. There are lots of sub-selects to piece together - far too complicated to figure out in one big gulp.One of the things I'm trying to do is apply TDD to this process. During development, especially initial development, the schema is changing after each new feature. Am I pushing it too far? Should TDD just apply to non-database tests?
The only difference is that you're at an application boundary and so mocking is less useful. In your own code, you can mock away freely in the knowledge that a real object behaves the same as the mock claims it will - because you'll write them that way. At the boundary, you're making a leap of faith. What if an external resource changes its behaviour or its API? The mocked resource in the tests won't reveal this. That's not necessarily a deal-breaker but as a general rule whenever I hit an application boundary I will use the real resource rather than mocking: filesystem, data access, etc.
Also, you don't need mocks to tease out the next layer of the design: you're at the end of the line.
- inghamn
- Forum Contributor
- Posts: 174
- Joined: Mon Apr 16, 2007 10:33 am
- Location: Bloomington, IN, USA
Re: Testing against production data
whew! I'm getting close every day. Thanks so much McGruff. Now I'm off to write a whole batch of new tests.