Code Tester - Testing and refreshing data from production

Code Tester is the most powerful PL/SQL test automation tool available. You describe the expected behavior of your programs and Code Tester generates your test code, which can then be run from the UI or via a script. With Code Tester, you can build comprehensive regression tests and even implement the Test Driven Development methodology.

In the process of talking about Code Tester with many developers, I have come across a belief regarding code testing and the refreshing of data from production tables that I think actually reflects a misunderstanding about both how to use Code Tester specifically and, more generally, how to test our code.

Here's the way this belief was expressed by a customer recently:

"We have a development environment – let's call it TEST. Inside this environment is the supporting schema / repository for Code Tester, as well as the test definitions and generated test code.

"Once a week, we refresh the TEST environment from the production environment, PROD. PROD doesn't contain a Code Tester repository, so after I refresh, I lose my test repository and code."

Clearly, this developer can do an export of the Code Tester schema and then import it after refresh, but he was concerned about having to add any overhead for the DBA to this refresh process.

Actually, I think the problem goes much deeper than that: if you refresh your test tables with production data on a regular basis, you will find it very difficult indeed to create stable regression tests that can be run against your code.

Why do I say that? Because when you refresh data from production, you change the "inputs" to your programs (contents of the tables) and therefore you will almost certainly have to change the expected results for your tests.

In fact, I think that when it comes to functional testing of your programs (does it meet user requirements?), you should not be refreshing your test tables from production. To understand why I would say this, let’s talk about….

How we make sure our programs work

There are as many kinds of tests as there are definitions of what it means for a program to "work." We need to make sure, for example, that our programs meet functional requirements (they are correct) and also that they run quickly enough to avoid user frustration. The programs need to scale up for many users and lots of data, etc.

Quest Code Tester is designed specifically to help you implement tests on functional requirements; in essence, to verify that your program is correct.

To do this, I will almost always want to compare the actual results of running my program with the expected results or control data.

For example, if table XYZ contains a certain set of rows, then after running the program, table ABC should be changed in a specific way. Or the program is a function such that when I pass "ABC" for an IN argument, the function returns 100, and so on.

In general, there is no room for ambiguity here. Either the program works as expected or it does not.

Now, if I want to build regression tests and automate the process of testing my program, I need to able to tell Code Tester that for a given set of inputs, I expect the associated outcomes. And – this is the key thing to realize – those inputs can’t keep changing on me. Every time the inputs change, I would need to change the expected outcomes.

Constantly updating one's test code might make sense if you are perform manual tests from hand-coded scripts. But if you want to build comprehensive, serious regression tests, then you need a stable, consistent environment from which to run those tests.

Every time you refresh data from production, you change the values in your table and thus you cannot reliably execute your regression test.

But don't we need real production data to really test?

Is this a big problem? Does this mean that we can't or shouldn't build static, repeatable regression tests for our code? After all (so the thinking goes), we need to test our code against production data to make sure that code handles "real world" situations.

Clearly, our code does need to work properly with production data.

This does not mean, however, that you need to constantly change the data for your functional tests as production data changes. It does mean that the data used in functional tests should represent the variety of data found in production.

In other words, the data in your test tables must be sufficiently varied to allow you to exercise the program to verify all requirements. It doesn't really matter so much that the data is precisely the same as that found in production.

And, again, if you keep changing the test data, you must also change your test definition.

OK, but don’t we need to test against production volume?

There's another problem with basing functionality testing around production data: the data volumes are generally too large, increasing the time it takes to complete the tests.

You certainly do need to make sure your code executes with production volumes of data. But that sort of stress testing should be done independently of your functionality testing. With functionality testing, you want your tests to run as quickly as possible, for these reasons:

You will have lots of (dozens, perhaps hundreds) of separate test cases to run; if each test takes five minutes due to data volume, the test cycle will take an enormous amount of time.
Ideally, you run your tests after each change you make to your program. That way, you can immediately determine if you have introduced any bugs. But if running those tests takes an hour, you will test less frequently and you will get less "return" on your investment of creating your tests.

Conclusion: Segregate your functional test environment

The most important thing to verify about your program is that it is correct: it meets user requirements.

The best way to do this is to build a regression test that you can run after any change to the program, to ensure that it has no bugs.

A regression test should not have to be changed as long as the program itself has not changed. It should work from a consistent set of "inputs" (values for IN arguments, contents of any tables queried by programs, etc.) that do not change. [Of course, you may need to make some changes along the way as program requirements change, as you add more test cases, etc. That is different, however, from daily or weekly refreshes.]

In addition, you want regression tests to run as efficiently as possible. These tests focus on program functionality, not performance. So you want the minimum volume of data in test tables that allow you to cover your requirements. Use different tests to verify adequate performance.

Consequently, when it comes to functionality testing, you should avoid refreshing your test tables from production. Instead, invest the time upfront to come up with setup scripts to populate tables with data that fully exercises your code. Include those setup scripts in your Code Tester test definitions and then you have an independent, consistent test environment.

How we make sure our programs work

But don't we need real production data to really test?

OK, but don’t we need to test against production volume?

Conclusion: Segregate your functional test environment

Start the discussion at forums.toadworld.com