Thursday, January 12, 2012

Effective Unit Testing - Not All Code is Created Equal

Unit testing is one of the most widely adopted methodologies for producing high-quality code. Its contribution to more stable, independent and documented code is well established. Unit test code is considered and handled as an integral part of your repository, and as such requires development and maintenance. However, developers often find that the resources invested in unit tests were not as fruitful as expected. This leads us to wonder, as with any investment, where and how should resources be invested in unit tests?
The metric currently used to assess the quality of unit testing is code coverage, which describes the degree to which the source code of a program has been tested. In an ideal world, every method we write would have a series of tests covering its code and validating its correctness. In practice, usually due to time limitations, we either skip some tests or write poor-quality ones. Given that reality, and keeping in mind the resources invested in developing and maintaining unit tests, one must ask: given the available time, which code deserves testing the most? And of the existing tests, which are actually worth keeping and maintaining? We will try to answer those questions today.
We believe that not all code is created equal. Some code sections are harder to test than others, and some are more important than others. We suggest a few guidelines to help determine which code sections to invest in first when writing unit tests, and when maintaining them:
  1. Usages of code – the more frequently code is used, the more important it is to unit test it.
  2. Code dependencies – similar to (1), the more heavily other code depends on the examined code, the more important it is to unit test it. On the other hand, when the examined code depends greatly on other code, it is harder to test and the chance of catching a fault is smaller.
  3. I/O dependency – code that depends on I/O (DB, networking, etc.) is harder to test, as it requires creating mock objects that simulate the behavior of the I/O components. These mock objects require development and maintenance, and are vulnerable to bugs of their own. Moreover, writing mock objects that simulate the exact behavior of a given I/O component, including its faults, is not trivial at all.
  4. Multithreaded code – the behavior of multithreaded code is unpredictable, and as such it is harder to test.
  5. Cyclomatic complexity – this metric indicates the complexity of your source code. The higher the complexity, the more important it is to test the code.
  6. Code accessibility – this measure relates to the number of people who are acquainted with the source code in question. The greater the accessibility, the less testing is needed, since problems will be identified and handled more rapidly.

Regarding the second question presented above, we suggest a new approach for managing unit tests. This preliminary idea definitely needs some polish, and we only present a rough outline.

After taking all the above into account, the real burden is maintaining the tests. We suggest thinking of a single unit test as a stock. We can keep track of each unit test, treating it as a dynamic object with an initial value that can change over time. Based on the guidelines above, we can give each test a preliminary value indicating its importance. Note that most of the attributes above can be determined automatically. The change in value over time reflects our profit from the test. Each time a test fails and catches a real bug, its value increases; each time you invest in fixing the test itself without catching any real bug in your business logic, its value decreases. Finally, each time you need to change the code of a test as a result of a change in your business logic, its value stays the same.
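
The three update rules above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the names `UnitTestStock` and `Event`, the fixed `step` size and the floor at zero are all hypothetical, since the post leaves the size of each change open.

```python
from dataclasses import dataclass
from enum import Enum


class Event(Enum):
    CAUGHT_REAL_BUG = "caught_real_bug"                    # value goes up
    TEST_FIXED_NO_BUG = "test_fixed_no_bug"                # value goes down
    UPDATED_FOR_LOGIC_CHANGE = "updated_for_logic_change"  # value unchanged


@dataclass
class UnitTestStock:
    """Tracks the 'stock value' of a single unit test (hypothetical model)."""
    name: str
    value: float  # preliminary value, e.g. derived from the guidelines

    def record(self, event: Event, step: float = 1.0) -> float:
        """Apply one event and return the new value.

        `step` is an assumed fixed increment; how much to increase or
        decrease is an open question in the post.
        """
        if event is Event.CAUGHT_REAL_BUG:
            self.value += step
        elif event is Event.TEST_FIXED_NO_BUG:
            # Assumption: the value never drops below zero.
            self.value = max(0.0, self.value - step)
        # UPDATED_FOR_LOGIC_CHANGE leaves the value as it is.
        return self.value
```

For example, a test that starts at 5.0, catches one real bug and is then repaired twice without catching anything ends up at 4.0.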

The above model is not complete, as we only wanted to give a general idea of effective unit testing. Open questions remain: how is the value for each of our suggested guidelines computed? How is the preliminary value of each test then determined? And by how much should a value increase or decrease over time? These questions could be answered, for example, by using machine learning techniques, but that is beyond the scope of this post.


  1. I really like your notion that unit tests are part of a very limited resource (usually called "developer" ... ;-) ). I also agree that running all the tests every day is very ineffective.
    How about a unit test environment that decides, for each test, whether to run it according to its stock value?
    So tests that are not worth investing in (because they never fail) run with a low probability.
    This would ensure that once in a while you get regression tests, without having to invest resources in tests that do not bring value...

    1. The current approach to unit tests is: the more the better. Our main goal in the second part of the post is to remind people that each new test you add has a cost, so we need to examine its ROI.

    2. Ofer, thank you for your reply. I like your idea, but this still means you will need to write the test itself.

    3. Not necessarily.... For low-value code sections that you don't want to invest in, write an empty test (i.e. "return true;"). Its value will be low anyhow, so it wouldn't run and consume resources.
      However, if at any point you need to rewrite this unit test (because bugs were detected by some other means - QA / customer etc.), its value will naturally rise, and thus it will run more often.
      Think of it as having "fake" stocks of companies that are not traded in the market...

    4. Hmm... I did not see it as an empty test. Does this mean that fake tests will have 0 probability of running? You are right, this enhances the model. Accepted :)
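
The idea raised in the comments above, running each test with a probability tied to its stock value, might look roughly like the sketch below. The function names, the linear mapping from value to probability and the `floor` parameter are assumptions made for illustration; setting `floor=0.0` models "fake" tests that never run until their value rises.

```python
import random
from typing import Dict, List, Optional


def run_probability(value: float, max_value: float, floor: float = 0.05) -> float:
    """Map a stock value to a probability of running the test.

    Assumption: probability scales linearly with value / max_value and is
    never lower than `floor`, so low-value tests still run once in a while.
    """
    if max_value <= 0:
        return floor
    return max(floor, min(1.0, value / max_value))


def select_tests(
    values: Dict[str, float],
    rng: Optional[random.Random] = None,
    floor: float = 0.05,
) -> List[str]:
    """Pick the tests to run this time, one random draw per test."""
    rng = rng or random.Random()
    max_value = max(values.values(), default=0.0)
    return [
        name
        for name, value in values.items()
        if rng.random() < run_probability(value, max_value, floor)
    ]
```

In this sketch the highest-valued test always runs, while a test whose value has dropped to zero runs only at the floor rate.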