Kill all mutants

The post below is the content from my 2014 J-Fall and Devoxx Ignite presentations. You can check you the slides here:
http://www.slideshare.net/royvanrijn/kill-the-mutants-a-better-way-to-test-your-tests

We all do testing

In this day and age you aren’t considered a real Java developer if you are not writing proper unit tests.
We all know why this is important:

  • Instant verification our code works.
  • Automatic future regressions tests.

But how do we know we are writing proper tests? Well most people use code coverage to measure this. If the percentage of coverage is high enough you are doing a good job.

What is a test?

First let’s look at what a test actually is:

  1. Instantiate classes, setup mocks.
  2. Invoke something.
  3. Assert and verify the outcome.

Which steps are measured with code coverage? Only steps 1 and 2. And what is the most important thing for a test? It is the third and final step, the assertion, the place where you actually check if the code is working. This is completely ignored by code coverage!

I’ve seen companies where management looks at code coverage reports, they demand the programmers to write 80+ or 90+% coverage because this proves the quality is good enough. What else is a common thing in these organisations? Tests without any real assertion. Tests written purely to boost coverage and please management.

So code coverage says absolutely nothing about the quality of our tests? Well, it does tell you one thing. If you have 0% coverage, you have no tests at all, if you have 100% coverage you might have some very bad tests.

Mutation testing

Luckily there is help around the corner, in the form of mutation testing. In mutation testing you create thousands of ‘mutants’ of your codebase. So what is this mutant you might ask? A mutation is a tiny singular change in your codebase.

For example:

// Before:
if(amount > THRESHOLD) {
    // .. do something ..
}

// After:
if(amount >= THRESHOLD) {
    // .. do something ..
}

For each mutant the unit tests are run and there are a couple of possible outcomes:mutant_killed

Killed

 

If you are lucky a test will fail. This means we have ‘killed’ our mutant. This is a positive thing, we’ve actually checked that the mutated line of code is correctly asserted by a test. Now we immediately see the advantage of using mutation testing, we actually verify the assertions in our tests.

Survivedmutant_lived

Another possible outcome is that our mutant has survived, meaning no test fails. This is scary, it means the logic we’ve changed isn’t verified by a test. If someone would (accidentally) make this change in your codebase, the automatic build won’t break.

 

Tooling

In Java (and other languages as well) there are frameworks for doing mutation testing. One of the most complete and modern frameworks for doing mutation testing in Java is called PIT. The generation of mutants and the process running the tests is fully automated and easy to use, just as easy as code coverage. There are Maven, Ant, Gradle, Sonarqube, Eclipse and IntelliJ plugins available!

What about performance?

Using mutation testing isn’t a silver bullet and it doesn’t come without any drawbacks. The major disadvantage is performance. This is the reason it never took off in the 1980s. At that time it would take an entire evening to run all your unit tests, so people could only dream of creating thousands of mutants and running the tests again. Luckily CPU’s have become a lot faster, and PIT has other tricks to speed up the process.

One thing PIT does is that it uses code coverage! Not as a measurement of the quality of your tests but as a tool. Before creating the mutants PIT calculates the code coverage of all unit tests. Now when PIT creates a mutant of a particular line of code it looks at the tests covering that line. If a line is only covered by three unit tests, it only runs those three tests. This greatly decreases the amount of tests needed to run for each mutation.

There are other tricks as well, for example PIT can track the changes in your codebase. It doesn’t need to create mutants if the code isn’t edited.

Summary

Code coverage is a horrible way of measuring the quality of your tests. It only says something about the invocations but nothing about the actual assertions. Mutation testing is much better, it gives an accurate report on the quality and you can’t ‘game’ the statistics. The only way to fake mutation coverage is to write real tests with good assertions.

Check it out now: http://pitest.org