Apparently we’ve done a disservice to the world with the word correlation. Recently I’ve been involved in several discussions surrounding the effectiveness of unit testing. As you’re probably aware, unit testing, like all testing, is about 35-50% capable at removing defects. That is, if you inject 100 defects, unit testing can typically find between 35-50 of them.
As a result, we say that there’s a correlation between unit testing and better quality. That’s correlation as in “there’s a relationship between” and not necessarily the linear correlation people think of that could be expressed in the form Y = mx + b.
In these discussions, multiple people expressed concern with our measurement system, which was simply measured as defects per unit of work delivered. It’s a common measure of defect density which accounts for the size of the software being delivered. Unit of work can be function point or line of code, or something else. At any rate, they were concerned because our data was too “noisy” because we counted all defects, whether they were coding errors, requirements errors, etc.
So, essentially we were comparing the defect density of a population that did unit testing vs one that did not. And we simply counted all the bugs that were in both populations, whether or not unit testing was designed to find them. And what do you know, we were able to detect a statistical difference in the populations.
This seems to really confuse people. How is it that we measure everything, which includes things which we didn’t control for, and yet we can detect a difference? The issue they were having was they wanted to measure a very specific defect density – unit-test-specific defect density.
This isn’t necessary. Although there are other factors, like whether you did code reviews, design reviews, etc. if those things vary randomly between the does and doesn’t do unit testing populations, then it’s still possible to detect a difference.
It isn’t a univariate world, but statistics can handle that. We needn’t over-complicate our measurement systems to try and get a laser focus on what we think we care about. We can simply measure the overall outcome, and if it makes a difference we ought to be able to detect it.