At one point, if you had asked me, I would have wholeheartedly agreed with Capers Jones and said the one critical measure you need is DCR (Defect Containment Rate). Now that I’ve made several attempts to make DCR a reality, I’m convinced it’s one of the least useful measures you could have.
Where to begin:
1. It’s incredibly lagging. Once you put something into production, it can take weeks or months for all the missed defects to surface. This is because of the way the system gets used and the potential lack of interest on the part of your users in reporting issues. Pretty much any DCR figure you calculate before then will be biased towards optimism, because the production defects that haven’t surfaced yet aren’t counted. Optimism is not something you want in your measurement system, because optimism drives inaction.
2. It’s hard to do. Knowing what defects you found in test is easy. Knowing whether a production defect was caused by your project takes work. You have to figure out when it manifested itself in the code, possibly loading multiple versions of the application into development environments to figure out when it was actually introduced. And then there are the odd side effects that you can never be quite sure whether you caused, like stability issues that show up because usage of your application increased once you introduced new features. Did this project cause the issue? Well, no, not directly, but it potentially contributed to its manifestation.
3. It doesn’t credit good behavior throughout the lifecycle. Ideally, DCR should capture all the defects you contain in all the stages of software development. Do code reviews and find things to fix? You should count those. Right? Well, without running the code, you can’t actually be sure whether the thing you found during code review would ever have resulted in a defect. Sometimes it’s obvious that it would (like a NULL pointer) or would not (like a shortage of comments), but oftentimes we don’t know. We know that we don’t like the way the code was written and that there is a cleaner way to do it. That hopefully contributes to the long-term stability of the application. In some sense, that means avoiding future defects, but did you actually contain a defect? No, probably not, and DCR is never going to give you credit for avoiding a future defect.
4. It tells you something we already know. We’ve got this thing about not believing the industry. Though several researchers have found that testing will remove 35-50% of defects per cycle (I’ve seen numbers a bit lower as well), we insist on measuring our own test capability. Given that reaching the lower end of that range isn’t hard (divide requirements into tests, write and run the tests, and record the results), do we really need to know how our testing is doing? Let me give you a hint: it works about the same as everyone else’s testing. Do 3 cycles of testing and assume you’ll get about 75-80% of the defects out. Now go measure something that you don’t know much about, like the quality of the software coming into test.
5. It focuses you on the wrong thing. Guess what, testing will never be the best way to produce high quality code. It’s a supporting player, at best. But if you measure defect containment, you are basically admitting that you are reliant on your test capability to keep bugs out of production. The best thing you can do to keep bugs out of production is to never write them in the first place, or to catch them well before they reach testing, since testing’s odds of catching any given defect are poor. Get good quality code coming into test and the fact that you’ll only get 35% of the defects removed per test cycle won’t matter nearly as much.
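If you want to see the arithmetic behind points 4 and 5, here is a rough sketch in Python. It assumes every test cycle removes the same fraction of whatever defects are still left, which is my simplification and not something the research necessarily claims. The function names and the incoming defect counts (100 and 40) are made up purely for illustration.

```python
def cumulative_removal(per_cycle_rate: float, cycles: int) -> float:
    """Fraction of incoming defects removed after `cycles` test cycles.

    Assumption: each cycle removes the same fraction of the defects
    that are still remaining when it starts.
    """
    return 1.0 - (1.0 - per_cycle_rate) ** cycles


def escaped_defects(incoming: float, per_cycle_rate: float, cycles: int) -> float:
    """Defects expected to reach production under the same assumption."""
    return incoming * (1.0 - per_cycle_rate) ** cycles


# Three cycles across the 35-50% per-cycle range quoted above.
for rate in (0.35, 0.40, 0.50):
    removed = cumulative_removal(rate, 3)
    print(f"{rate:.0%} per cycle, 3 cycles: {removed:.1%} removed")

# Hypothetical incoming defect counts: same testing, cleaner code coming in.
for incoming in (100, 40):
    escaped = escaped_defects(incoming, 0.35, 3)
    print(f"{incoming} defects into test: about {escaped:.1f} escape")
```

Under that assumption, three cycles land somewhere between roughly 73% and 88% removed, which brackets the 75-80% rule of thumb. The second loop is the real point: with testing held at 35% per cycle, the number of escaped defects scales directly with the number coming into test, so cleaner incoming code pays off no matter how the test numbers look.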
I’m afraid DCR is probably not something I’m going to spend too much time thinking about any more, except perhaps to reiterate why we should be looking for better measures elsewhere.