That’s not root cause

Far too often I read some email or paper explaining in detail the “root cause” of an issue as something along the lines of “in module X when calling function Z, if a null is passed in variable Q then blah, blah, blah will happen. Root cause is to fix the module to test for a null in variable Q and…”

This is not a root cause. I know why so many people think it is. When we go to fix a bug we can fix it the right way or the wrong way. The wrong way, perhaps, is to fix it not at the source but to devise some sort of workaround. These things are often sloppy and, by most developers’ standards, highly undesirable. To a developer, a root cause simply means where the problem begins. And, of course, this is a bit gray as well. If function X calls function Y, passing an invalid variable, which function is at fault? If you are a proponent of defensive programming, you’d argue that function Y is at fault for not checking its inputs. If you are a proponent of design by contract, then you’d argue function X should never have called function Y with an invalid value.
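
To make the two camps concrete, here is a minimal Python sketch. Everything in it is hypothetical: the names y_defensive, y_contract and x_contract, the parameter q, and the empty-string fallback are stand-ins I made up for illustration, not code from any real module.

```python
def y_defensive(q):
    """Defensive programming: Y checks its own input and rejects bad values."""
    if q is None:
        raise ValueError("q must not be None")
    return len(q)


def y_contract(q):
    """Design by contract: Y states a precondition and trusts its callers.

    The assert documents the contract; it is not meant as input handling.
    """
    assert q is not None, "precondition violated: caller passed None"
    return len(q)


def x_contract(q):
    """Under design by contract, X is responsible for meeting Y's precondition."""
    if q is None:
        q = ""  # hypothetical fallback, chosen only for this example
    return y_contract(q)
```

In the first style the fix lands in Y; in the second it lands in X. Either way, the sketch only shows where the code changes.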

Frankly, it doesn’t matter to me, because neither is the root cause. The real root cause is the reason you made the mistake in the first place. Why did the code ever get written so that X could call Y with a bad variable, or so that Y wasn’t checking its inputs? Why did we make these design decisions? Why did we make these coding choices? Why was it built that way? When you start asking and answering these questions, instead of talking about where in the code a fix is needed, you can start getting somewhere. If you stick to the “where in the code was a fix needed” question, you can never do anything with the information. You won’t make that exact same mistake in the exact same line of code, because you just removed that bug. But knowing where in the code you made the fix won’t help you figure out how to prevent the next mistake.
