Why root cause confuses

The term “root cause” seems to have confused software developers for a long time. When developers talk about “root cause,” they tend to mean the place where the system started to go wrong.

For example, suppose you call a web service and it returns bad results that the caller could have detected. There are two things you could do. One, you could fix the caller to handle the error gracefully (often called defensive coding); two, you could fix the web service so that it does not return bad results. Most developers I’ve run into will tell you that fixing the web service is “root cause.” The idea is that if the web service didn’t do something bad, there’d be no need for defensive coding in the caller.
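To make the two options concrete, here is a minimal sketch. Everything in it is hypothetical: the service functions, the "price" field, and the rule that a negative price counts as a bad result are assumptions made up for illustration, not a real API.

```python
# Hypothetical stand-ins for a web service; names and fields are illustrative only.

def price_service_buggy(item_id: str) -> dict:
    """A service that returns a bad result (assumed here: a negative price)."""
    return {"item_id": item_id, "price": -1.0}


def price_service_fixed(item_id: str) -> dict:
    """Option two: the same service after it is fixed at the point of origin."""
    return {"item_id": item_id, "price": 9.99}


def get_price_defensively(service, item_id: str, fallback: float = 0.0) -> float:
    """Option one: the caller detects the bad result and handles it gracefully."""
    result = service(item_id)
    price = result.get("price")
    # Treat a missing, non-numeric, or negative price as a bad result.
    if not isinstance(price, (int, float)) or price < 0:
        return fallback
    return float(price)


if __name__ == "__main__":
    print(get_price_defensively(price_service_buggy, "widget"))
    print(get_price_defensively(price_service_fixed, "widget"))
```

Either change makes the symptom go away for this one call, which is why both can feel like a complete fix.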

Fair enough, but this isn’t root cause. Root cause has to go further, a lot further. If root cause only goes so far as to fix the issue at the point of origination, then all you’ve done is fix one issue. Instead, you have to ask the question: “Why did we make that mistake in the first place?” Was the web service wrong because of a coding issue? Requirements? Design?

And further, why did you make a requirements, design, or coding error? What can be done to catch issues of this type in the future? The bug you fixed is fixed, so it isn’t going to be the next bug you deal with, but it is one instance of a pattern of issues that you are not handling.

When you think about root cause, think beyond fixing the bug at the source.  That’s helpful, but it isn’t exactly the root cause of why the problem was introduced in the first place.  If you intend to stop future issues, you have to go further than just fixing the issue you have now.
