It’s not chaotic, it’s percent error

Generally, when we talk about correlation, people imagine a nice positive (or negative) relationship between the independent and dependent variables.  It typically looks something like the magenta data set below: for each increase in X, there’s a corresponding increase in Y, plus or minus some fixed amount of random noise.

In software development, we also see the other pattern on this graph. Y still increases as X increases, but the amount of random noise seems to get bigger and bigger as X increases.  The result is a “spray” pattern, which we tend to read as a sign that things are getting out of control or that the relationship isn’t really there.  Otherwise we’d get a nice positive correlation, right?

There is another explanation: percent error.  A simple positive correlation might be expressed as Y = 2X + e, where e is some fixed error, let’s say +/- 2.  So, if X = 10, Y should fall somewhere between 18 and 22 (2*10 +/- 2).  With percent error, however, e is some percentage of X.  When X is small, the absolute error in Y is small, and when X is large, the error in Y is larger.  If X is 2, then +/- 10% is 0.2, but if X is 200, +/- 10% is 20.  One can quickly see how this would cause a spray-like pattern in your data.
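To make the difference concrete, here is a minimal simulation sketch. It assumes the noise is drawn uniformly within the stated bounds (the +/- 2 and +/- 10% figures come from the example above; the uniform distribution, the function names and the sample size are my own assumptions for illustration).

```python
import random


def fixed_error_y(x, rng, slope=2.0, err=2.0):
    """Y = 2X + e, where e is a fixed-size error of +/- err."""
    return slope * x + rng.uniform(-err, err)


def percent_error_y(x, rng, slope=2.0, pct=0.10):
    """Y = 2X + e, where e is +/- pct of X, so it grows with X."""
    return slope * x + rng.uniform(-pct * x, pct * x)


def spread(model, x, rng, n=1000):
    """Observed range (max - min) of Y over n simulated draws at a given X."""
    ys = [model(x, rng) for _ in range(n)]
    return max(ys) - min(ys)


if __name__ == "__main__":
    rng = random.Random(42)  # seeded so repeated runs give the same numbers
    for x in (2, 20, 200):
        print(
            f"X={x:>3}  fixed spread={spread(fixed_error_y, x, rng):6.2f}  "
            f"percent spread={spread(percent_error_y, x, rng):6.2f}"
        )
```

In the fixed-error model the spread stays under about 4 no matter how large X gets. In the percent-error model the spread is tiny at X = 2 and roughly a hundred times wider at X = 200, which is the “spray”.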

It’s also a common pattern in software development, in my experience.  As you code, a certain percentage of the code you write will be faulty.  It appears chaotic, and it does create a larger variance in the number of defects you’ll get as the project gets larger, but it’s actually not particularly surprising.

Knowing this, you can create models to help predict defects that use percent faulty instead of a raw count.  It’s still not going to make it easier to predict the absolute number of defects you’ll get, but it will at least help set a realistic expectation: as project size increases (whether you measure it by function points, lines of code or something else), the number of defects you find may appear to vary more widely than a simple positive correlation would allow.
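As a rough sketch of what a rate-based model might look like, the snippet below turns past projects into defect rates and then scales the lowest and highest rates by the size of a new project. The function names and the history numbers are hypothetical, made up purely for illustration, and using the min and max observed rate as the band is a simplifying assumption rather than a recommended statistical method.

```python
def defect_rate_band(history):
    """Return (lowest, highest) defect rate seen in past projects.

    history is a list of (size, defects) pairs, where size is in whatever
    unit you use (function points, lines of code, ...).
    """
    rates = [defects / size for size, defects in history]
    return min(rates), max(rates)


def expected_defect_range(size, band):
    """Scale a (low_rate, high_rate) band to a project of the given size."""
    low_rate, high_rate = band
    return size * low_rate, size * high_rate


if __name__ == "__main__":
    # Hypothetical past projects: (size, defects found). Not real data.
    history = [(100, 9), (400, 44), (1000, 95)]
    band = defect_rate_band(history)
    for size in (200, 2000):
        low, high = expected_defect_range(size, band)
        print(f"size={size}: expect roughly {low:.0f} to {high:.0f} defects")
```

The rate band stays the same for every project, but the range of defect counts it implies gets wider as the project gets bigger, so the widening spread is something the model expects rather than a surprise.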
