If you’ve ever done regression analysis, then you’re probably familiar with some of the diagnostic plots that come out of the work. If you’re not, I’d encourage you to read up on regression analysis, because there’s far more to it than just getting an r-squared or even an adjusted r-squared. You need to examine the diagnostic plots of the residuals to understand whether the model is decent. One of the most frequently used diagnostic plots is the normality plot of the residuals.
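For anyone who hasn't built a normality plot by hand, here is a rough sketch of where its points come from. It uses only Python's standard library. The function name `normal_plot_points`, the plotting position `(i - 0.5) / n`, and the simulated residuals are illustrative assumptions, not taken from any particular statistics package.

```python
import random
from statistics import NormalDist

def normal_plot_points(residuals):
    """Return (theoretical, observed) pairs for a normal probability plot."""
    ordered = sorted(residuals)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    points = []
    for i, value in enumerate(ordered, start=1):
        # Plotting position (i - 0.5) / n keeps every probability strictly
        # between 0 and 1; other conventions exist, this one is an assumption.
        p = (i - 0.5) / n
        points.append((std_normal.inv_cdf(p), value))
    return points

# Hypothetical residuals: simulated here because no real model is attached.
rng = random.Random(42)
residuals = [rng.gauss(0, 1) for _ in range(200)]
points = normal_plot_points(residuals)

# Show the three smallest and three largest pairs.
for theoretical, observed in points[:3] + points[-3:]:
    print(f"{theoretical:7.3f}  {observed:7.3f}")
```

If the residuals are close to normal, these pairs fall near a straight line when you plot the theoretical values on the horizontal axis and the residuals on the vertical axis. Bends at either end of that line are the departures to look for.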
The assumption regarding the residuals is that they are normally distributed around a mean of zero. But sometimes the residuals wander away, and then what do you do? There are lots of ways that this could happen, but I’ll show you two – fat tails and short tails.
Short tails look like this on the normality plot:
Notice the distinct “S” shape to the residuals. It’s also an easy mnemonic device – “S” shape = short tails. (The mnemonic assumes the residuals are on the vertical axis and the theoretical normal values are on the horizontal axis; swap the axes and the two shapes trade places.) Short tails indicate that the data has fewer extreme values than a normal distribution would expect. One way to get short tails is to have a missing explanatory variable that defines two different levels for the dependent variable. For example, you might have two equally sized populations, one with a mean of 0 and standard deviation of 0.5 and the other with a mean of 1 and standard deviation of 0.5. Pooled together, the residuals form a wide, flat-topped hump whose edges drop off faster than a normal curve’s would.
The other option is fat tails, which look like this:
I guess you could call it the opposite of the “S” shape that you see in short tails. Fat tails indicate more extreme values than a normal distribution would expect, usually along with more data packed close to the mean. One way to get fat tails is to have 2 populations with the same mean but different variances. For example, one has a mean of 0 and standard deviation of 1, while the other has a mean of 0 and standard deviation of 0.5. The other possibility is that you have a single population whose error varies more (or less) as the independent variable increases. Notice that in this case it’s the standard deviations that are different instead of the means.
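If you want to check which tail behavior a particular mix of populations produces, a quick simulation helps. This is a minimal sketch using only Python's standard library. The helper names, the sample size, the seed, and the equal split between populations are my own choices for illustration. Excess kurtosis stands in for the plot as a rough number: near zero for normal data, negative for shorter tails, positive for fatter tails.

```python
import random
from statistics import fmean

def excess_kurtosis(values):
    """Fourth standardized moment minus 3; roughly 0 for normal data."""
    m = fmean(values)
    var = fmean([(v - m) ** 2 for v in values])
    fourth = fmean([(v - m) ** 4 for v in values])
    return fourth / var ** 2 - 3.0

def pooled(rng, n, mean_a, sd_a, mean_b, sd_b):
    """Draw n values from each of two normal populations and pool them."""
    a = [rng.gauss(mean_a, sd_a) for _ in range(n)]
    b = [rng.gauss(mean_b, sd_b) for _ in range(n)]
    return a + b

rng = random.Random(0)  # fixed seed so repeated runs match
n = 20000               # values drawn per population (an arbitrary choice)

# Same mean, different standard deviations (1 and 0.5).
k_same_mean = excess_kurtosis(pooled(rng, n, 0.0, 1.0, 0.0, 0.5))
# Different means (0 and 1), same standard deviation (0.5).
k_diff_mean = excess_kurtosis(pooled(rng, n, 0.0, 0.5, 1.0, 0.5))
# A single normal population for reference.
k_plain = excess_kurtosis([rng.gauss(0.0, 1.0) for _ in range(2 * n)])

print(f"same mean, different spread: {k_same_mean:+.2f}")
print(f"different mean, same spread: {k_diff_mean:+.2f}")
print(f"single normal population:    {k_plain:+.2f}")
```

With an equal split like this, the same-mean, different-spread mix should come out positive (fatter tails) and the different-mean, same-spread mix negative (shorter tails). Unequal splits can behave differently, so it's worth rerunning with proportions that match your own data.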
The great thing about a missing explanatory variable is that you can go looking for it and add it to the model to correct the error. When the problem is unequal variance, I’m not certain there’s a lot you can do, but I have to admit I’m no doctor of statistics. You may be able to calculate a percent error instead of an absolute error if the issue is increasing variance as the independent variable increases, but there may be other corrections that I’m unaware of.
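To make the percent error idea concrete, here is a small sketch under some loud assumptions. The "model" is a known line (fitted value = 5 × x) rather than a real fit, the noise is built to grow in proportion to the fitted value, and every name and constant is hypothetical. It compares the tails of the absolute errors with the tails of the percent errors, again using excess kurtosis as a rough gauge.

```python
import random
from statistics import fmean

def excess_kurtosis(values):
    """Fourth standardized moment minus 3; roughly 0 for normal data."""
    m = fmean(values)
    var = fmean([(v - m) ** 2 for v in values])
    fourth = fmean([(v - m) ** 4 for v in values])
    return fourth / var ** 2 - 3.0

rng = random.Random(1)
absolute_errors = []
percent_errors = []

for _ in range(20000):
    x = rng.uniform(1.0, 100.0)
    fitted = 5.0 * x  # assumed known line; a real analysis would fit this
    # Assumption: the noise standard deviation is 10% of the fitted value,
    # so the spread grows as x grows.
    actual = fitted + rng.gauss(0.0, 0.1 * fitted)
    absolute_errors.append(actual - fitted)
    percent_errors.append((actual - fitted) / fitted)

k_abs = excess_kurtosis(absolute_errors)
k_pct = excess_kurtosis(percent_errors)

print(f"absolute error excess kurtosis: {k_abs:+.2f}")
print(f"percent error excess kurtosis:  {k_pct:+.2f}")
```

In this contrived setup the absolute errors show clearly positive excess kurtosis while the percent errors sit near zero, because dividing by the fitted value cancels the growing spread. Real data will rarely be this tidy, so treat it as an illustration of the idea and not a guaranteed fix.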
What’s important to note is that even if the residual plots aren’t what you desired, they can still help you learn about what’s going on and then you can use that information to improve your software processes. We learn when we fail, so rather than shrug and give up on a failed model, see what you can take away from it. Fat and short tails are at least two things to get you started.