What are you evaluating your model against?

I enjoy reading Kaiser Fung’s blogs (junkcharts and numbersruleyourworld). One entry in particular caught my attention because it was relevant to some work I had been doing recently.

We were working on a model to improve project estimation for an organization. The group was spending a lot of effort to deliver each estimate: there are a lot of players involved in a project, and each team wants a chance to weigh in on its part. In fact, Steve McConnell notes in his book Software Estimation: Demystifying the Black Art that the best person to give an estimate is the person doing the work. But because of the huge backlog of items needing estimation, the organization was drowning in estimates and not getting anything done. They wanted to know whether a model could improve the process.

So we collected some data and constructed a very simple model that attempted to estimate a project's total effort by extrapolating from a single team's input. We've had success with similar models elsewhere, so it seemed like a plausible route here.
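To make that concrete, here's a minimal sketch (in Python, with made-up numbers) of the kind of model I'm describing: ordinary least squares that extrapolates total project effort from a single team's estimate. The data, units, and fitted coefficients are purely illustrative, not the organization's.

```python
# A minimal sketch of the kind of model described above: extrapolate total
# project effort from a single team's estimate with ordinary least squares.
# The numbers below are made up purely for illustration.
import numpy as np

# Hypothetical historical projects: one team's estimate (hours) vs. the
# actual total effort across all teams (hours).
team_estimate = np.array([40, 65, 80, 120, 150, 200, 260, 300], dtype=float)
total_effort  = np.array([180, 220, 400, 380, 700, 650, 900, 1400], dtype=float)

# Fit total_effort ~ slope * team_estimate + intercept
slope, intercept = np.polyfit(team_estimate, total_effort, deg=1)

def predict_total(estimate_hours: float) -> float:
    """Extrapolate total project effort from one team's estimate."""
    return slope * estimate_hours + intercept

print(predict_total(100))  # rough total-effort prediction for a new project
```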

How to evaluate the model then becomes the question. With a simple linear model like we were proposing, the first thing we'd look at is the R-squared. Ideally, if the input perfectly predicts the output, your R-squared will be 100%. But since models are not perfect, the R-squared is usually something less. In this case, the best model we had was 25%. The worst model we had resulted in a negative R-squared! You get a negative R-squared when the model's predictions are worse than simply predicting the mean of the data. At this point, using a model to help this organization seemed hopeless. And that's when Kaiser's article popped to mind. We didn't need a model that was perfect; we simply needed a model that was better than what they were doing today.
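A small illustration of that last point, with toy numbers rather than the project data: R-squared is 1 minus the model's squared error divided by the squared error of a "just predict the mean" baseline, so a model that does worse than the mean goes negative.

```python
# Toy demonstration of a negative R-squared:
# R^2 = 1 - SS_res / SS_tot, so if the model's squared error exceeds the
# error of simply predicting the mean, R^2 drops below zero.
import numpy as np

actual    = np.array([100.0, 200.0, 300.0, 400.0])
predicted = np.array([350.0, 120.0, 450.0, 180.0])  # deliberately bad predictions

ss_res = np.sum((actual - predicted) ** 2)       # model's squared error
ss_tot = np.sum((actual - actual.mean()) ** 2)   # error of "just predict the mean"
r_squared = 1 - ss_res / ss_tot

print(round(r_squared, 2))  # negative: worse than the mean baseline
```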

Although we evaluated the model against various measures of goodness of fit, the real test was whether it outperformed the alternative. In this case the alternative was expert judgement. We already knew that the effort to produce an estimate using a model was substantially cheaper than having a whole bunch of teams weigh in. So, could a really terrible model be better than the experts? It turns out the answer is yes. The model outperformed expert judgement about 60% of the time, despite its poor quality by other measures. One could hardly call the model good, but then again, it wasn't as if the experts were any good either.
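The comparison itself is simple: for each past project, check whether the model's estimate or the experts' estimate landed closer to the actual effort, then report the model's "win rate." Here's a sketch of that head-to-head check; all numbers are illustrative placeholders, not the organization's data.

```python
# Head-to-head comparison: on each project, did the model or the experts
# come closer to the actual effort? The figures below are placeholders.
import numpy as np

actual          = np.array([180, 220, 400, 380, 700, 650, 900, 1400], dtype=float)
model_estimate  = np.array([250, 200, 350, 500, 600, 700, 1000, 1100], dtype=float)
expert_estimate = np.array([120, 150, 250, 300, 650, 500, 600, 800], dtype=float)

model_error  = np.abs(model_estimate - actual)
expert_error = np.abs(expert_estimate - actual)

win_rate = np.mean(model_error < expert_error)
print(f"Model beats experts on {win_rate:.0%} of projects")
```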

We have this illusion that when we provide an estimate based on our “expertise,” it is somehow unbiased and of high quality. But referring back to McConnell’s book, we know this is not the case. There’s significant uncertainty in early-stage estimates, so why should we presume that an expert can know something that hasn’t been articulated by anyone? They don’t know what they don’t know. And that’s where models can be helpful. Because the model has no preconceived notions about the work and isn’t going to be optimistic for no reason, it is likely to make as good an estimate as any expert would.

For some reason it reminds me of the saying about being chased by a bear: you don’t need to outrun the bear, just the other person the bear is chasing. And so it goes with models. The model doesn’t have to be perfect to be useful; it simply has to be better than the alternative.
