The problem with any approach to estimation

Despite all the times you may have tried, estimating an IT project always seems to be a losing proposition. Why can’t we seem to get it right? I’ve had the opportunity to try my hand at estimating approaches a number of times, and I believe I’ve come upon one useful insight: any estimating process you define has to take human behavior into account!

In IT, the majority of costs associated with any project are likely to be labor. Numerous strategies exist for trying to get at the “right” amount of labor. You can estimate by proxy, by analogy, top down, bottom up, Wideband Delphi… you name it. (For a great resource, check out Steve McConnell’s “Software Estimation: Demystifying the Black Art.”) But I’m going to propose that no typical strategy here will help. The problem isn’t how you arrive at the amount of labor; it’s what happens afterwards.

Across a portfolio of projects, a good estimating process should be unbiased. That is, it should produce estimates that are equally likely to be overestimated as underestimated. It’s true that any single project may experience an estimating error, but so long as the process is well centered, your business partners ought to be able to make reasonable assumptions about likely outcomes.

In one situation I worked on, we developed a simple regression model based on historical data to produce estimates for future projects. When the model was built from the historical data, it performed really well. Even when we selected other random historical projects to test the model, it performed well. Everything seemed to indicate that the model was good and wasn’t overfit to the data used to create it.
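
Purely for illustration, a check like the one described might look something like the sketch below (the libraries, feature names, and data file are my assumptions here, not what was actually used):

    # Minimal sketch (hypothetical column names): fit a simple regression on
    # historical projects and confirm its errors are centered on a random holdout.
    import pandas as pd
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import train_test_split

    projects = pd.read_csv("historical_projects.csv")          # hypothetical file
    X = projects[["story_points", "team_size", "interfaces"]]  # hypothetical features
    y = projects["actual_hours"]

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)
    model = LinearRegression().fit(X_train, y_train)

    residuals = y_test - model.predict(X_test)
    print("median error (hours):", residuals.median())  # near zero suggests no bias

A holdout check like this can look perfectly centered and still miss what happens once the model’s estimates start anchoring real projects.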

However, a year after implementing the model, the new real-world projects it had estimated showed a bias towards being underestimated. What was happening? Was our process flawed? Yes and no.

If we were dealing with a process that didn’t involve humans, it probably would have worked pretty well. But because humans are involved, I’m going to propose that any estimate you create, regardless of the process, will have one of two characteristics: either it will be biased towards underestimates, or the estimates will be so outrageously large that nobody will want to do business with you. Here’s why…

When you estimate a project, you create an effect called anchoring. By stating how much effort the project will take, your teams will align resources on the expectation of that effort. On some days/weeks/months during the life cycle of the project, individuals will be more or less busy. When they are busier, they will book time equal to the time they worked. But when they are less busy, because the resource has been aligned to the project and likely has nothing else to do, they will also book hours to the project. For the estimate to be unbiased against actual outcomes, the light times must counterbalance the busy times. However, in order to get paid during the light times, the humans (this is where they mess it all up) still have to book time to your project. Thus, the offsetting light times never come to fruition, and the estimate becomes biased towards always being underestimated.

The problem gets even worse from there. If you use the actual outcomes from these biased results as feedback into your estimating process, it will cause an inflationary effect. Future estimates will be made larger to account for the appearance of underestimating, and the process will repeat on subsequent projects. The result will spiral until IT becomes so expensive that the business starts looking elsewhere.

It’s grim, but I believe there’s an answer, and it lies in how (frustrating as it may be) the business often treats IT already. Rather than making estimates larger in response to the data, you should adjust your estimating process to make them smaller! I know, this sounds crazy, but hear me out. Let’s say you find your projects are 10% underestimated at the median. Adjust your process to make estimates smaller, say by 5%, and then review the results. If projects are still 10% underestimated, the underestimate you were seeing was likely the result of this waste effect. Continue to shrink your estimates until the underestimating error starts to grow. At that point, you have likely squeezed out the bias caused by light times versus busy times, and the underestimates you now see are the result of actually allocating too little effort to the project. Simply undo the last bit of shrinking to get back to your original 10% (or whatever it was) bias, and set your process there. Sure, you’ll always be a little biased towards underestimating projects, but it’s better than an ever-bloating IT budget.
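
Purely as a sketch, the review loop might look something like this (the 5% step, the tolerance, and the data structure are all assumptions on my part):

    # Illustrative sketch of the shrink-and-review loop described above.
    # "batch" is a list of completed projects with their estimates and actuals.
    from statistics import median

    def median_underestimate(batch):
        """Median of (actual - estimate) / estimate across a batch of projects."""
        return median((p["actual"] - p["estimate"]) / p["estimate"] for p in batch)

    def next_adjustment(current_shrink, bias_now, bias_before, step=0.05):
        """Keep shrinking estimates while the observed bias stays flat; once the
        underestimate starts to grow, undo the last step and settle there."""
        if bias_now > bias_before + 0.01:   # error growing: we've cut into real effort
            return current_shrink - step    # back off the last reduction
        return current_shrink + step        # bias flat: still squeezing out the waste effect

Each review should probably use only projects completed under the current adjustment, since older actuals were produced against the old, larger estimates.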

How to construct a good ROI

Far too often, the cost-benefit analyses I see aren’t worth the paper they’re printed on. For the most part, this stems from the benefit side of the calculation, not the cost side. Although we do make poor estimates, getting better at estimating isn’t terribly difficult. Start by reading Steve McConnell’s “Software Estimation” and you’ll be well on your way.

The benefit side is where things go haywire. Let’s say we’re talking about the benefit of better code reviews. There’s lots of industry data indicating that code reviews are valuable when done well.

So the math in people’s heads might go something like this… Better code reviews reduce defects. Let’s assume a test defect is worth… I don’t know… $1000 each, that we can cut defects by 75% by doing better code reviews, and that a code review can be done in ten minutes. Even if the basic formula is right, all the inputs are wrong. Just like a computer program: garbage in, garbage out.

To do the benefits half of the equation, you need some data to back up your assumptions. The things you assume are likely knowable, or at least you can get into the right ballpark. Want to know what it costs to fix a defect? Do a brief time study. Or, if you know the cost of a production defect (which for some reason we often do seem to know), then use research like Roger Pressman’s to arrive at an approximate cost of finding the defect in the testing or coding phases. The number is probably closer to $500.

Next, look at what the industry data says about the efficacy of code reviews. A 65% improvement is not unheard of, but assuming you’ll capture the entire benefit, plus more, right out of the gate is pure optimism. First of all, you might be doing some reviews today, which blunts your benefit because the potential gain is smaller. Secondly, you most likely won’t be able to capture the entire potential benefit anyway. In one example I looked at, the difference in defect density between teams that did and didn’t do code reviews was 20%. So if code reviews are 65% effective, the maximum opportunity was only 40%, not the proposed 75%. Worse, when buying third-party tools or services, you can’t rely on the salesperson to provide you good numbers. They have a vested interest in you buying the product, and thus in making the ROI work.

And on the ongoing cost side, it takes a lot longer than ten minutes to do a code review. All in all, code reviews are certainly worth it, but you won’t get this too-good-to-be-true benefit from them. In many cases, we have a solution in mind but no idea how much benefit we might receive, so we make up numbers. Sure, that fills out the required paperwork, but it really isn’t due diligence. We have an obligation to support our assumptions with some data (our own or external).
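
To make the inputs explicit, a back-of-the-envelope version might look like the sketch below; every number in it is a placeholder to be replaced with your own data, not a recommendation:

    # Rough benefit/cost sketch for better code reviews. Every figure here is a
    # placeholder to be swapped for your own data (time studies, defect counts).
    defects_per_release   = 200     # from your defect tracking system
    cost_per_test_defect  = 500     # e.g. from a time study, not a guess
    achievable_reduction  = 0.40    # capped by what you don't already catch today

    reviews_per_release   = 300
    hours_per_review      = 1.0     # a lot more realistic than ten minutes
    loaded_rate_per_hour  = 75

    benefit = defects_per_release * achievable_reduction * cost_per_test_defect
    cost = reviews_per_release * hours_per_review * loaded_rate_per_hour
    print(f"benefit ~${benefit:,.0f}  ongoing cost ~${cost:,.0f}  net ~${benefit - cost:,.0f}")

The point isn’t the arithmetic; it’s that each line is an assumption someone can challenge with data.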

How much cheaper must it be?

How many times have you given an estimate only to have the business partner try to negotiate you down? In my own recollection, pretty much every time I’ve ever submitted an estimate there’s been push-back. Now, that’s not to say my estimates are any better than anyone else’s, or that my teams are more efficient. Those were questions I didn’t think enough about at the time to collect the data to answer.

But the estimated cost of a project came up today. It was a huge project we were discussing, perhaps several million dollars in total spend. At some point the conversation turned to a small piece of the estimate. It was just ten thousand dollars or so, but we were discussing whether it was the right number. Think about it… in the scheme of several million dollars, what’s ten thousand? It’s less than the likely error in the estimate, that’s for sure.

Which is the point of my post: at what point do you know that a proposed reduction in the estimate is meaningful? If you do a point estimate, you probably don’t have any frame of reference. If you provide a best, likely, and worst case estimate, however, you can begin to figure that out. If the changes made don’t bring the likely cost below the best case cost, you’re probably arguing about estimating error and not a meaningful difference in the scope or scale of the work.
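
As a sketch, that test is almost trivial to write down (the dollar figures are made up):

    # Is a proposed cut meaningful, or just noise inside the estimating error?
    # Numbers are hypothetical; the rule is the one described above.
    def cut_is_meaningful(best, likely, proposed_cut):
        """A cut that doesn't bring the likely cost below the best case is
        probably arguing about estimating error, not scope."""
        return (likely - proposed_cut) < best

    print(cut_is_meaningful(best=2_800_000, likely=3_200_000, proposed_cut=10_000))   # False
    print(cut_is_meaningful(best=2_800_000, likely=3_200_000, proposed_cut=500_000))  # True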

From folks like Steve McConnell we know that developers are chronic under-estimators. Why then would you allow yourself to be pushed into an even smaller estimate, particularly when you know you’re likely to come out worse than your likely case estimate anyway? If you’re going to revise your estimate downwards, make sure it’s for a meaningful change in the scope of the work, not just optimism or bullying on the part of the business. In the long run, you’re doing them no favors by caving in when you can’t reasonably deliver for that cost. Now, figuring out how to be more efficient is an entirely different topic.

I have $1000, can you build me a car?

In an odd conversation the other day, a friend proposed that since we aren’t good at estimating, we shouldn’t bother to estimate at all. Instead, we should time-box the activity, and whatever gets done is what gets done. I agree that we’re often terrible at estimating, but I don’t think what he was proposing was a good alternative.

When I say “I have $1000, can you build me a car?” one might naturally react by saying that’s impossible.  In Agile, my friend proposed, we wouldn’t bother to figure out all the requirements (after all, there are hundreds if not thousands of features that might go into a car) and instead, we’d say “let’s get started and see where we get.”

Now, you might put together some stories and assign them story points, but you wouldn’t know much more than the relative size of those stories.  And then you’d get to work and start developing features from those stories and you’d do that until the money ran out.  In the end, you might have a car… or not.  Hard to say, since we didn’t bother to even guess at the cost of a car.

Sure, that’s a silly example, but here’s a more realistic one. I’ve got $1,000,000 and I want you to port all our software from language X to language Y. We’ve never done it before, so we’re not really sure what it’ll take; let’s just get started. The problem is, if you run out of money having ported 50% of the code, you really haven’t finished the project. Nobody wants to spend a million to find out the work didn’t get done.

We place a lot of value on predictability. In the same way we hate it when we take our car to the mechanic, get an estimate, and then watch them blow it, we don’t like it when we set aside a large chunk of money only to have it not cover our costs. For car repairs, at some point you might decide (especially with an older car) that it isn’t worth continuing to put money into it. Sure, maybe $250 is OK, but $1000 is too much. Eventually there stops being a return on investment.

So what’s a business to do if you say “hey, give me the money you have, and we’ll see what we can get done”? At least for me, I wouldn’t invest my money there. I’m willing to accept uncertainty; in fact, I even like it when my vet gives me a low/high estimate range for caring for my pet, but only within reason.

Certainly not “how much have you got to spend?  Let’s see what we can do to your pet for that much.”  What if the money runs out mid-surgery?  Maybe I shouldn’t have embarked on the surgery in the first place.

Budgeting

If you’re a data person like me, you probably keep track of your finances fairly closely.  You also probably have a budget.  The thing I’ve noticed about my budget is that every month there always seems to be some line item that I’m over budget on.  Sometimes it’s dining out.  Sometimes it’s groceries.  Sometimes it’s something entirely different, but it’s always something.

There are a couple of reasons for this. The first is that my budgeting software calculates an average monthly expense. The problem is that, assuming a normal distribution, 50% of the time you are going to be above that average and 50% of the time below it. For budgeting purposes, using the average isn’t a great idea, since in any given month there’s a pretty good chance you are going to be above it. You should probably use the average plus one standard deviation, or something like that, and then if you still have money left over you know that you’re on track.
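
As a quick sketch with made-up numbers:

    # Budget a line item at average + 1 standard deviation rather than the average,
    # so a typical month doesn't blow it. The monthly amounts below are made up.
    from statistics import mean, stdev

    dining_out = [180, 220, 150, 310, 205, 260, 190, 400, 170, 230, 280, 210]

    avg = mean(dining_out)
    budget = avg + stdev(dining_out)
    print(f"average: ${avg:.0f}   budget (avg + 1 sd): ${budget:.0f}")
    # If spending is roughly normal, ~84% of months should land under that budget.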

What’s the big deal, you ask? Well, even if I’m not necessarily spending more than I earn, I may be saving less than I should for things like my kid’s college, retirement, emergency funds, etc.

But there’s a second and more important issue.  Every month that I overspend my budget on some line item, I look to see what caused it.  I’m looking for something I can do to avoid the overage in future months.  It seems like most months I say to myself “oh, well that was an exception because my brother is only going to get married once, so I won’t have another large gift expense” or “well, my friends from California were in town, and they only come once a year, so no big deal that we went out and spent $200 on dinner.”  And yet, every month, there’s some item that seems to be some exception.

So when does an exception stop being an exception?  For my overall budget, probably some time ago when it became apparent that some line item was always going to be over budget.  It’s hardly a budget anymore if I keep blowing it (or at least not a very good one).

What does this have to do with software? The analogy applies to defects, to your development budget, and more.

Do you find that every defect you look at seems to be a special case, or “really hard to find,” or “a rare case,” or “once in a blue moon”? When do you realize that these exceptions probably aren’t exceptions at all, but part of a bigger pattern that, while harder to solve, is eroding the quality of your code?

Do you think that if you overran your budget this month, nothing is going to come up next month? Maybe it won’t be the exact same thing, but it’ll be something. When do we stop denying the data and accept that what we want to believe is an exception is really a pattern?

You must estimate the work

Maybe this seems like an obvious thing to some people, but there are several things we mistake for estimating that aren’t estimating at all.

Example one: we apply a rule of thumb. The development team estimates 10 weeks of work, and we apply a rule which says testing will be 50% of development, thus 5 weeks. Now, whether the real testing work is 5 weeks or not is lost, because we never estimated it.

Example two: we staff rather than estimate. We’ve got a certain number of projects and a certain number of people, so we figure every person must have a project to work on. Not so. Unless you estimate the work, you have no idea whether the number of people you put on the project is too many or too few.

Example three: we extrapolate. We estimate 100 test cases, know the average cost is 1 hour per test case, and therefore estimate 100 hours of work. This is estimating, but it’s problematic in that we’ve now tied the entire project cost to test cases alone. Not all 100 hours will go to test case execution: some of the time goes to management, some to data mining, some to writing up bugs, and so on.
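
As a sketch, with an entirely made-up split of where the hours go:

    # Instead of booking the whole extrapolated figure to test execution, set an
    # expectation per activity so actuals have something to be compared against.
    # The percentages below are illustrative, not a recommended split.
    test_cases = 100
    hours_per_case = 1.0
    total_hours = test_cases * hours_per_case   # the 100-hour extrapolation

    expected = {
        "test case execution": 0.60 * total_hours,
        "test management":     0.15 * total_hours,
        "data mining":         0.10 * total_hours,
        "writing up bugs":     0.15 * total_hours,
    }
    for activity, hours in expected.items():
        print(f"{activity}: {hours:.0f} hours expected")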

Unless you set an expectation around the actual work to be performed (in Lean terms, setting standard work times), you lack any basis to evaluate whether you did it faster or slower. You can’t draw a value stream (well, not one with data boxes on it) if you never get any real data on what each process step is taking. And you can’t get real data on the cost of each step unless you set yourself up to do that work in a reasonable amount of time.

If you’re just putting people in chairs, then Parkinson’s law is going to do a number on your productivity.