Getting back to writing

As you can see from my prior posts, I had taken quite the long leave of absence from writing – approximately 2 years.  In some ways, that’s good.  My interests have changed as my experiences have increased and I’ve moved away from process engineering into a more statistics heavy approach to my work.

That doesn’t mean I’ll never write about process again; indeed I think the purpose of using data is to decide on what choices to make (about process, policy, whatever) so I can’t imagine never touching on the subject again. However, I want to refocus my efforts on the types of statistical errors I frequently see, so that I help to make the world a bit smarter when it comes to thinking about data. And at the same time, I hope to fill the gap between “Statistics for Dummies” (which is just a toe in the water) and the Andrew Gelmans of the world (where almost everyone is in over their heads).

Why IKEA makes me question LEAN

We happen to live in a mid century styled house, and IKEA’s style often fits quite well into that design. While we own many vintage pieces, there’s something to be said for rounding out the collection with some newer stuff that can be bought cost effectively. IKEA is an interesting business in that they figured out that shipping air was wasteful. By flat packing their items and having customers assemble it themselves, everyone wins.

Now, if you’ve ever assembled your own furniture, you know it can be quite a project depending on what it is. IKEA uses all kinds of interesting fasteners, all designed such that you ought to be able to assemble it with a manual screwdriver, an included Allen wrench and perhaps a small hammer. The thing is, I own a battery powered drill, and I use that (with interchangeable bits) in place of the Allen wrench and screwdriver. In theory, if I eliminated all the waste from the assembly process (say everything was packed such that the pieces came out in the exact order needed), there wouldn’t be much advantage from using power tools. That is, when I was first taught about LEAN, we were warned against optimizing the value added parts of the process and told to focus on eliminating the non value add.

So, if you buy from IKEA, then assembling it yourself is part of the value add. You specifically bought the product for this reason, to trade your time for less money. Therefore, optimizing it shouldn’t make much sense, and yet, using a power tool to do the value added work drastically cuts down on the time to assemble a piece.

In fact, power tools make many value added tasks much better (or even possible at all). And so it confuses me, if you are doing a value added task manually, why you wouldn’t focus on doing it as fast or efficiently as possible. Perhaps it’s just a miscommunication from the teachers to the students, and it is certainly true that there is many a wasteful step in many processes, but it strikes me that if technology can make a value added process better, then why would you not do that right now as well as eliminate waste?

Don’t apply software thinking to information gathering

One thing that software developers, architects, etc. strive for is not duplicating functionality across systems. The more systems you have in your ecosystem, the more complex it is to modify and use. Got two places where people can manage a workflow? Inevitably they will build a complex process where half of the business work is done in one workflow and half in the other. Unfortunately, while simplification like this is good for software and processes, it is not good for the general process of thinking.

To me, it’s confusing when organizations reduce or eliminate access to analytics services. Whether you’re a fan of Forrester, Gartner or some other service, it’s true that all sources of data (analyst firms, academic research, books, etc.) offer, at some level, duplicative information. One might be inclined to simply read one source and conclude the rest are duplicates. After all, duplication is bad in software and processes, so it must be bad in information as well, right?

Not so much. Edward Tufte called it ‘a diversity of evidence.’ The scientific method demands that experiments be repeatable by others, and in fact relies on such attempts to reproduce. So, it’s true that many sources provide similar kinds of information, but the value is in the diversity. This is one place where you should seek to multiply, not reduce, the number of places you can take information in from. Don’t fall prey to the idea that just because simplification is good in some places it is universally good.

The problem with any approach to estimation

Despite all the times you may have tried, estimating an IT project always seems to be a losing proposition. Why can’t we seem to get it right? I’ve had the opportunity to try my hand at approaches to estimating a number of times, and I believe I’ve come upon one useful insight that any estimating process you define has to take into account – human behavior!

In IT, the majority of costs associated with any project are likely to be labor. Numerous strategies exist for trying to get at the “right” amount of labor. You can estimate by proxy, by analogy, top down, bottom up, wideband-delphi… You name it. (For a great resource, check out Steve McConnell’s “Software Estimation: Demystifying the Black Art.”) But I’m going to propose that no typical strategy here will help. The problem isn’t how you arrive at the amount of labor; it’s what happens afterwards.

Across a portfolio of projects, a good estimating process should be unbiased. That is, it should produce estimates which are equally likely to be overestimated as they are under-estimated. It’s true that any single project may experience an estimating error, but so long as the process is well centered, your business partners ought to be able to make reasonable assumptions about likely outcomes.
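
As a rough sketch of what “well centered” could look like in practice, here is one way to check a portfolio. The function name, the choice of median relative error, and the project numbers are all my own assumptions for illustration, not data from any real portfolio.

```python
from statistics import median

def estimation_bias(estimates, actuals):
    """Summarize whether a set of estimates is centered on the actuals."""
    # Relative error > 0 means the project was under-estimated
    # (the actual effort exceeded the estimate).
    rel_errors = [(a - e) / e for e, a in zip(estimates, actuals)]
    under = sum(1 for r in rel_errors if r > 0)
    over = sum(1 for r in rel_errors if r < 0)
    return {
        "median_relative_error": median(rel_errors),
        "share_under_estimated": under / len(rel_errors),
        "share_over_estimated": over / len(rel_errors),
    }

# Hypothetical portfolio: estimated vs. actual hours per project.
estimates = [100, 200, 400, 800, 1000]
actuals = [110, 180, 440, 760, 1000]
summary = estimation_bias(estimates, actuals)
print(summary)
```

In this made-up portfolio as many projects land over as under and the median error is zero, which is the property I’m describing. Individual projects still miss.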

In one situation I worked on, we developed a simple regression model based on historical data to predict future project estimates. When the model was created using historical data, it performed really well. Even when we selected other random historical projects to test the model, it performed well. Everything would have seemed to indicate that the model was good and wasn’t overfit to the data used to create it.

However, a year after implementing the model, new real world projects that the model had predicted showed a bias towards being under estimated. What was happening? Was our process flawed? Yes and no.

If we were dealing with a process that didn’t involve humans, it’d probably have worked pretty well. However, I’m going to propose that because humans are involved, any estimate created, regardless of the process, will have one of two characteristics. Either it’ll be biased towards under estimates or the estimates will be so outrageously large that nobody will want to do business with you. Here’s why…

When you estimate a project you create an effect called anchoring. By stating how much effort the project will take, your teams will align resources on the expectation of that effort. On some days/weeks/months during the life cycle of the project, individuals will be more or less busy. When they are more busy, they will book time equal to the time they worked. However, when they are less busy, because the resource has been aligned to the project and likely has nothing else to do, they will also book their full planned hours to the project. In order to have an unbiased estimate versus actual outcomes, the light times must counterbalance the busy times. However, in order to get paid during the light times, the humans (this is where they mess it all up) still have to book time to your project. Thus, the offsetting light times never come to fruition and the estimate becomes biased towards always being under estimated.

The problem that follows gets even worse. If you use the actual outcomes from these biased results as feedback into your estimate process, it will cause an inflationary effect. Future estimates will be made larger to account for the appearance of under estimating and the process will repeat on subsequent projects. The result will spiral until IT becomes so expensive the business starts looking elsewhere.
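
To make the mechanism concrete, here is a toy simulation. It rests on assumptions of my own: a five-week project, people booking the larger of planned hours and hours actually needed each week, and each new estimate simply being set to the last booked actual. The weekly numbers are invented.

```python
def booked_total(estimate, needed_per_week):
    """Hours booked when light weeks are still billed at the planned rate."""
    planned = estimate / len(needed_per_week)
    # Busy weeks: book what was worked. Light weeks: book the planned hours anyway.
    return sum(max(planned, needed) for needed in needed_per_week)

# Hypothetical true workload: it sums to 200 hours and never changes.
needed = [30, 50, 40, 35, 45]
true_effort = sum(needed)

estimate = 200.0  # the first estimate is exactly right
history = []
for cycle in range(5):
    actual = booked_total(estimate, needed)
    history.append((estimate, actual))
    estimate = actual  # feed the "actual" back in as the next estimate

for est, act in history:
    print(f"estimate {est:7.2f}  booked {act:7.2f}  true work {true_effort}")
```

Even though the first estimate matches the true work, the booked hours come in above it, and every round of feedback pushes the estimate up while the real work stays at 200 hours. In this simple version the growth eventually levels off once the plan covers the busiest week, so treat it as an illustration of the direction of the effect, not its size.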

It’s grim, but I believe there’s an answer, and it lies in how (frustrating as it may be) the business often treats IT already. Rather than making estimates larger in response to the data, you should adjust your estimating process to make them smaller! I know, this sounds crazy, but hear me out. Let’s say you are finding your projects are 10% under estimated on median. Adjust your process to make estimates smaller, let’s say by 5% and then review the results. If projects are still 10% under estimated, the under estimate you were seeing was likely the result of this waste effect. Continue to shrink your estimates until such time as the under estimating error starts to grow. At this point, you likely have squeezed out the bias caused by light times versus busy times and the under estimates you are now seeing are the result of actually allocating too little effort to the project. Simply undo the shrinking to get you back to your original 10% (or whatever it was) bias, and set your process there. Sure, you’ll always be a little biased towards underestimating projects, but it’s better than an ever bloating IT budget.

This one graphic DOESN’T explain it all

How many times have you been reading Facebook, or your favorite blog or a site like BuzzFeed, and seen an entry with a title like ‘this one simple graphic explains [insert topic here] once and for all’ or something like that?

These titles suggest to the reader that if you just looked at some problem in a specific way, suddenly it’d all become clear. Of course, the next step is that your Democrat or Republican friends post these items to Facebook with a helpful comment like “for my [opposing party] friends.” And really, nobody’s mind is changed.

First off, I’m not going to spend much time addressing cognitive dissonance. The reality is, giving someone evidence that isn’t in line with their world views tends to lead to a strengthening of their current views, not a weakening of them.

But secondarily, for any sufficiently complicated topic (which in my world, is pretty much all of them), there is no one graphic that explains it all. And I suspect most of your situations are like that as well. Let me use an example, organizational productivity. We measure effort per function point per industry norms. And we were demonstrating in our “one chart that explains it all” that productivity had improved since a recent change. Except one chart won’t do it. The chart makes the main point, but then we had at least five other charts checking things like the measurement system hadn’t been tampered with, that quality hadn’t suffered as apparent productivity rose, and so on. In IT, most of the things we measure are proxy measures for some outcome we care about. As proxy measures, we always have to worry about the measurement system and unintended consequences of our choices. As a result, no analysis is ever complete on a single chart.

Treat anyone and anything that is explained to you in “one simple chart” with suspicion. If it seems too simple and obvious, it probably is.

The “right” way for the business, but the “wrong” way for us

How many times have you been annoyed by a business partner who tells you exactly how something should be done? Let’s say you’re building a new application and it needs data from a third party. The third party provider has a web service you can call to get a realtime result. However, another application in your organization is already getting that data periodically as well. The business person begins to question your design (and the associated cost of a new call to the web service): wouldn’t it be better just to take a nightly feed from the other system and load it? For oh so many reasons, we know this isn’t the right thing to do. The data is likely outdated since it can be up to 24 hours behind the realtime version. It ties together two unrelated systems which happen to need similar data. Now if the old system changes, you’ll have to modify and retest this new system as well.

If, as developers, we were purists about doing the right thing, it’d be a defensible stance. But how often have we kludged something together ourselves? Take for instance, data we need about our organization. It might come from a Project-and-Portfolio Management tool. Perhaps we want to look at data by “team,” but for whatever reason the team data isn’t available in the tool. Do we try and do the right thing and fix it? Not usually. Just the other day I encountered a data extract, then joined via Microsoft Excel and a vlookup() to a list of people which mapped them to a team. Why was the team data not in the tool? That would’ve been difficult, but the right thing to do. Now, instead of a clear single system of record for team membership, both the PPM tool’s org structure and this one-off would have to be maintained. And while we are often hesitant to make compromises about the stuff we build for the real customers, we will accept horrid messes that create work for ourselves. Why should we be unwilling to have messy solutions imposed on us but willing to impose them on ourselves?

I don’t think we can hold others to higher standards than we are willing to hold ourselves as developers. Don’t create your own messes.

It’s not a scorecard if there is no score

Since it’s the World Cup right now, it makes sense to spend some time talking about keeping score. I’m from the US, so I took note of the US beating Ghana 2 goals to 1 recently. That’s the great thing about keeping score, you actually know who came out on top.

We use scorecards (or something equivalent) widely in professional sports. In baseball, for example, each player has a set of statistics associated with their performance – batting average, ERA, RBI, and so on. And using these statistics we have some clue as to which players are better than others. That doesn’t mean we can exactly discern two great players, but we can get them into a general stratification and figure out who to keep and who to cut.

In software we often desire to have scorecards, but we fail to use them appropriately. Perhaps we consider three things important about every project – on time, on budget, and with decent quality. Sure, it’s a simple model, but perhaps better than nothing.

So we go and collect this data on projects and we notice one project that, based on our data, appears to be late, over budget and below average quality. So we go ask the team what’s going on… And wouldn’t you know it, they’ve got a reason for everything. The scope changed, so they can’t be held to the date, and they couldn’t get the resources they really needed for the project, so people are working overtime, and well, you know, if you don’t have top people then the quality will suffer. Clearly this project’s scorecard says it is in trouble, but the team wants to call it all clear.

The outcome of software projects is often like the outcome of a sporting event. The project will either eventually succeed or fail, just as will one of the teams. Sometimes, for both groups, the getting there isn’t pretty. In the end, the winning sports team may say things like “we won, but we could’ve played better” which demonstrates a far better understanding of probability of success than most project teams have. Instead, the project team wants to put out its own version of reality, one that is inconsistent with how all other projects are looked at. Suddenly you don’t have a scorecard anymore. Who won the game when the score reported is 2 to Purple-People-Eater? You haven’t a clue.

I’m not proposing that measurement in software is an easy task, but if we choose measures that act as a proxy for a project or organization’s performance, we must use a consistent ruler to measure them all. Otherwise, you don’t have a scorecard, you just have, as Kaiser Fung likes to call it, story time.

Why value delivered is a misleading metric

A recent conversation I was having turned to the idea of standing development teams and measuring performance in terms of value delivered.

There’s been some debate back and forth between colleagues and myself about whether value is something that IT should measure. After all, it commingles elements IT can’t control (the goodness of the business idea) with their delivery. But, that aside, let’s assume we ought to measure IT including value in some way.

The idea of measuring value delivered seems to make sense for fixed sized teams. Under typical circumstances you always want to account for opportunity in your measures. For example, if you were comparing two teams, one of 10 people and the other of 100, you’d intuit that 100 people are going to turn out more value than 10 simply by virtue of being an order of magnitude larger. However, if your team size is fixed, then adjusting for team size doesn’t matter… You’d be taking value delivered and dividing it by the same number every time.

However, there still is a problem with just measuring value delivered. Let’s say you are chugging along delivering business value and the business says “we’d like even more value! Can you hire another person?” Your team size would still be reasonable, so you say yes. Indeed, with the new person on board, value delivery rises, so everyone’s happy right? Well, not necessarily. Adding the person added value, so your line slope of value delivered would rise, but did it add enough value to overcome the additional cost? Well, that depends. If a team of 10 was delivering $1m in value then a team of 11 ought to deliver at least $1.1m in value in order for the proportion of value delivered to stay constant. On the flip side, if your team shrinks from 10 to 9, you’d expect value delivered to drop some as well. In fact, if it dropped from $1m to $950k, the proportion of value delivered to team size would actually increase! And by the way, if the value delivered didn’t drop with fewer people on the team, what does that say about the contributions of the former member(s)? When your business folks say to you “IT costs too much” what they are perceiving is the value they’re getting for the cost, not just the value and despite the statement, not just the cost either.
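
The arithmetic above can be checked in a few lines. The function name is just mine for illustration; the dollar figures are the ones from the paragraph.

```python
def value_per_person(value_delivered, team_size):
    """Value delivered divided by head count: a crude productivity ratio."""
    return value_delivered / team_size

# Team of 10 delivering $1m.
baseline = value_per_person(1_000_000, 10)

# What a team of 11 must deliver to hold the same ratio.
break_even_for_11 = baseline * 11

# Team shrinks to 9 and value drops to $950k.
after_shrink = value_per_person(950_000, 9)

print(f"baseline per person:     ${baseline:,.0f}")
print(f"break-even for 11:       ${break_even_for_11:,.0f}")
print(f"per person after shrink: ${after_shrink:,.0f}")
```

Total value fell in the shrink case, yet value per person went up, which is exactly what a value-only measure hides.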

Of course, if you only measure the value half and not the costs half (whether that’s in people on the team or hours billed or whatever) then you’ll never know this. Capers Jones in his research has pointed to evidence that larger projects experience lower productivity, or in essence, smaller amounts of value delivered per unit of effort exerted. The idea of simply attempting to maximize value delivered underpins the Mythical Man-Month that Fred Brooks wrote about so many years ago – the incorrect belief that if ten people are good, then twenty must be better. To know the optimal mix for your organization, you must attempt to measure productivity, not just the value half.

The problem with value delivered is that it’s a useful measure only as long as the team stays fixed in size in perpetuity. That’s unrealistic. If the team size changes for any reason (hiring, quitting, leave of absence), even temporarily, you must account for the rate of value delivered and not just the sum total value. The BLS reports that median tenure of an employee is just around five years, so you ought to expect some instability in your team over time. Otherwise, when someone comes knocking saying you aren’t delivering the value they expect, you’ll have no basis for the conversation.

What are you evaluating your model against?

I enjoy reading Kaiser Fung’s blogs (junkcharts and numbersruleyourworld). One entry in particular caught my attention because it was relevant to some work I had been doing recently.

We were working on a model to improve project estimation for an organization. In this situation, the given group was exerting a lot of effort to deliver an estimate. There are a lot of players involved in a project and each team wants a chance to weigh in on their part. In fact, Steve McConnell, in his book Software Estimation: Demystifying the Black Art, notes that the best person to give an estimate is the person doing the work. But because of the huge backlog of items needing estimation, the organization was drowning in estimates and not getting anything done. They wanted to know if a model could improve the process.

So we collected some data, constructing a very simple model based on attempting to estimate the total effort of the project based on extrapolating from a single team’s input. We’ve had success with similar models elsewhere, so it seemed like a plausible route to go here.

How to evaluate a model then becomes the question. With a simple linear model like we were proposing, the first thing we’d look at is the R-squared. Ideally, if the input perfectly predicts the output, your R-sq will be 100%. But since models are not perfect, the R-sq is usually something less. In this case, the best model we had was 25%. The worst model we had resulted in a negative R-sq! You get a negative R-sq when the model’s squared error is bigger than the error you’d get from simply guessing the average outcome every time. At this point using a model to help this organization out seemed hopeless. And that’s when Kaiser’s article popped to mind. We didn’t need a model that necessarily was a perfect model; we simply needed a model that was better than what they were doing today.
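
For anyone who hasn’t seen how R-sq can go below zero, here is a minimal sketch using the textbook definition, one minus the ratio of the model’s squared error to the squared error of always guessing the mean. The data is invented. A negative value typically shows up when a model is scored on data it wasn’t fit to, which I’m assuming is the situation here.

```python
def r_squared(actuals, predictions):
    """R-squared as 1 - (model squared error / squared error of guessing the mean)."""
    mean_actual = sum(actuals) / len(actuals)
    ss_res = sum((a - p) ** 2 for a, p in zip(actuals, predictions))
    ss_tot = sum((a - mean_actual) ** 2 for a in actuals)
    return 1 - ss_res / ss_tot

# Hypothetical project efforts in hours.
actuals = [100, 200, 300, 400]

perfect = r_squared(actuals, actuals)                 # predictions match exactly
mean_only = r_squared(actuals, [250, 250, 250, 250])  # always guess the average
bad = r_squared(actuals, [400, 300, 200, 100])        # worse than guessing the average

print(perfect, mean_only, bad)
```

A perfect model scores 1, guessing the average scores 0, and anything that does worse than the average lands below zero.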

Although we evaluated the model against various measures of goodness of fit, the real test was whether the model outperformed the alternative. In this case the alternative was expert judgement. We already knew that the effort to produce an estimate using a model was substantially cheaper than having a whole bunch of teams weigh in. So, could a really terrible model be better than the experts? It turns out the answer is yes. The model outperformed expert judgement about 60% of the time despite the poor quality of the model by other measures. One could hardly call the model good, but then again, it wasn’t like the experts were any good either.
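
One plausible way to run that kind of head-to-head comparison is to count how often the model’s estimate lands closer to the actual than the expert’s does. This is a sketch with made-up numbers; the function name and the use of absolute error are assumptions on my part, not a record of the exact method we used.

```python
def model_win_rate(actuals, model_estimates, expert_estimates):
    """Share of projects where the model's estimate is closer to the actual."""
    wins = 0
    for actual, model, expert in zip(actuals, model_estimates, expert_estimates):
        if abs(model - actual) < abs(expert - actual):
            wins += 1
    return wins / len(actuals)

# Hypothetical projects: actual effort, model estimate, expert estimate (hours).
actuals = [100, 200, 300, 400, 500]
model_estimates = [120, 150, 330, 380, 600]
expert_estimates = [70, 190, 250, 450, 480]

win_rate = model_win_rate(actuals, model_estimates, expert_estimates)
print(f"model beat the experts on {win_rate:.0%} of projects")
```

Notice that the model here is sometimes badly wrong (it misses one project by 100 hours) and still wins more often than it loses, which is the whole point.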

We have this illusion that when we provide an estimate based on our “expertise” that it is somehow unbiased and of high quality. But referring back to McConnell’s book again, we know this to not be the case. There’s significant uncertainty in early stage estimates, so why should we presume that the expert can know something that hasn’t been articulated by anyone? They don’t know what they don’t know. And that’s where models can be helpful. Because the model doesn’t have any preconceived notions about the work and isn’t going to be optimistic for no reason, the model is likely to make as good an estimate as any expert would.

For some reason it reminds me of the saying about being chased by a bear. You don’t need to outrun the bear, just the other person the bear is chasing. And so it goes with models. The model doesn’t have to be perfect to be useful, it simply must be better than the alternative.

Following fast

Some time ago, unfortunately I can’t cite the source, I read that one of Toyota’s major strategies was not necessarily to innovate but to be able to follow quickly in the footsteps of innovators. The point was, whether or not correctly attributed, innovation is overrated if you can copy someone else’s innovation quickly enough.

Well today I saw this article on Huffington Post… A sort of rant about why 2048 was a rip off of Threes! and how you should go out and buy Threes! because they put effort into their game and the ripoff took only a couple of weeks to construct.

I won’t comment on the justness component of the article, but I do think that it’s an excellent illustration of following fast. 2048 is indeed very similar to its predecessor. I’ve played both since my father-in-law first challenged me to beat his top score on Threes! and subsequently pointed me to the equally addicting and free game, 2048. However, 2048 does have some differences and for the casual game player it’s far less frustrating. So, should every company invest 14 months inventing a game, or is a two week knockoff done by a single person and given away adequate?

Particularly in game playing, the marketplace is a highly commoditized place. If you need an entertainment fix, almost any game will do, so while each game is unique, you’re not just competing against other games like yours, you’re competing against all other games which satisfy the same need.

For better or worse the free market is pretty indifferent towards fairness, so recognizing that and following fast may be the way to go. And oh, it has another advantage, one that was articulated by Tom DeMarco in his work “The Deadline.” If you have an existing product to copy, specifications for how your product should at least mostly work are right there in the form of the existing software and user manuals.