What can a snowblower tell us about software?

If you’re in the northeastern United States, you’re probably thinking about snow right now. And if you’re responsible for clearing the snow from your drive or walkways, you might also be all too familiar with time behind a snowblower. For years I hand-shoveled my walkways, but when we moved to this new house they were simply far too long for that.

It takes me about an hour to do all the clearing I am responsible for, so that’s a lot of time to think, which isn’t necessarily a bad thing. This particular snow is the deepest we’ve had yet. My snowblower has three forward speeds, and presumably you use a slower speed when you have more snow to clear. The slower speed allows the auger to clear the snow before it gets all backed up.

So, as I was clearing the drive, I noticed something. Even at the lowest speed there was enough snow that some of it was being simply pushed out of the way by the blower. That meant that I’d have to do clean-up passes just to get that little bit of snow that the blower wouldn’t get on the first pass. And that got me to thinking. What if I just went faster? After all, if I was going to have to make a second pass anyway, who cares if it’s a tiny bit of snow or a little bit more?

And that got me to thinking about software. One approach might be to go slowly and carefully, but if you’re going to create bugs anyway, then perhaps going that slow isn’t the right answer. You’re still going to need the clean-up pass, so you might as well let it happen and just clean up a bit more, right?

That sort of makes sense, if you think a second pass over the code is as effective as a second pass with the snowblower. In terms of dealing with snow, the blower is relentless. If it goes over the same ground twice it will do so with the same vigor as before. Testing, on the other hand, is imperfect. Each pass only catches about 35-50% of the defects (tending towards the 35% end). It isn’t like a snowblower at all. If you push aside a relatively big pile of snow with the snowblower, it’ll get it on the second go. If you create a big pile of bugs in the code on your first go, a round of testing will likely reduce the pile by less than half. Then you need another pass, and another, just to get to an industry-average 85%.
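To put rough numbers on that, here’s a minimal sketch in Python (the per-pass removal rates are the 35% and 50% figures from above; the starting count of 100 defects is an arbitrary illustration) of how many test passes it takes to reach roughly 85% cumulative defect removal:

```python
# Illustrative sketch: how repeated test passes accumulate defect removal.
# The 35% and 50% per-pass rates come from the discussion above; the
# starting count of 100 defects is an arbitrary round number.

def passes_to_reach(target_removal, per_pass_rate, initial_defects=100):
    """Count the test passes needed before cumulative removal hits the target."""
    remaining = initial_defects
    passes = 0
    while (initial_defects - remaining) / initial_defects < target_removal:
        remaining *= (1 - per_pass_rate)  # each pass removes a fixed fraction
        passes += 1
    return passes

print(passes_to_reach(0.85, 0.35))  # 5 passes at 35% per pass
print(passes_to_reach(0.85, 0.50))  # 3 passes at 50% per pass
```

At the 35% end it takes about five rounds of testing to reach 85%; even at 50% it takes three. The snowblower, by contrast, needs exactly one more pass.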

There’s one other thing about going too fast that I learned with my snowblower. Sometimes you get a catastrophic failure. In my case, going too fast with the snowblower broke the shear pin on the auger. It’s a safety feature that prevents damage to the engine, but it makes the machine useless for moving snow until the pin is replaced. And software is a bit like that too. Go too fast and you may introduce a major issue that you’ll spend a whole lot of time cleaning up. It is not all about speed.

Scrum’s 50% failure rate?

So in a class today I was flipping through the materials and saw this:

[Photo of a slide from the class materials: a quote, attributed to one of the original signatories of the Agile Manifesto, to the effect that roughly half of Scrum implementations fail and that “bad Scrum” is a major cause.]

What does one make of such a statement? If it’s true, then using Scrum as a methodology isn’t better than flipping a coin: half the time it works, the other half it doesn’t. If your methodology choice is functionally equivalent to coin flipping, then the methodology necessarily doesn’t add value. Now sure, you could argue that if you’re on the failing side of the equation you’re “doing it wrong,” but some consideration should be given to the idea that choosing a methodology (any methodology) is no predictor of success. All that said, even one of the original signatories of the Agile Manifesto is still obligated to produce data to show this is so. It’s an incredibly broad generalization.

The other thing is that the second part of the statement is far less specific. “Bad Scrum” is a “major cause”? What does major mean in this case? 50%? 25%? Something else? If half your projects are failing, is resolving the bad Scrum (whatever that may mean) going to make all of them not fail? Unlikely. We can reasonably assume that no matter how well you execute a process, under some circumstances it will fail, so how far will fixing bad Scrum take us? It’s very hard to say from this statement.

Rather than simply trusting that there’s analysis underlying these statements, I’d far rather hear something like “in a recent study of N projects, 50% failed [how did they fail?]. Of the failures, N% can be attributed to ‘bad Scrum.’” Sure, it doesn’t read like a nice little sound bite you can put onto a slide in a training deck, but it’s far more complete and far more useful to the reader in understanding what the opportunity is for fixing the problem.

No True Scotsman

I was recently enjoying the Illustrated Book of Bad Arguments when I came across the logical fallacy called “No True Scotsman.” It immediately reminded me of a discussion I was having with a couple of individuals about software development methodologies. Up until recently, a team had been pushing their development methodology as Agile development. That was, until interest in the Scaled Agile Framework (SAFe) emerged. I can’t comment on the goodness of SAFe, but we had lots of data available from this group on the efficacy of their form of Agile.

The discussion proceeded something like this:

Them: “We’d like to start using SAFe; can you help us establish a measurement system to support our decision?”
Me: “Sure, I’m always willing to experiment and learn, but let’s approach this with some skepticism. Our past data on our Agile projects indicates no evidence of statistical differences in quality or productivity.”
Them: “What we were doing wasn’t true Agile.”
Me: <stunned silence>

This is a great example, and an interesting use, of the No True Scotsman argument. When we lacked data about what we were doing, it was Agile, but the minute we had data suggesting it may not have been that beneficial, suddenly it wasn’t really Agile anymore. The conversation forced me to do some digging into the scholarly research on Agile methods. There’s a dearth of good research here. Jones offers some analysis in his article here: http://www.infoq.com/articles/evaluating-agile-software-methodologies. I like Jones a lot; he’s one of the few researchers willing to take up potentially unpopular positions. He is in the consulting business, so he doesn’t give away his data set, which is challenging for academics. There’s also a systematic review of Agile from 2008 (ancient in IT terms, I know) which concludes there’s little strong evidence for Agile methods in the available research. But, of course, if you’re willing to adopt the No True Scotsman argument, that doesn’t matter. After all, whatever gets studied won’t really be a good example of it anyway. 😉

The downside of delaying commitment

The Poppendiecks helped push the idea of delayed commitment in software development. The idea is relatively straightforward and on the surface seems good: if you don’t need to make a choice yet, don’t. In terms of LEAN thinking, if you consider a decision in software development to be akin to creating inventory in manufacturing, then this makes perfect sense. Any decision made too early has the potential to become invalid with time, just as a part made in a factory too soon runs the risk of becoming useless.

We’re in the process of buying a home, and one of the most commonly delayed commitments is to an insurer. In the USA, and presumably elsewhere, if you want to get a mortgage, the lien holder wants you to insure your property. After all, without the collateral of your property, the lien isn’t worth very much, so it’s very much in their interest to ensure that if something bad happens, it will get repaired. You don’t need the binder from your insurer until within days of closing. By then you’ve probably already done the inspection, paid for the appraisal, perhaps paid money to lock the interest rate, and so on. The seller will have turned away other buyers. After a month and a half, everyone is pretty well locked into this transaction. Tens of thousands of dollars may be at risk.

But that’s not my personality. If I can get the data to make a choice, I make it. And getting a quote on insurance was something I could easily do well in advance of closing. Delayed commitment? No, far from it. Requesting a quote and locking in the insurance is usually just a formality, but things have changed: higher losses due to delayed maintenance and more extreme weather have made insurers more cautious. When I called, the first insurer refused to underwrite the risk because of the age of the roof. So did the second. Had I waited to get these quotes, I would have found this out within days of the closing. No insurance, no closing. We’d be heavily committed to this house, with a strong possibility that I would have lost all the money I put down.

At this stage, all I’m risking is the cost of the inspection. Sure, I don’t want to lose $500, but I’d far rather lose that than 5% of the purchase price. At this stage my goal is to renegotiate with the seller regarding the roof. My negotiating position is strong – I’m only into it for a few hundred while the seller now faces the possibility of being completely unable to sell without replacing the roof. Any buyer who isn’t paying cash faces the same problem. They won’t be able to get a mortgage without insurance and insurers won’t underwrite the house in the current condition.

Had I waited, my negotiating position would have been incredibly weak. I’d have put out perhaps $30,000 between inspection, appraisal, legal fees and earnest money. I’d be forced to pay for the roof myself or lose all that money. Delaying commitment isn’t always the safest choice. No matter how routine the delayed decision has been in the past, if you don’t understand the risks of not knowing the answer, you put yourself in a precarious position.

I’d argue that for anything you can know, don’t delay knowing it. Minor rework costs are more predictable, and preferable to the occasional complete blowout loss. There is value in predictability, even if the costs are predictably a bit higher.

Update: Interestingly, the seller chose not to negotiate over the replacement of the roof. We went on to buy another house, since we were unwilling to assume all the risk. The seller did eventually sell, and not terribly long thereafter, but at $15-20,000 less than our offer. In the seller’s case, delaying the decision to negotiate cost them as well. Sometimes what you have right before you is as good as it is going to get.

The software prediction “myth”

I got told today that software is unpredictable. That the idea of planning a project is ridiculous because software is inherently unpredictable. Unfortunately, I think the comment stemmed from a misunderstanding of what it means to be predictable.

If you smoke cigarettes your whole life, odds are that you will end up with cancer, heart issues, or some other horrid disease. Now, there are people who smoke their entire lives and don’t have any significant ill effects; they die from something else first. And yet, although those people exist, we can say with some certainty that for all those who do end up with a smoking-related disease, it was ‘predictable.’ In the same manner, it’s predictable that if you shoot yourself in the head with a gun you will die, and yet people survive from time to time after having done exactly that.

Secondly, predictable doesn’t necessarily mean superbly accurate. Weathermen predict the weather and hardly ever get it exactly right. But it turns out that over the last decade or so their accuracy has gone way up. They still get things wrong, but compared to the distant past, it’s a reasonable prediction. In fact, some research I’ve seen would put outlets like the Weather Channel, for example, at 80% or better accuracy (forecasts within three degrees of the actual temperature) over the long run.
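To be concrete about what that kind of accuracy figure means, here’s a small sketch (the forecast and actual temperatures below are made-up numbers, not real data) that computes the share of forecasts landing within three degrees of what actually happened:

```python
# Illustrative sketch: the forecasts and actuals below are made-up numbers.
# "Accuracy" here is the fraction of forecasts within 3 degrees of the
# observed temperature, the tolerance mentioned above.

forecast = [72, 68, 75, 60, 55, 81, 77, 64, 70, 59]
actual   = [70, 69, 71, 62, 58, 80, 76, 66, 69, 52]

hits = sum(abs(f - a) <= 3 for f, a in zip(forecast, actual))
print(f"{hits / len(forecast):.0%} of forecasts within 3 degrees")  # 80% here
```

The forecasts are rarely exactly right, yet the prediction is still useful, which is the point.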

To say software isn’t predictable implies that all outcomes are completely random, and yet we know that isn’t the case at all. Even the most diehard agilista will support unit testing of some form, because the outcome of doing unit testing is predictable: you get better-quality code. Fair coins, dice, and the lottery are unpredictable (and to be fair, there have even been lab studies showing that flipping a coin can be predictable if you control enough of the variables).

If we want to seek to improve our predictions, which is a separate issue from whether software is predictable or not, we have to study the factors and outcomes of projects to establish what matters. But software is predictable; don’t let anyone tell you otherwise.

A diversity of evidence

How many times have you read conflicting research? Eggs are good for you. Eggs are bad for you. A glass of wine is good for your heart. Alcohol is bad for your heart. Sometimes it makes you wonder what the heck scientists are doing. How could it be that they flip-flop so often? Don’t they know what they’re doing?

In fact they do. That’s the way science works. At any given time, one or more scientists are studying some hypothesis. To do so, they must select a research design and a measurement system, and contend with random chance and countless biases. So, if you pick up any given study, it’s likely to show some result… but which result exactly?

Take pair programming, for example. If you search for research on it, you’ll find studies which indicate positive outcomes, studies which indicate negative outcomes, and studies which indicate no detectable effect. Which of these studies should you believe?

Well, in fact, you shouldn’t believe any single study, ever. Science doesn’t work that way. Science relies on a diversity of evidence. We expect that other scientists will attempt to duplicate our findings under different experimental conditions. Then we can look at many experiments attempting to assess the same effect and ascertain whether one study was a fluke or not. You can’t simply pick the study you like that matches your worldview. Pretty much anyone can find a study which suits their pet theory. That isn’t the point. We must seek a diversity of evidence to determine whether what we are seeing is a true effect or just the fluke of a single study. Once we see many studies, we can estimate the likely effect by computing a mean effect size and using funnel plots to check whether the available evidence is skewed by publication bias.
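As a sketch of the mechanics (the effect sizes and variances below are invented for illustration; they are not drawn from any real pair programming studies), a simple fixed-effect meta-analysis weights each study’s effect size by the inverse of its variance, so larger, more precise studies count for more:

```python
# Illustrative sketch: invented effect sizes (positive = pair programming
# helped, negative = it hurt) and variances; not real study data.
# Computes a fixed-effect (inverse-variance weighted) pooled effect size.

studies = [
    # (effect_size, variance)
    ( 0.40, 0.04),   # small study, positive outcome
    (-0.10, 0.02),   # medium study, slightly negative
    ( 0.05, 0.01),   # large study, near-zero effect
]

weights = [1.0 / var for _, var in studies]
pooled = sum(w * es for (es, _), w in zip(studies, weights)) / sum(weights)
print(f"pooled effect size: {pooled:.2f}")  # dominated by the most precise study
```

No single entry in that list settles the question; it’s the weighted picture across all of them (plus a funnel plot to check for publication bias) that does.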

What does it mean to fail fast?

What does it mean to fail fast? The interpretation I most often hear is to get to coding so that we can test out the idea/design/architecture and understand if it is going to work. We don’t want to find out during final integration test that the risk we were worried about is going to manifest itself.

At any rate, a group of us got around to discussing a common problem with integrating Agile concepts into the corporate environment. The issue was that the business wants certainty, and isn’t prepared to simply put a bunch of people in a room together for a fixed time period and take what comes out. They have business goals to attain and want to know whether the high-level capabilities they have outlined are going to be delivered by some agreed-upon date. Now, maybe at this point you’re saying to yourself, “if you’ve got clearly articulated business goals, why are you doing anything like Agile in the first place?” Fair enough. We often choose Agile because it’s the current hot trend, not because we actually meet the criteria which would make the project ideal for Agile. But I’ve also heard other proponents say Agile is always the right choice.

So, let’s assume this generous scenario. You’ve got a set of high-level requirements that you can turn into stories, and you can create a backlog. As the project progresses, the business starts to get frustrated because the estimated final cost or schedule keeps rising. And when you look at it, you realize that as you delve into each story, you learn so much more that you’re forced to re-plan, because each one is so much bigger than you thought.

The question I pose to you is… Is this failing fast? I’d argue that it is not, because in terms of succeeding at the cost or schedule of this project, we had an early opportunity to explore the stories a bit and we chose not to. We may have had the ability to fail fast on the features in the first couple iterations, but anything in the backlog further out in time is decidedly not going to fail fast because we won’t look at it until a lot of time passes.

Fast failure isn’t just for technical challenges. To maximize business value, it may make sense to spend some time exploring the known stories a bit to see if we can flush out anything that would cause us to fail much later. Particularly on a project of substantial size, there are going to be must-have features beyond the first couple of iterations, and you don’t want to be six iterations in when you find out your emergent design has to be significantly reworked, when you could have known that much, much sooner.

Is it Agile, or is it just correlation?

Something odd occurred to me today after hearing a strikingly similar story for the third time. The story goes something like this: ‘We had a major project fail; people were let go as a result, and the business didn’t trust the software development organization. Then we made another attempt at the project, this time using Agile, and it worked! The business loves us again! We will never go back to waterfall!’

Naturally, we hear the correlation (Agile = success) and assume causation. “We switched to Agile and it worked.” It is seemingly proof for Agile. We try project X, it fails, and then we try project X again, this time with Agile, and it works. Therefore Agile causes project success. But that thinking leaves out another possible explanation: you tried project X and it failed, but you learned a ton from the failure, whether you realized it or not. When you tried the project again, you knew what you were doing (because you’d done it once before) and it worked.

In typical software projects you often don’t know what you’re doing because you haven’t solved the problem before. Otherwise, we wouldn’t bother to develop software… We’d just copy it. But when you reattempt a failed project, you know all kinds of things you didn’t before. You’ve got more realistic estimates – because you probably blew all your original ones. You’ve got some of the key algorithms down and you’ve seen one “solution” to the problem that didn’t work. It’s just like when you finish coding something that does work. You can immediately see how, even though it works, you could make it better. Completed stuff creates clarity for us… But doing every project twice isn’t really an option, so if the project succeeds we don’t redo the code. That’s how we end up with technical debt, after all.

Now there may be cases where from-scratch projects succeed using Agile. It’s this particular story of “fail, then try the same thing using Agile” that raises red flags for me. Far more changed here than just the way work is broken down and delivered. Whether you reuse any of the code or not, a second attempt in software is a huge head start. Keep that in mind before you attribute the success of the second attempt purely to the methodology.

Richness versus Recall

Alistair Cockburn presents an interesting insight in his presentation “I come to bury Agile, not praise it.” On slide 12, he presents the richness of the communication channel as an important part of getting information across. Surely you’ve experienced this yourself: a never-ending chain of back-and-forth emails that was quickly resolved with a single one-minute phone call.

Therefore, it makes enormous sense to replace communication of low richness with communication of high richness, right? Well, I’m not sure it’s that black and white. In order to use information effectively, you not only have to be able to communicate it, but also to recall it when you need to use it again.

For example, you sit down and have a conversation with the user and then turn around to write some code. The ability to translate what the user asked for into code depends not only on having the conversation but also on remembering all the details of the conversation correctly.

So, do you have an eidetic memory? Probably not. How long can you accurately recall a conversation? Long enough to turn it into code faithfully? Probably not, either. You can probably remember the nominal case, but what about all the exception handling you discussed?

Now, I’m not saying you should communicate via email or paper only, since that’s clearly silly, but at the other extreme, you probably shouldn’t communicate only orally either. Indeed, combining the face-to-face conversation with documentation helps manage both the completeness of the conversation and your ability to recall the details when you need them.

Velocity and standard work

LEAN has a concept called standard work time, an expectation of how long some task should take. For example, it should take you one minute to mill part X from a block of steel. Having expectations of how long something should take is important in manufacturing, because it helps you assess whether an individual or team is meeting those goals. The measure for standard work shouldn’t come from some desire for a certain output; it should be based on how long those things really take.

In software, lots of measures of standard work have been generated over time, but we don’t seem to use them much. I recall, from my earliest days of programming, that a programmer was expected to create about 5,000 lines of code per year. I’ve also heard estimates for debugged lines of code per day (10-25 or so, if I recall correctly). Capers Jones expressed measures of productivity in function points (FP) delivered per day. We seem to have some understanding of what a typical productivity expectation is.
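To show how those figures can be used (a back-of-the-envelope sketch only: the 5,000 LOC per programmer-year and 10-25 debugged LOC per day are the rough numbers recalled above, and the 230 working days per year is my own assumption), a standard productivity measure gives you an outside-view expectation before a team has any history of its own:

```python
# Illustrative sketch using the rough figures recalled above:
# ~5,000 LOC per programmer-year and ~10-25 debugged LOC per day.
# 230 working days per year is an assumed round number.

LOC_PER_PROGRAMMER_YEAR = 5000
WORKING_DAYS_PER_YEAR = 230

def expected_duration_years(estimated_loc, team_size,
                            loc_per_year=LOC_PER_PROGRAMMER_YEAR):
    """Very coarse expectation: total size divided by team throughput."""
    return estimated_loc / (team_size * loc_per_year)

# 50,000 LOC with a team of 5 -> about 2 years under this benchmark.
print(f"{expected_duration_years(50_000, 5):.1f} years")

# Daily rate implied by the yearly figure -> ~22 LOC/day, inside the 10-25 range.
print(f"{LOC_PER_PROGRAMMER_YEAR / WORKING_DAYS_PER_YEAR:.0f} debugged LOC per day")
```

Crude as it is, that kind of benchmark is a ruler that doesn’t change size when the team does.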

And then we ignore it. For some crazy reason we decide that the “laws of physics” don’t apply to us and we need a measurement that is unique just to us. In Agile, it’s called velocity. It’s a measure of the team’s productivity in story points completed per iteration. What’s a story point? Well, that’s a relative sizing mechanism the team itself determines.

The problems with velocity are numerous:

  1. It’s only good for the exact team you’re currently on. Sadly, team members come and go. As anyone can tell you, people are not resources you can protect against losing. Even if you don’t want them to quit, they will. And when they do, your measure of productivity walks out the door with them. The new person will need ramp-up time. The new person may be more or less productive. The new person might not gel with the team. And until then, your expectations of productivity are gone.
  2. It’s only good within the team. You can’t compare cross-team productivity if everyone’s got their own way to measure it. I’m positive I’ve argued before that comparing yourself to other companies isn’t necessarily a worthwhile activity, but being able to compare yourself to yourself is. How do you know you’re getting better (or worse) if the ruler keeps changing size?
  3. It can lock in lower productivity. Even if you accept #2 above and don’t want to compare yourself to your competition, you have no gauge as to whether better is even achievable. Software is a relatively slow process, so you don’t get lots of data points out of it. If you have to wait for your team to build up its own measure of productivity, you could be accepting far lower productivity than is achievable. Being able to compare yourself to a typical expectation of productivity gives you a benchmark that would take far longer to build up on your own.

I understand that there are people out there who believe software is more of an art than a science; that software will be practiced by small teams doing great things. I happen not to agree with that. It may be true some of the time, but certainly not all or even most of the time. In most cases, software is more likely to be like building houses: unique, but largely assembled from similar pieces. And with that in mind, software is more predictable than we like to admit. But predictability means you can have expectations of standard work times, and that you needn’t allow each team to define productivity in its own vision.