My “free market” sanity test

Every time someone tells me about the hottest new thing in IT (or anything else, for that matter), I apply what I call my “free market test” to decide whether there’s anything to it.

I like to pick on Agile, as you well know, so let’s use it for a comparison. We will compare it to outsourcing/offshoring. Now, back in 2001 when the Manifesto was signed, IT outsourcing wasn’t a new idea. I can’t pin down a date easily, but it looks like the ’90s saw a significant shift toward IT outsourcing, coincident with the rise of the internet. (In fairness, I could’ve easily picked on components of Agile as well. Most of the ideas in Agile are not new. Consider the small-team approach. It can be traced back at least as far as Brooks’ seminal work “The Mythical Man-Month” and his surgical team approach.)

Here’s the thing. 10-15 years on, outsourcing is huge business. I haven’t dealt with a company that isn’t doing it. Agile adoption is mixed. If the hype is to be believed, it should be the savior of all IT. So why is outsourcing ubiquitous while Agile is not? I think the free market test answers that. If something works, it gets widely adopted. No sane company is going to allow its competitors an advantage they can easily copy. And any company can easily copy IT outsourcing or Agile.

The free market test is a simple way to think about whatever “new to your company” idea may be going around. If everyone is doing it, chances are it is working. If it didn’t work, companies would abandon it, or those that continued to pursue the path would go out of business. The free market, like Darwin’s evolution, is survival of the fittest. Behaviors that don’t markedly improve fitness die off. Behaviors that do, thrive.

Cargo cults in IT

The end of WWII gave rise to a striking example of a causality problem. On small, remote islands, indigenous people encountered large numbers of military servicemen for the first time. As we, and others, landed ships on these islands, cleared jungle for runways, erected control towers, and ran military drills, the islanders observed newfound well-being in the cargo our servicemen brought with them.

When the war ended, the military packed up their temporary bases and took their cargo planes with them. And with them went the newfound wealth of the native people. So, what did they do?

Well, they replicated the actions they had seen taking place. They cut airstrips out of the jungle. They erected towers similar to the control towers they had seen. They executed military-style drills. They carved wooden headsets to wear like the ones they saw servicemen wearing.

Since they didn’t understand the causality of the entire event – a war that created the need for new bases, which led to airfields and cargo planes – they figured that if they recreated the parts they had observed, they’d get the same outcome. Cargo planes ought to land and bring cargo. Of course, it doesn’t work that way, which is what makes the idea of a ‘cargo cult’ so interesting. We clearly see the logical problem in that situation.

When it comes to the corporate environment, however, we often fail to see our own cargo cult behaviors. We observe a great company, say Google, and we see that they have bicycles on their campus, so we buy bicycles for our campus. Or Facebook holds hackathons, so we start to do so too. But buying bicycles or holding hackathons is not going to make your company like those other companies. You are simply emulating behaviors that look like the other company’s without understanding the underlying culture that causes them to do these things. And as a result, you’re likely to get disappointing results from the mimicry. These companies aren’t successful because they engage in these behaviors. The companies are successful first, and therefore may engage in these behaviors.

Which brings me to another point. We often look at older companies that fail, observe that they don’t engage in these behaviors, and use that as evidence that these behaviors are necessary to survive. But we can learn something from Mark Buchanan’s ‘The Social Atom’ on this point. In his book he demonstrates a ridiculously simple model that predicts the rise and fall of organizations, based simply on time and getting too big. As I recall, you don’t need to model any behavior at all to get the effect; a toy version of the idea is sketched below. So, probabilistically, large old companies will decline even if their behaviors are held steady. There will always be companies coming and going, and we will always be able to be selective and say “see, that company didn’t do X and they failed. We need to do X.”
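To make that concrete, here is a minimal toy simulation – my own construction, loosely inspired by the book’s point, not Buchanan’s actual model. Firm size follows a pure random walk with no behavior modeled at all, and yet every firm eventually peaks, declines, and dies:

```python
import random

def firm_lifetimes(n_firms=10_000, start_size=10, max_years=10_000, seed=42):
    """Pure-chance model of firm size: each year size ticks up or down
    with equal probability, and a firm dies when its size hits zero.
    No strategy or behavior is modeled, yet rise and fall emerge anyway."""
    rng = random.Random(seed)
    lifetimes = []
    for _ in range(n_firms):
        size, age = start_size, 0
        while size > 0 and age < max_years:
            size += 1 if rng.random() < 0.5 else -1
            age += 1
        lifetimes.append(age)
    return lifetimes

ages = sorted(firm_lifetimes())
print("median lifetime:", ages[len(ages) // 2], "years")
print("longest survivor:", ages[-1], "years")
```

Most firms in this toy die young, a few survive a very long time, and none of it has anything to do with what the firms “did.”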

If you find yourself saying that in the future, just remember that you may now be a card-carrying member of a cargo cult.

What if static analysis is all wrong?

I just got back from a meeting with one of my former college professors. I’ve kept in touch because the academic world and its research have much to teach us about how to operate in the business world. For one, without the financial pressures, academia is free to explore crazier ideas that may one day create value.

In this recent meeting we were discussing static analysis and machine learning. Static analysis has proven frustrating in some of my own analyses, since I’ve found no evidence that it has predictive power over the outcomes we care about – defects the user would experience, and team productivity. And yet we keep talking about doing more static analysis. Is it that a particular tool is bad, or is the idea fundamentally flawed in some way?

What turned out to be a non-event for machine learning might be an interesting clue to the underlying challenges with static analysis. This particular group does research on genetic programming. Essentially, they are evolving software to solve problems, which is valuable in spaces where the solution isn’t well understood. In this particular piece of research the team was trying to see if modularity would help solve problems faster. That is, if the programs could evolve and share useful functions, would problems be more easily solved? The odd non-event was that it didn’t seem to help at all. No matter how they biased the experiments, the evolving solutions preferred copying and tweaking code over using a shared function.

Although the team didn’t look into it much, they suspect that modularity actually creates fragility in software. That is, if you have a single function that many subsystems use, then a change to that function can have disastrous ripple effects. If there are many copies of the function and one is changed, the impact is much smaller. One might argue that this could apply to human-created code as well: it isn’t simply a matter of making code more modular and reusable, but perhaps only under certain circumstances. If true, it’d fly in the face of what we know about writing better software. And importantly, it would quickly devalue what static analysis tools do, which is push you towards a set of commonly agreed upon (but possibly completely wrong) rules.
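Here’s a minimal sketch of that fragility argument – my own toy model, not the research group’s genetic programming setup. Assume each change to a function breaks any given caller with some fixed probability, and compare a single shared function (every module is a caller) against per-module copies (each change exposes only one module). All the numbers are invented for illustration:

```python
import random

def broken_modules_per_change(n_modules=100, n_changes=1_000,
                              p_break=0.2, shared=True, seed=1):
    """Average number of modules broken per change. A shared function
    exposes every module to every change; a per-module copy exposes
    only the one module that owns the changed copy."""
    rng = random.Random(seed)
    total_broken = 0
    for _ in range(n_changes):
        exposed = n_modules if shared else 1
        total_broken += sum(rng.random() < p_break for _ in range(exposed))
    return total_broken / n_changes

print("shared function:", broken_modules_per_change(shared=True))   # ~20 per change
print("copied function:", broken_modules_per_change(shared=False))  # ~0.2 per change
```

Of course, the toy captures only the ripple-effect half of the tradeoff; with copies, a bug fix has to be applied in many places.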

What can a snowblower tell us about software?

If you’re in the northeastern United States, you’re probably thinking about snow right now. And if you’re responsible for clearing the snow from your drive or walkways, you might also be all too familiar with time behind a snow blower. For years I hand-shoveled my walkways, but when we moved to this new house they were simply far too long for that.

It takes me about an hour to do all the clearing I’m responsible for, so that’s a lot of time to think, which isn’t necessarily a bad thing. This particular snow is the deepest we’ve had yet. My snow blower has three forward speeds, and presumably you use a slower speed when you have more snow to clear. The slower speed allows the auger to clear the snow before it gets all backed up.

So, as I was clearing the drive, I noticed something. Even at the lowest speed there was enough snow that some of it was being simply pushed out of the way by the blower. That meant that I’d have to do clean-up passes just to get that little bit of snow that the blower wouldn’t get on the first pass. And that got me to thinking. What if I just went faster? After all, if I was going to have to make a second pass anyway, who cares if it’s a tiny bit of snow or a little bit more?

And that got me thinking about software. One approach might be to go slowly and carefully, but if you’re going to create bugs anyway, then perhaps going that slowly isn’t the right answer. You’re still going to need the clean-up pass, so you might as well let it happen and just clean up a bit more, right?

That sort of makes sense, if you think a second pass over the code is as effective as a second pass with the snow blower. In terms of dealing with snow, the blower is relentless. If it goes over the same ground twice, it will do so with the same vigor as before. Testing, on the other hand, is imperfect. Each pass only catches about 35-50% of the defects (tending towards the 35% end). It isn’t like a snow blower at all. If you push aside a relatively big pile of snow with the snow blower, it’ll get it on the second go. If you create a big pile of bugs in the code on your first go, a round of testing will likely reduce the pile by less than half. Then you need another pass, and another, just to get to an industry-average 85%.
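The arithmetic behind that is worth making explicit. If each pass removes a fixed fraction of the remaining defects (a simplification – real passes vary), the number of passes needed to hit a cumulative target falls out of a one-line formula:

```python
import math

def passes_needed(per_pass_rate, target=0.85):
    """Passes required to reach a target cumulative defect-removal rate,
    assuming each pass removes the same fraction of remaining defects."""
    return math.ceil(math.log(1 - target) / math.log(1 - per_pass_rate))

for rate in (0.35, 0.50):
    print(f"{rate:.0%} per pass -> {passes_needed(rate)} passes to reach 85%")
# 35% per pass -> 5 passes; 50% per pass -> 3 passes
```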

There’s one other thing about going too fast that I learned with my snow blower. Sometimes you get a catastrophic failure. In my case, going too fast broke the shear pin on the auger. It’s a safety feature to prevent damage to the engine, but it does make it impossible to keep moving snow. And software is a bit like that too. Go too fast and you may introduce a major issue that you’ll spend a whole lot of time cleaning up. It is not all about speed.

Want something fixed? Give it to someone who doesn’t want the job

I recently had a great experience with simplifying IT processes. Due to recent org changes, a home-grown product (a ton of spaghetti code) got turned over to a new team. The thing was, the new team’s job wasn’t a support role, and they didn’t particularly relish coding. Over time, the organization had become dependent on the product, although people suspected that was partially because they didn’t know any better. There were off-the-shelf tools that could do the same job.

Well, it turns out that if you want to get rid of some job, the best people to give it to might be the people who don’t want it in the first place. If you’re content building and maintaining a bunch of spaghetti code, and I then give you an additional product, you’re likely to keep maintaining that one too. But if you don’t particularly like coding, you are going to try to avoid doing it. And one of the best ways to avoid doing something is to get rid of it.

In fact, this is exactly what happened. The team, which had no desire to fix everyone’s problems, figured out how to replace the home-grown product in just a few months. For years the organization had been told it couldn’t be replaced. The difference? The old team was content to maintain it, perhaps even proud to. The new team had nothing invested in it and didn’t want to maintain it.

“I don’t think necessity is the mother of invention. Invention . . . arises directly from idleness, possibly also from laziness. To save oneself trouble.”
― Agatha Christie, An Autobiography

What does your dashboard look like?

On my drive today I was thinking about my car’s dashboard. I drive a relatively modern car, so the dashboard is pretty simple – engine temperature, speed, tachometer, and fuel gauge. There’s not a lot to it. Looking at it reminded me, for some reason, of old car dashboards. They aren’t all super complicated, but then I found this example of a Bentley dashboard.

[Photo: a vintage Bentley dashboard crowded with gauges and controls]

Wow. That’s a lot of things. If you look closely, they aren’t all gauges, but there are certainly far more gauges than on a modern car. Why, I wondered? Well, it didn’t take too much thinking. What’s the purpose of my car’s dashboard? It helps me not break the law (speedometer), not break the car (tachometer and temperature gauge), and make sure I get where I’m going (fuel gauge). While cars today are vastly more complicated than they used to be, the dashboards have gotten simpler, not more complex. As cars have become more reliable, and more black-box, it has become less necessary (and less desirable) to display excess information.

These four gauges cover the vast majority of what I need to know while driving. I could have gauges for all kinds of stuff, including running trends of every message every sensor sends to the on-board computer. But they’re not there, because even if they were, I wouldn’t know what to do with the information. In fact, were I not driving a standard, I could probably do without the tachometer. On an automatic, engine speed and shifting are handled for me.

Which brings me to my point. Why is it that as cars have gotten more sophisticated our dashboards have gotten simpler, but in IT our dashboards have gotten more complex as our software process has matured? I suspect the reason is simply that we can. There are tons of data to be had from software development, and very little of it actually has much influence over the outcome of a project. If you keep a handful of things under control, there’s no need for excessive measures. As cars became more robust, there was less reason to monitor every possible system, and the components started to disappear from the dashboard. If your software process becomes more standard, and there is less deviation to monitor, then your dashboard should become simpler as well. So, if you’re ending up with a complicated dashboard because your management “needs the information to make decisions,” maybe it’s time to start asking which decisions simply don’t need to be made. Standardize and make the process robust; simplify the dashboard.

Scrum’s 50% failure rate?

So in a class today I was flipping through the materials and saw this:

[Photo: a slide from the course materials claiming a roughly 50% Scrum failure rate, with “bad Scrum” cited as a major cause]

What does one make of such a statement? If it’s true, then choosing Scrum as a methodology is no better than flipping a coin. Half the time it works, the other half it doesn’t. If your methodology choice is functionally equivalent to coin flipping, then the methodology necessarily doesn’t add value. Now sure, you could argue that if you’re on the failing side of the equation you’re “doing it wrong,” but some consideration should be given to the idea that choosing a methodology (any methodology) is no predictor of success. All that said, even an original signer of the Agile Manifesto is still obligated to produce data showing this is so. It’s an incredibly broad generalization.

The other thing is that the second part of the statement is far less specific. “Bad Scrum” is a “major cause”? What does major mean in this case? 50%? 25%? Something else? If half your projects are failing, is resolving the bad Scrum (whatever that may mean) going to make all of them succeed? Unlikely. We can reasonably assume that any process, however well executed, will fail under some circumstances, so how far will fixing bad Scrum take us? It’s very hard to say from this statement.

Assuming there is analysis underlying these statements, I’d far rather hear something like “in a recent study of N projects, 50% failed [how did they fail?]. Of the failures, N% can be attributed to ‘bad Scrum.’” Sure, it doesn’t read like a nice little sound bite you can put on a slide in a training deck, but it’s far more complete and far more useful to the reader in understanding what the opportunity is for fixing the problem.
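To see why the missing number matters, here’s a quick back-of-the-envelope sketch. The attribution shares below are invented, since the slide gives none; even under the optimistic assumption that every “bad Scrum” failure is fixable, the remaining failure rate swings widely depending on what “major” means:

```python
def remaining_failure_rate(base_failure=0.50, bad_scrum_share=0.25):
    """Failure rate left if every failure attributable to bad Scrum were
    eliminated (optimistic). bad_scrum_share is a hypothetical number."""
    return base_failure * (1 - bad_scrum_share)

for share in (0.25, 0.50, 0.75):
    print(f'if "major" means {share:.0%} of failures: '
          f"failure rate drops from 50% to {remaining_failure_rate(0.50, share):.1%}")
```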

Seeing through messes rather than cleaning them up

At the Cutter conference I recently attended, a gentleman named Robert Scott gave the keynote address. In it he talked about what he thought the future of technology leaders would look like. One thing he said stuck with me – future leaders will need to be able to see through messes rather than clean them up.

Although I found it an interesting idea, I couldn’t quite fit it into something I could relate to. That was until a discussion today. I was talking with someone about measuring a quality assurance organization, and each time I proposed a metric he would counter with something like “but if we measure it that way, then what about the situation where X doesn’t apply?” – essentially implying that there was noise in the measurement system and we’d have to clean it up before the measurement would be useful.

And that’s when Robert’s comment finally made sense to me. As a data person, I’ve learned to live with, and even embrace, the noise inherent in measuring software development. It’s more like economics or sociology than physics or chemistry. Given some exact set of inputs, the output will be directionally the same, but not exactly the same. But software people don’t go for mostly right or directionally right. It bothers them. After all, software is either wrong or it’s right. Either it’s a bug or it isn’t. There’s no such thing as “sort of a defect.” So, when you introduce a measurement system that naturally contains noise, the discomfort sets in.

Here’s the thing. If we assume Robert Scott is right, that future leaders must be able to see through messes rather than clean them up, then the measurement of software stands a chance of moving forward. It will always be imperfect, but the data helps us to make better decisions. Yes, it’s messy. It contains noise, but rather than always trying to clean it up, acknowledge it is there, and start to look beyond the noise at the signal underlying it.
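As a toy illustration of living with the noise (all numbers invented): two teams whose individual readings overlap heavily can still be told apart in aggregate, with no “clean-up” of the measurement system required:

```python
import random
import statistics

def noisy_readings(true_value, noise_sd, n, seed):
    """Simulate n noisy observations of some metric, e.g. defects found
    per release. Values are made up for illustration."""
    rng = random.Random(seed)
    return [true_value + rng.gauss(0, noise_sd) for _ in range(n)]

team_a = noisy_readings(true_value=10, noise_sd=4, n=30, seed=1)
team_b = noisy_readings(true_value=14, noise_sd=4, n=30, seed=2)

# Individual readings overlap heavily...
print("A spans", round(min(team_a), 1), "to", round(max(team_a), 1))
print("B spans", round(min(team_b), 1), "to", round(max(team_b), 1))
# ...but the aggregate still points the right way: act on the direction.
print("A mean:", round(statistics.mean(team_a), 1))
print("B mean:", round(statistics.mean(team_b), 1))
```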

Tom DeMarco talks ethics

I never took ethics in college. I also never planned on attending a conference to hear a talk on ethics. After all, ethics was sort of a base assumption from my perspective, not given much thought beyond large companies having employees sign a code of ethics at some regular interval.

In fact, I thought I was attending a talk on decision making, and was thus expecting something about decision theory, game theory, maybe psychology. Certainly not ethics. But Tom took me from “where is he going with this?” to “wow!” I’ll attempt to do the talk justice, but I can’t promise that I will. To the best of my ability, here’s what I took away:

Aristotle believed that ethics were logically provable. Metaphysics contains all the things we can know about the world. Epistemology is built on top of it: everything we can derive from what we know about the world. For example, Socrates is a man; all men are mortal; therefore Socrates is mortal. Ethics, Aristotle “promised,” would be provable with the same logic. For something like 2,400 years, all kinds of philosophers tried to make good on this promise and were unable to. At some point, David Hume (perhaps) classified metaphysics as “is” statements and ethics as “ought” statements, and it was argued that it is impossible to derive an ought statement from an is statement.

Along comes Alasdair MacIntyre. He argues that if something is defined by its purpose (this is a watch, for example), then the ought statements follow naturally. What ought a watch do? It ought to tell good time. So that raises the question: what is the purpose of man?

We go back to Aristotle. Aristotle also created a mechanism for defining things. His definitions require that you group something and then differentiate it. So, a definition of man might be “an animal with the ability to speak.” That’s an is statement, for sure, but by MacIntyre’s requirements, it doesn’t define man’s purpose. MacIntyre goes on to define man as an animal that practices; creating practices becomes man’s purpose. A practice is something we attempt to systematically extend. Tennis, for example, is a practice. Current tennis players have extended the sport in new and interesting ways, such that although famous tennis players of yore would recognize the sport, they probably couldn’t compete anymore (even if they were young enough), because the sport has been systematically extended.

So, if that’s right, that man is an animal who practices, then for each practice we create, the oughts follow naturally. If you are a software developer and your manager comes to you and says “we need to take three months off this project” what ought you do? Well, first you ought to be honest – cutting the timeline will hurt the quality of my practice. Second, you must be just – quality matters to our customer, and we can’t deliver poor quality. It’s a disservice. And lastly, you must be courageous – go back in there and tell them no!

How many times has one of our employees, by this framework, acted ethically, and we viewed it as a problem? Far too many times, I’d guess. The person with ethics, who values his or her practice and whose ought statements are clear, can be frustrating. But viewed through the lens of Tom DeMarco’s talk, suddenly what they’re doing makes a lot of sense.

Do something counterintuitive

A post over at one of my favorite management blogs reminds me of my own recent experience with going for it on fourth down. Recently I’ve been working on a project to improve estimating. It’s not uncommon to hear that estimates should be created by those doing the work. Indeed, if a random person unfamiliar with the ins and outs of your system (namely, management) estimates a project for you, odds are it’s going to be bad. But we can take it one step further: what if there were evidence that even when the person doing the work makes the estimate, you should override that judgment with a model instead?

Steve McConnell notes in his book on estimating that various experiments have shown developers to be eternal optimists. One correction he suggests is simply to make estimates larger. Unfortunately, when the evidence shows you have a bias, you aren’t going to make the right call on fourth down, so to speak. In our own research, a model helped to compensate for human fallibility. Although we still got an estimate from the developer, when we combined their estimate with historical data in a model, we got an outcome that outperformed expert judgment alone 65-80% of the time. That’s not perfect, but it’s surely better than no model at all.
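Our actual model isn’t public, but here’s a minimal sketch of the general idea: calibrate raw developer estimates against historical actuals with a simple linear fit, then run new estimates through the fitted line. The sample history is invented for illustration:

```python
import numpy as np

# Invented history: what developers estimated vs. what the work took (days).
past_estimates = np.array([10, 20, 35, 40, 60, 80])
past_actuals   = np.array([14, 30, 44, 55, 85, 120])

# Least-squares fit: actual ≈ slope * estimate + intercept.
slope, intercept = np.polyfit(past_estimates, past_actuals, 1)

def calibrated(estimate_days: float) -> float:
    """Adjust a fresh developer estimate using the historical bias."""
    return slope * estimate_days + intercept

print(f"developer says 30 days -> model suggests {calibrated(30):.0f} days")
```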

We always want to believe in the greatness of the human mind to make decisions, and in a massive number of cases we don’t know a better system, but as Curious Cat points out, sometimes the evidence isn’t what you’d expect at all.