The difference between measure and incentivize

Afraid to measure your organization because of the behaviors measurement might create? Don’t be. Measurement alone isn’t harmful, and understanding how your organization works can be very useful. Oftentimes the outcomes we want aren’t directly controllable. We want more sales, but you can’t just say to your team “make more sales” and actually expect it to happen. With internal measures of performance, on the other hand, people often can simply make the numbers look better.

Tell people to be more productive, and if you measure productivity by counting lines of code you’ll get undesirable behaviors like excess code and a resistance to optimizing existing code. However, if you figure out which behaviors cause more code to be generated naturally, you can encourage and direct those behaviors in your organization.

For example, I frequently measure organizational productivity as function points per hour. That’s measurement. If I simply say to folks, “make our productivity measure go up,” that’s incentivizing. If I instead identify the behaviors that matter, I can continue to measure and understand what matters without the measure becoming an incentive that breeds bad behavior – like making your coworkers inefficient so you appear more efficient.
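As a minimal sketch of the measurement half, productivity here is nothing more than delivered function points divided by effort hours, rolled up per team. The team names and figures below are hypothetical placeholders.

```python
# Minimal sketch: organizational productivity as function points per hour.
# Team names and figures are hypothetical placeholders.
projects = [
    {"team": "A", "function_points": 120, "effort_hours": 900},
    {"team": "A", "function_points": 80,  "effort_hours": 650},
    {"team": "B", "function_points": 200, "effort_hours": 1400},
]

totals = {}
for p in projects:
    fp, hrs = totals.get(p["team"], (0, 0))
    totals[p["team"]] = (fp + p["function_points"], hrs + p["effort_hours"])

for team, (fp, hrs) in sorted(totals.items()):
    # Use the result to understand the organization, not to rank individuals.
    print(f"Team {team}: {fp / hrs:.3f} function points per hour")
```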

Measure, but don’t directly incentivize people on the outcome you want. Figure out the behaviors that matter and focus there. The outcomes will follow.

What is our obligation as data scientists in presenting information?

I just spent three great days at a conference. I hadn’t been to one recently, and in the past I’ve often been disappointed by the heavy vendor focus on selling products and giving away trinkets. This particular conference was much more intimate, and well worth the time spent picking other people’s brains.

In one session, about fifteen of us discussed designing effective dashboards. Although I love Edward Tufte, the conversation here never once touched on the data-ink ratio or any of his other great ideas. Instead, we spent a large portion of the roundtable debating the obligation of a data scientist to guide the dashboard design process.

For example, not that long ago Stephen Few held a dashboard design competition (here’s his result). The challenge I had with this competition wasn’t in how well the data was presented. Indeed, I learned a great deal by looking at the winning solution and Stephen’s solution. What left me feeling unsatisfied was the missed opportunity to discuss what should be presented versus how it should be presented.

And this, to me, is the central question of information display. There are many valuable rules about presenting information once you decide to present it, but precious little advice on how to decide whether to present it at all. Statistical literacy is weak in many organizations, so as the likely stats expert in yours, what do you owe them?

  • Help them to identify the outcomes that matter first. Absent this, no dashboard can be useful. It will just be a lot of beautifully presented garbage.
  • Help them determine potential leading indicators and help them assess whether they matter. It’s not enough to have good ideas about what might matter. We have an obligation to test the relationships.
  • Help them think about the sometimes subtle and insidious problems of statistics: false causation, mathematical quirks that change apparent relationships (like log transforms on a single axis, or sharing a variable between two composite measures; the sketch after this list illustrates that last quirk), and other things that will mislead.
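To illustrate that last quirk with a minimal, simulated example (all of the data here is randomly generated, not from any real organization): two ratio measures that share a denominator will correlate even when their numerators have nothing to do with each other.

```python
# Sketch: two composite measures sharing a variable (a common denominator)
# can correlate even when the underlying numerators are pure noise.
# All data below is simulated.
import numpy as np

rng = np.random.default_rng(42)
n = 500
defects = rng.uniform(10, 100, n)     # random "defect counts", unrelated to features
features = rng.uniform(10, 100, n)    # random "feature counts"
team_size = rng.uniform(2, 20, n)     # the shared variable

defect_rate = defects / team_size     # composite measure 1
feature_rate = features / team_size   # composite measure 2

print("numerators:       r = %.2f" % np.corrcoef(defects, features)[0, 1])
print("per-person rates: r = %.2f" % np.corrcoef(defect_rate, feature_rate)[0, 1])
# The numerators are uncorrelated, yet the two per-person rates show a clearly
# positive correlation purely because they share team_size.
```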

If we fail to do these things for our organizations, then we do a disservice to our science and to those we work with. Dashboards should not be created simply to provide confirming evidence for the world view we want to hold; they should also help us seek out information that disconfirms our beliefs.

No True Scotsman

I was recently enjoying the Illustrated Book of Bad Arguments when I came across the logical fallacy called “No True Scotsman.” It immediately reminded me of a discussion I had with a couple of individuals about software development methodologies. Up until recently, a team had been pushing their development methodology as Agile development. That was, until interest in the Scaled Agile Framework (SAFe) emerged. I can’t comment on the merits of SAFe, but we had lots of data available from this group on the efficacy of their form of Agile.

The discussion proceeded something like this:

Them: “We’d like to start using SAFe. Can you help us establish a measurement system to support our decision?”
Me: “Sure, I’m always willing to experiment and learn, but let’s approach this with some skepticism. Our past data on our Agile projects shows no evidence of statistically significant differences in quality or productivity.”
Them: “What we were doing wasn’t true Agile.”
Me: <stunned silence>

This is a great example, and an interesting use, of the No True Scotsman argument. When we lacked data about what we were doing, it was Agile, but the minute we had data suggesting it may not have been that beneficial, suddenly it wasn’t really Agile anymore. The conversation forced me to do some digging into the scholarly research on Agile methods. There’s a dearth of good research here. Jones offers some analysis in his article here: http://www.infoq.com/articles/evaluating-agile-software-methodologies. I like Jones a lot; he’s one of the few researchers willing to take up potentially unpopular positions. He is in the consulting business, so he doesn’t give away his data set, which is a challenge for academics. There’s also this systematic review of Agile from 2008 (ancient in IT terms, I know), which concludes there’s little strong evidence for Agile methods in the available research. But, of course, if you’re willing to adopt the No True Scotsman argument, that doesn’t matter. After all, whatever gets studied won’t really be a good example of it anyway. 😉
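For anyone wanting to run the same kind of sanity check on their own portfolio, here is a minimal sketch. The productivity figures are placeholders, and whether a nonparametric test like this is the right choice depends on your own data.

```python
# Sketch: compare productivity of Agile-labeled projects against the rest.
# The numbers are placeholders; substitute your own project measurements.
from scipy.stats import mannwhitneyu

agile_productivity = [0.11, 0.14, 0.09, 0.13, 0.10, 0.12]   # e.g., FP per hour
other_productivity = [0.10, 0.12, 0.11, 0.13, 0.09, 0.14]

stat, p_value = mannwhitneyu(agile_productivity, other_productivity,
                             alternative="two-sided")
print(f"U = {stat:.1f}, p = {p_value:.3f}")
# A large p-value means the data provide no evidence of a difference, which is
# exactly the kind of result that invites the "not true Agile" reply.
```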

Is variability in software development good?

I wrote an article not too long ago on the subject of variability in process. Under some circumstances I think variability might be desirable, although I wasn’t particularly referring to software development. Last week I attended a webinar hosted by ITMPI and given by one of the employees at QSM. His talk was about the measurement of productivity in IT, specifically focused on how to account for variations in productivity when estimating. The talk was pretty good, but one of his early slides bothered me.

On that one slide he argued that software development wasn’t like manufacturing and therefore productivity couldn’t be measured the way it is in manufacturing. Unfortunately, he offered no alternative during the talk. Instead he focused on how to measure unexplained variations in project outcomes and aggregate them into some sort of vague productivity calculation. On the whole, that’s useful for estimating if you just want to know the odds of a certain effort outcome, but not so useful if you want to learn what factors impact productivity.

It’s true that software development doesn’t have a lot in common with manufacturing, and the analogies drawn are often strained. That’s not so concerning to me, as the spirit of what management is asking is usually right: what choices can I make to do this faster, better, or cheaper? In that context, productivity isn’t just something you find out about after the fact; it’s something you want to understand.

In my own research, we’ve found measurable evidence that certain activities do make a difference in productivity. Colocation is worth about 5% better productivity. Committing resources to projects is worth about 0.4% for every 1% increase in how committed, on average, your team is to a project. Which gets back to the question I posed in the title: is variability good?
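This isn’t my actual model, just a sketch of how effects like these can be estimated: regress a size-normalized productivity measure on candidate behaviors across completed projects. The file name, column names, and model form below are hypothetical stand-ins.

```python
# Sketch: estimating the effect of candidate behaviors on productivity.
# The CSV, column names, and model form are hypothetical stand-ins.
import pandas as pd
import statsmodels.formula.api as smf

projects = pd.read_csv("projects.csv")  # one row per completed project
# Assumed columns: fp_per_hour, colocated (0/1),
# avg_commitment_pct (average % of time team members were dedicated)

model = smf.ols("fp_per_hour ~ colocated + avg_commitment_pct",
                data=projects).fit()
print(model.summary())
# The coefficient on 'colocated' estimates the productivity difference for
# colocated teams; the coefficient on 'avg_commitment_pct' estimates the
# change per additional percentage point of average commitment.
```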

In short, no. But the longer answer is that, just like with any process, you have to know what to control. Even with a highly thought-intensive process, there are things you can and should seek to control to produce more predictable outcomes. It is true that software development is more like the design of a product than the manufacture of one, but that doesn’t mean that anything goes.

When do I get to be a “trusted partner”?

If I had a nickel for every time I’ve heard someone in IT say something along the lines of “we want to be a trusted partner,” I’d be wealthy. If I had a nickel for every time a business person said it, I’d be broke. Becoming a trusted partner seems to be something that IT is obsessed with, but not so much on the business side.

I do think that being trusted is important, but no matter how much you talk about it, it will never be granted to you. It must be earned. While I can’t tell you how to earn it, I can give you a simple example of how it could be earned.

I’m not an auto mechanic, so I am forced to trust the guy at the shop to take care of my car. Because of the information asymmetry – he knows way more than I do about cars – I am always suspicious of his motives. After all, he is in a position to diagnose my problem and then make money off me by fixing the supposed problem. Here’s a person I inherently distrust. Sort of sounds like IT as well to me…

One day I took my car in because I swore something was wrong with it. I was mentally prepared to pay for new brakes. So, when they put it up on the lift, told me the brakes were fine, and then didn’t charge me, I started to see this shop in a different light. And it wasn’t just once that they didn’t push unnecessary work on me, but several visits. Usually I was just in for an oil change, and since I was there I’d ask about something else. Time and time again they could probably have fleeced me and didn’t.

After that, I trusted them to tell me when things were wrong and was more willing to have the work done. That’s what establishes trust. It’s not doing as you are asked, even if you do it cheaply. It’s not suggesting all kinds of new and shiny things you could do. It starts by doing something that is truly in the customer’s best interest, in a way that they know it is. Sure, a fancy architecture might be in their best interests in the really long run, but your customer isn’t going to sense that.

Save all that for later, when they’re finally listening to you. Start off by demonstrating a willingness to solve their immediate problems, to save them money and time, and to help them avoid unnecessary work, and you will have a much better chance of becoming trusted by the business. Continue to pretend you know better, and you can just keep talking about becoming a trusted partner.

How to construct a good ROI

Far too often, the proposed cost-benefit analyses I see aren’t worth the paper they are printed on. For the most part, this stems from the benefit side of the calculation and not the cost side. Although we do make poor estimates, getting better at estimating isn’t terribly difficult. Start by reading Steve McConnell’s “Software Estimation” and you’ll be well on your way.

The benefit side is where things go haywire. Let’s say we’re talking about the benefit of better code reviews. There’s lots of industry data indicating that code reviews are valuable when done well.

So the math in people’s heads might go something like this… Better code reviews reduce defects. Let’s assume a test defect is … I don’t know … worth $1,000 each, that we can cut defects by 75% by doing better code reviews, and that a code review can be done in ten minutes. Even if the basic formula is right, all the inputs are wrong. Just like a computer program: garbage in, garbage out.

To do the benefits half of the equation, you need some data to support your assumptions. The things you assume are likely knowable, or at least you can get into the right ballpark. Want to know what it costs to fix a defect? Do a brief time study. Or, if you know the cost of a production defect (which for some reason we often seem to know), then use research like Roger Pressman’s to arrive at an approximate cost of finding the defect in the testing or coding phases. The number is probably closer to $500.

Next, look at what the industry data says about the efficacy of code reviews. A 65% improvement is not unheard of, but assuming you’ll capture the entire benefit, plus more, right out of the gate is pure optimism. First of all, you might be doing some reviews today, which blunts your benefit because the potential gain is smaller. Secondly, you most likely won’t be able to capture the entire potential benefit. In one example I looked at, the difference in defect density between teams that did and didn’t do code reviews was 20%. So, if effective code reviews are 65% effective, the maximum opportunity was only 40%, not the proposed 75%. Worse, when buying third-party tools or services, you can’t rely on the salesperson to provide you good numbers. They have a vested interest in you buying the product, and thus in making the ROI work.

And then, on the ongoing cost side, it takes a lot longer than ten minutes to do a code review. All in all, code reviews are certainly worth it, but you won’t get this too-good-to-be-true benefit from them. In many cases, we have a solution in mind but no idea how much benefit we might receive, so we make up numbers. Sure, that fills out the required paperwork, but it really isn’t due diligence. We have an obligation to support our assumptions with some data (our own or external).
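To make the arithmetic concrete, here is a sketch of the calculation with more defensible inputs than the back-of-the-head version above. Every number below is a placeholder to be replaced with your own time studies and baselines, not a recommendation.

```python
# Sketch of a code-review ROI with grounded inputs. Every number below is a
# placeholder; replace it with data from your own time studies and baselines.
defects_per_year = 500          # defects currently found in test, per year
cost_per_test_defect = 500      # dollars, from a time study rather than a guess
achievable_reduction = 0.40     # realistic remaining opportunity, not 75%

review_hours_per_year = 800     # reviews take far longer than ten minutes each
loaded_hourly_rate = 75         # dollars per hour

benefit = defects_per_year * cost_per_test_defect * achievable_reduction
cost = review_hours_per_year * loaded_hourly_rate
print(f"Annual benefit ~ ${benefit:,.0f}")
print(f"Annual review cost ~ ${cost:,.0f}")
print(f"Net ~ ${benefit - cost:,.0f}")
```

With these placeholder inputs the net is positive but modest, which is a far more believable result than the too-good-to-be-true version.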

Why “spend it like it is your own” is a bad mantra

Recently I’ve been hearing people say, fairly frequently, “spend it like it’s your own.” The intention behind this is to not waste the company’s money on things that the company doesn’t need. In other words, if someone gave you a million dollars, do with it what you would do with your own money. The foregone (but wrong) conclusion is that you would spend it wisely.

Seriously? If someone gave you an extra million dollars, are you the person who would set it aside for your kids’ college, fill up an IRA, and make sure you had an adequate rainy-day fund? Or would you finally buy that beach house you’ve been dreaming of? The new BMW? The huge TV?

Ok, maybe you would, but what about everyone you know? Is everyone you know that responsible? I doubt it. In fact, I’m willing to bet you know a spendthrift or two, or more. People who have to go on vacation every summer but haven’t fully funded their retirement. People with two new cars but $30k in credit card debt. People with the latest gadget, no matter what it is. People who stand in line for the newest iPhone. In short, people who don’t exhibit any self-control. People who can’t distinguish between ‘need’ and ‘want.’

Even if this person isn’t you, these people exist. They’re not bad people, but they have different priorities. Perhaps you recall the cookie study done on a bunch of children? Essentially, the researchers offered kids the choice of one cookie now or, if they could wait a bit, two cookies later. Guess what: most children couldn’t wait. More surprisingly, as the researchers followed these kids through their lives, the ones who could not delay gratification did less well in life overall (how you measure that, I don’t know). It’s built into many of us to desire what we can have now, whether it is the best choice or not. So with that in mind, the mantra “spend it like it is your own” probably deserves reconsideration.

An (accidental) study on the causality of static analysis

Capers Jones, in his book “The Economics of Software Quality,” lends support to the idea that static analysis is an effective tool for improving quality. He doesn’t directly address whether static analysis is merely correlated with better quality or a cause of it.

The problem with static analysis is the open question of whether static analysis causes better quality, or whether teams who already care about quality are simply more prone to using static analysis tools. While I can’t answer the question definitively, I can provide a data point.

A large organization in financial services had installed a commercial static analysis tool a couple of years earlier. During that time they collected lots of data from the tool but never acted on the vast majority of its recommendations. In effect, they accidentally conducted an experiment on the direction of causality. The organization also had enough of a function point counting capability that they could measure productivity and functional correctness while accounting for variability in project size via function points.

If static analysis scores reflect real quality differences, then even in the absence of any action on the tool’s recommendations, we ought to expect applications that score better in the tool to show evidence of higher team productivity or better functional quality. In essence, static analysis should predict better quality even without any other action. However, it didn’t. Instead, we found no evidence of a relationship between static analysis tool results and customer outcomes – productivity or quality. Now, it may have just been the tool selected, and without more experiments we can’t rule that out. But it’s one data point on whether static analysis tools make a meaningful difference to quality.
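The check itself is simple; here is a sketch of it, correlating the tool’s application-level scores with size-normalized outcomes. The file name and column names are hypothetical stand-ins for whatever your tool and counting practice produce.

```python
# Sketch: is there any relationship between static analysis scores and
# size-normalized outcomes? File and column names are hypothetical.
import pandas as pd
from scipy.stats import spearmanr

apps = pd.read_csv("applications.csv")
# Assumed columns: sa_score, defects_per_fp, fp_per_hour

for outcome in ["defects_per_fp", "fp_per_hour"]:
    rho, p = spearmanr(apps["sa_score"], apps[outcome])
    print(f"sa_score vs {outcome}: rho = {rho:.2f}, p = {p:.3f}")
# No significant correlation in either direction would be the "no evidence of
# a relationship" result described above.
```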

Two kinds of inaction

I was recently thinking about inaction in response to some event. It was brought to mind by some research I was reading on giving managers the choice not to act (I wish I could remember where I read it). It turns out that given an A-or-B choice, managers are more likely to make a decision than if you give them an A, B, or “no decision” choice. In the latter case, they are more likely to not act. I suppose that’s not entirely surprising, since we likely frame our worlds around the options posed to us and don’t think much beyond them. Want someone to take action? Offer them only two paths of action.

As I kept thinking about it, I started to wonder what drives inaction among managers. One possible reason for inaction, and probably the more common one, I would guess, is not knowing what to do in response to a situation. You see a project risk, but can’t think of a way to mitigate it, so you don’t.

Then there’s another kind of inaction. You don’t take action because you know not to do anything. That is, you observe something happening that might normally drive someone to take action, but because of additional knowledge you know to do nothing. What you’re seeing might be a statistical blip, or real but already mitigated by existing processes, and so on. I don’t think this type of inaction happens often enough. Going back to the original research, we don’t offer the choice of inaction enough, and given how little we seem to know about software development within our organizations, even when given the option of inaction, we don’t know when we should choose it.

Data is exciting to me in two ways. It shows when something is abnormal, but it also shows when something isn’t. Knowing that something is typical is a great way to know not to take action. We see it in our own lives in little ways. When you have a newborn, any little cough or sneeze is likely to cause you to race off to the doctor. After a little while you begin to learn what is normal vs. abnormal for your kid. Suddenly, if they get the sniffles you just tuck them up in a blanket, turn on the TV, and let them rest. Knowledge allows you to choose inaction (and saves you the cost of a visit to the doctor).

Inaction is free, presuming you know when to choose it. Using data to help you figure out when not to act is a smart investment.
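A minimal sketch of what that can look like in practice: treat the recent history of a metric as its normal range and only flag values that fall well outside it. The history, the latest value, and the three-sigma threshold below are illustrative choices, not a prescription.

```python
# Sketch: decide whether this week's value of some metric warrants action.
# History, latest value, and threshold are illustrative assumptions.
import statistics

history = [14, 17, 15, 16, 18, 15, 14, 16, 17, 15]  # e.g., defects per week
latest = 19

mean = statistics.mean(history)
sd = statistics.stdev(history)

if abs(latest - mean) > 3 * sd:
    print("Unusual: worth investigating.")
else:
    print("Within normal variation: a reasonable case for inaction.")
```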

The downside of delaying commitment

The Poppendiecks helped push the idea of delayed commitment in software development. The idea is relatively straightforward and on the surface seems good: if you don’t need to make a choice, don’t. In terms of Lean thinking, if you consider a decision in software development to be akin to creating inventory in manufacturing, then this makes perfect sense. Any decision made too early has the potential of becoming invalid with time. In the same manner, a part made in a factory too soon runs the risk of becoming useless.

We’re in the process of buying a home, and one commitment that is very often delayed is choosing an insurer. In the USA, and presumably elsewhere, if you want to get a mortgage, the lien holder wants you to insure your property. After all, without the collateral of your property, the lien isn’t worth very much. So it’s very much in their interest to ensure that if something bad happens, the property will get repaired. You don’t need the binder from your insurer until within days of closing. By then you’ve probably already done the inspection, paid for the appraisal, perhaps paid money to lock the interest rate, and so on. The seller will have turned away other buyers. After a month and a half, everyone is pretty well locked into the transaction. Tens of thousands of dollars may be at risk.

But that’s not my personality. If I can get the data to make a choice, I make it. And getting a quote on insurance was something I could easily do well in advance of closing. Delayed commitment? No, far from it. Requesting a quote and locking in the insurance is usually just a formality, but things have changed. Higher losses due to delayed maintenance and more extreme weather have made insurers more cautious. When I called, the first insurer refused to underwrite the risk because of the age of the roof. So did the second insurer. Had I waited to get these quotes, I would have found this out only days before closing. No insurance, no closing. We’d have been heavily committed to this house, with a strong possibility that I would have lost all the money I put down.

At this stage, all I’m risking is the cost of the inspection. Sure, I don’t want to lose $500, but I’d far rather lose that than 5% of the purchase price. My goal now is to renegotiate with the seller regarding the roof. My negotiating position is strong – I’m only into it for a few hundred dollars, while the seller now faces the possibility of being completely unable to sell without replacing the roof. Any buyer who isn’t paying cash faces the same problem: they won’t be able to get a mortgage without insurance, and insurers won’t underwrite the house in its current condition.

Had I waited, my negotiating position would have been incredibly weak. I’d have put out perhaps $30,000 between inspection, appraisal, legal fees and earnest money. I’d be forced to pay for the roof myself or lose all that money. Delaying commitment isn’t always the safest choice. No matter how routine the delayed decision has been in the past, if you don’t understand the risks of not knowing the answer, you put yourself in a precarious position.

I’d argue that for anything you can know, don’t delay knowing it. Some minor rework cost is more predictable than, and preferable to, the occasional complete blowout loss. There is value in predictability, even if the costs are predictably a bit higher.

Update: Interestingly, the seller chose not to negotiate over the replacement of the roof. We went on to buy another house, since we were unwilling to assume all the risk. The seller did eventually sell, and not terribly long thereafter, but at $15–20,000 less than our offer. In the seller’s case, delaying the decision to negotiate cost them as well. Sometimes what you have right before you is as good as it is going to get.