Good at heart attacks, bad at cancer

I was watching a video today from IBM that included insights from many world leaders, famous figures, and so on. A lot of times someone passes along a video and I’ll take a look. Most of the time it doesn’t resonate with me, but sometimes there’s a single quote that sticks with you.

In this case, Fareed Zakaria said “we are very good at heart attacks. We are very bad at cancer.” He wasn’t referring to the medicine, however. He was talking about companies. His point was that companies react well to sudden traumatic events and very, very poorly to things that eat away at us slowly over time. It resonated with me not because I didn’t know the concept, but because it so elegantly states the difference between sporadic and chronic loss.

Sporadic loss is a heart attack. In software it’s the production outage. It’s all hands on deck. Everyone comes together for a few minutes or hours and rescues the system. Then we go back to the projects we were working on – crisis averted.

Chronic loss is a cancer. It starts developing and you don’t even notice it. By the time cancer has become a lump you can feel, it’s frequently too late to do anything about it. The prognosis for many late-stage cancers is not good. It is very similar for chronic loss in organizations. At first, it’s a defect that generates a bunch of calls to the call center, but there’s a workaround, and you deem the fix too expensive to justify. So, you live with the workaround. And then there’s another, and another, and another. Over time, you allow hundreds or thousands of small failures to erode the quality of your product. Individually, none of them is an issue. Collectively, you have a mountain of a problem and a few hours of heroism won’t help you. The problem got there over years of deferred maintenance and it isn’t going to go away easily.

Chronic loss looks like a bloated production support organization. Chronic loss is when you have so many production incidents you can’t even fathom taking the time to attribute them to the projects that introduced the issues. Chronic loss is when you only talk about critical, high, and medium severity defects because there are so many low severity defects that they would drown out the others. Chronic loss is when you justify that measurement system as ‘focusing on the big issues’ – the heart attacks are all you look at. Chronic loss is when you pay someone to look over the shoulder of someone else doing the work to make sure it’s done right, rather than figuring out how to error-proof it. Chronic loss is batch abends that you just restart every month, or week, or night, or several times a day and never figure out why they failed.

Being good at heart attacks isn’t going to save you from cancer. But preventative care of your software will protect you against both risks.

Half done can seem like it is complete

It seems like every weekend that my family and I return from my in-laws’, we forget something. I think that I may be to blame for it as well. We’re usually coming home on a Sunday, and the kids have to be back at school on Monday morning. So, sometime around early afternoon on Sunday, I try to get a head start and pack up. Of course, my wife has always had the primary packing responsibility; it’s just that I want to be helpful, and in doing so I probably make things worse.

What I can see around the guest room we stay in, I make sure to pack back into the suitcase, and then I’ll check the bathroom for toiletries and the kids’ beds for their stuff. I’ll pack what I can and at least pile the rest of what I collect near the suitcase. And this is where it goes wrong. See, I only know what I wore, and of everyone in the family I pack, by far, the least stuff. As a result, there’s usually something around the house that I don’t even know is there. The problem is, when I pack up, the space looks packed up. It isn’t, in fact, totally packed, but it’s 90% packed. So my wife comes along and, seeing things organized, assumes I’ve done a good job collecting things from around the house. I haven’t, of course. Not because I’m lazy or don’t want to; I just don’t know what I should be looking for. On this particular trip back, we left behind my daughter’s stuffed horse. She was in tears when she figured out it wasn’t with us, and we had to call grandma to make sure that her horse was safe and sound and would be kept company until we could visit again. And grandma, being a good sport, agreed to do so.

Beyond packing up to leave my in-laws’, I suspect this “half done looks completely done” problem haunts us in many ways. For example, if you’ve reviewed some of the code, we might assume you’ve reviewed it all. Or if you’ve built some of the system and shown it off, we might assume you’ve built the whole thing. In reality, you’ve probably built the vast majority of it (just like I’ve mostly packed up), but without dotting the i’s and crossing the t’s, you haven’t done it all. Part of your review and inspection processes needs to include a complete walk of the deliverable to make sure it is all there. If you don’t, and it looks pretty good, you are very likely, I suspect, to not look deeply enough to find out what is really missing. Our base assumption is a job well done, and we probably should start from a different place – that our work needs to be checked.
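In software terms, the cheapest version of that complete walk is a manifest check: compare what the deliverable is supposed to contain against what is actually there. Here is a minimal sketch of the idea in Python; the file names and the MANIFEST list are hypothetical, standing in for whatever your own definition of done enumerates.

```python
from pathlib import Path

# Hypothetical manifest: everything the deliverable is supposed to contain.
MANIFEST = [
    "README.md",
    "src/app.py",
    "tests/test_app.py",
    "docs/release_notes.md",
]

def missing_items(deliverable_root):
    """Return the manifest entries that are not actually present in the deliverable."""
    root = Path(deliverable_root)
    return [item for item in MANIFEST if not (root / item).exists()]

gaps = missing_items("build/output")  # hypothetical staging directory
if gaps:
    print("Looks packed, but isn't. Missing:", gaps)
else:
    print("Everything on the manifest is present.")
```

If the checklist is written down when the work is planned, “looks packed” and “is packed” stop being the same judgment call.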

Why I’ve soured on Defect Containment Rate

At one point in time, if you asked me, I would have wholeheartedly agreed with Capers Jones and said the one critical measure you need to have is DCR (Defect Containment Rate).  Now that I’ve made several attempts to turn DCR into a reality, I’m convinced it’s one of the least useful measures you could have.
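For anyone who hasn’t run into it, DCR is usually expressed as the share of total defects that were caught before release.  Here is a minimal sketch of the calculation; the function name and the sample numbers are mine, and nothing here is specific to any particular metrics tool.

```python
def defect_containment_rate(defects_found_before_release, defects_found_in_production):
    """DCR: the share of all known defects that were caught before release."""
    total = defects_found_before_release + defects_found_in_production
    if total == 0:
        return None  # no defects recorded anywhere; the measure is undefined
    return defects_found_before_release / total

# Hypothetical numbers: 180 defects caught in reviews and testing, 20 more surfacing in production.
print(defect_containment_rate(180, 20))  # 0.9, i.e. 90% containment
```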

Where to begin:

1.  It’s incredibly lagging.  Once you put something into production, it can take weeks or months for all the missed defects to surface.  This is because of the way the system gets used and the potential lack of interest on the part of your users in reporting the issues.  Pretty much any measure you take will be biased towards optimism.  Optimism is not something you want in your measurement system, because optimism drives inaction.

2.  It’s hard to do.  Knowing what test defects you found is easy.  Knowing whether a production defect was caused by your project takes work.  You have to figure out when it manifested itself in the code, possibly loading multiple versions of the application into development environments to figure out when it actually got created.  And then there are the odd side effects that you can never be quite sure whether you caused or not – like stability issues that only manifest themselves because the new features increased usage of your application.  Did this project cause the issue?  Well, no, not directly, but it potentially contributed to it.

3.  It doesn’t credit good behavior throughout the lifecycle.  Ideally, DCR should capture all the defects you contain in all the stages of software development.  Do code reviews and find things to fix?  You should count those.  Right?  Well, without running the code, you can’t actually be sure if the thing you found during code review would ever actually result in a defect.  Sometimes it’s obvious that it would (like a NULL pointer dereference) or would not (like a shortage of comments), but oftentimes we don’t know.  We know that we don’t like the way the code was written and that there is a cleaner way to do it.  That hopefully contributes to the long-term stability of the application.  In some sense, that means avoiding future defects, but did you actually contain a defect?  No, probably not, but DCR is never going to give you credit for avoiding a future defect.

4.  It tells you something we already know.  We’ve got this thing about not believing the industry.  Though several researchers have found that testing will remove 35-50% of defects per cycle (I’ve seen numbers a bit lower as well), we insist on measuring our own test capability.  Given that reaching the lower end of that range isn’t hard – divide requirements into tests, write and run the tests, and record the results – do we really need to know how our testing is doing?  Let me give you a hint – it works about the same as everyone else’s testing.  Do 3 cycles of testing and assume you’ll get about 75-80% of the defects out (see the quick calculation after this list).  Now go measure something that you don’t know much about, like the quality of the software coming into test.

5.  It focuses you on the wrong thing.  Guess what: testing will never be the best way to produce high quality code.  It’s a supporting player, at best.  But if you measure defect containment, you are basically admitting that you are reliant on your test capability to keep bugs out of production.  The best thing you can do to keep bugs out of production is to never write them in the first place, or to catch them much earlier than testing, by which point your odds of success are reduced.  Get good quality code coming into test and the fact that you’ll only get 35% of the defects removed per test cycle won’t matter nearly as much.
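Here is the back-of-the-envelope math behind the 75-80% figure promised in point 4.  It is a rough sketch that assumes each test cycle independently removes the same share of whatever defects remain, which is a simplification.

```python
def cumulative_removal(per_cycle_effectiveness, cycles):
    """Fraction of the original defects removed after the given number of cycles,
    assuming each cycle removes the same share of whatever is left."""
    return 1 - (1 - per_cycle_effectiveness) ** cycles

for eff in (0.35, 0.40, 0.50):
    print(f"{eff:.0%} per cycle, 3 cycles -> {cumulative_removal(eff, 3):.0%} removed")
# 35% per cycle, 3 cycles -> 73% removed
# 40% per cycle, 3 cycles -> 78% removed
# 50% per cycle, 3 cycles -> 88% removed
```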

I’m afraid DCR is probably not something I’m going to spend too much time thinking about any more, except perhaps to reiterate why we should be looking for better measures elsewhere.

The cost of everything, but the value of nothing

There’s a programming joke that I heard a long time ago that went something like “LISP programmers know the value of everything but the cost of nothing.”  The point being that LISP was a very powerful language that had basically no capability to use system resources efficiently.  When I used to program in LISP, back in college, this certainly seemed to be true.

However, that’s not what I wanted to write about.  It’s just that the joke popped into mind for some reason, and it made me think about a project failure a long time ago.  About 10 years ago, I used to develop software for a Point-Of-Sale system.  Retailers are notoriously cheap, so while technology had advanced quite a bit, some of the systems in the field were still running 386 processors.  As a result, we had to cater to the lowest common denominator in the field, and for reasons I forget, this meant that we had a fairly slow build process.  It took about 2 hours to build our system and get a distribution ready that could be installed into a test environment.  And then, the install process took another half an hour or more, depending on whether you went to floppy disks or built a distribution package via the deployment tool.  All in all, it was slow.

If you were a decent developer, you didn’t go through this process too much.  You did all the debugging at your desk and only did the full builds when you were ready to test in the lab environment.  For the most part, we could simulate pretty much everything, but it was helpful to go to the lab for some reasons that aren’t worth getting into.

At any rate, one day, we delivered a package to our QA department and it wouldn’t boot.  That was a very strange event, since we couldn’t recall a time when the entire system simply wouldn’t come up.  After some research, we figured out what had happened.  The build wasn’t good because it wasn’t complete.  And why wasn’t it complete… well, that comes back to the original joke, except in reverse – “the cost of everything, but the value of nothing.”

See, a developer knew full well what the cost of our build process was – 2 1/2 or more hours of time.  They were in a rush, so instead of running the full build, they grabbed the pieces they thought they needed from their desktop, assembled a package and threw it over the wall to QA.  Had it gone well, we’d have been none the wiser that it happened.  But it didn’t go well.  Because they had circumvented the build process, we ended up with an invalid build, and instead of losing 2 1/2 hours to the build process, we lost almost a full day trying to figure out what was wrong, plus some of the confidence QA had in us.

Developers are a lazy lot – after all, we spend our entire lives automating repetitive tasks.  Saving 2 1/2 hours seems like a good deal.  After all, if we could take an easy quarter of a working day off our delivery timeline, it seems like a no-brainer, right?  Well, as it turned out… not right at all.

Process has value in that it prevents errors.  The build process is but one of many processes we partake in.  When all we see is the cost of something, instead of the value it provides, then not doing the activity makes a lot of sense.  Now, by all means, if the costs outweigh the value, that’s an entirely different story.

But as a point of reference, it’s about 25 times more expensive to fix a bug in test than during coding, and about 100 times more expensive to let that same bug escape to production.  So, when you think you’re saving cost by cutting back, consider the possibility that you’re driving up future costs in exchange for a small savings now.  What’s the value of that?
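To make that trade-off concrete, here is an illustrative calculation using those commonly cited multipliers.  The half-hour fix time is made up for the example; only the ratios come from the rule of thumb above.

```python
# Illustrative only: relative costs often cited for fixing the same defect at
# different stages (roughly 1x during coding, ~25x in test, ~100x in production).
COST_MULTIPLIER = {"coding": 1, "test": 25, "production": 100}

fix_cost_during_coding_hours = 0.5  # hypothetical: half an hour to fix at the desk

for stage, multiplier in COST_MULTIPLIER.items():
    print(f"{stage:>10}: ~{fix_cost_during_coding_hours * multiplier:.1f} hours")
#     coding: ~0.5 hours
#       test: ~12.5 hours
# production: ~50.0 hours
```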

Independence matters

A few years ago, my wife, daughter, and I were out to dinner. My daughter was about three at the time. She has always been a daddy’s girl, and she looked up at me mid-meal and said, “sit with you?” To be fair, my daughter has me wrapped around her finger, but I said, “not until you are all done.”

So, she looked down at her pizza and declared “I all done.” Smart girl, I thought. Since I had allowed her control of the definition of done, she could simply declare herself successful and get what she wanted. So, I clarified, “not until you finish your pizza.”

“Done” wasn’t something that I could really assess. After all, who was I to say whether she was satiated or not? All I could see was that there was pizza on the plate. What does this have to do with software, you might ask.

Well, the critical difference between “done” and “finish your pizza” is who was the one with the ability to assess success. In software measurement, many measurements have a denominator that serves to represent the amount of work. You might use function points, lines of code, story points, etc. But, the distinction you should be making in choosing your metric is “is this measure independent from the people doing the work?”

Lines of code and story points are not. Function points are. If you don’t establish the ability to independently assess the amount of work, then you are putting the definition of success entirely in the hands of the developers. That’s not to say that developers want to do a bad job, but when push comes to shove, it’s easier to fudge a metric you control than actually make a change for the better.

‘The best NYT correction ever’

I forget sometimes, being in software, that people make mistakes in lots of disciplines. Journalism is no exception, of course. This article that a friend of mine posted to Twitter was a particularly good one.

Not only do you get a sense for what a real obsession with quality is – the article basically says that the NYT will correct any error – but you also get a glimpse of their process for tracking these errors and potentially learning from them. I knew that publications printed corrections, but the NYT clearly goes further than that. They track them, and if you are doing that, you are at least partway to acting on preventing future errors. I also particularly like how the journalist doesn’t chalk it up to human error. No, she recalls the detailed events that led to the error. Here’s a person who is probably beating herself up over a mistake. It is this kind of attitude towards quality that we should seek to emulate in software development.

No bug goes uncorrected, we learn from our mistakes, and we care enough to be able to think about how we made the error in the first place.

Operator?

Does anyone remember the kids’ game “Operator” (not the silly surgery game, Operation)?  In Operator, which for us was usually played while sitting at the lunch table, the first kid would pick a simple phrase like “there is a cat in my house” and then whisper it into the ear of the person next to them.  Of course, since it was a loud lunchroom, it’d get slightly misheard, and by the time it had been passed along 20 kids or so, the last kid would hear “there is a flat on my grouse” (which doesn’t make a lot of sense, of course).  The final kid would say the phrase they heard aloud and everyone would laugh, particularly when the first kid would announce what they had originally started with.  All in all, it was a very silly game, but kids liked it.

Today, we apparently still play that game, only we play it with more serious consequences.  When your testers write test cases directly from the system design or technical design, you’re playing operator.  Each time down the line, you’re counting on the translation from the prior step to this step to be faithful.  Like making a copy of a copy of a copy of a copy of an original document, it just keeps getting blurrier and blurrier.

If you want your testers to produce the most faithful interpretation of the requirements they can, they need to work from the requirements document.  If they assume that the analyst or technical lead has done the translation faithfully, then they are working from something that is potentially suspect.

Now, I’m all for using tools to do requirements traceability, but I have to say, even with tools like ReqPro or others that offer what they often call “ReqPro Lite” functionality, there’s still sense in making sure that your final check of the system is linked directly back to the original document you started with.
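A lightweight version of that final check does not need a heavyweight tool at all: record which original requirement each test claims to verify, and flag any requirement with no direct link.  The IDs and the mapping below are hypothetical, not ReqPro’s data model.

```python
# Hypothetical requirement IDs taken straight from the customer's requirements document.
requirements = {"REQ-001", "REQ-002", "REQ-003", "REQ-004"}

# Each test case records which original requirement it verifies,
# not which design document or technical spec it was derived from.
test_to_requirement = {
    "TC-01": "REQ-001",
    "TC-02": "REQ-001",
    "TC-03": "REQ-003",
}

untested = requirements - set(test_to_requirement.values())
print("Requirements with no test traced directly to them:", sorted(untested))
# Requirements with no test traced directly to them: ['REQ-002', 'REQ-004']
```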

Don’t play Operator with your customer’s requirements.  Everyone should hear them first-hand, especially those who are responsible for checking they were implemented properly.

Want good quality? Work on something you can’t take back

I’m going to throw a crazy hypothesis out there – web-based solutions promote poor quality.  Why would I argue this?  Because web-based solutions are easy to change, so people worry less about quality concerns.  “If it’s wrong, we can just fix it later.”  “Getting something to the field quickly is important.”

What would you do if you were going to create some software and, say, put it on a rocket ship and send it to space?  What if you couldn’t take what you did back?  What if you were going to create software and embed it into a piece of hardware you were going to sell to millions?  What if it was an automobile – if the software misbehaved it could kill someone?

We don’t treat web-based solutions like that.  It’s easy to change, so why worry too much about quality?  We can always just “test it in production,” right?  Well, yes, you can, but when it goes wrong, the customer you upset may or may not be back for more.  Your customers don’t see the easy fix.  Your customers feel a bad experience.  And just because it’s web-based doesn’t mean the experience isn’t important to them.  What if it’s their bank account that you screw up?  What if it’s medical records?

If you start thinking about what you release into the field as something you can never take back, you’ll look at quality quite a bit differently, I think.

Why root cause confuses

The term “root cause” seems to confuse software developers, and it seems to have been that way for a long time.  When a developer talks about “root cause” they tend to mean the place where the system started to go wrong. 

For example, if you call a web service and it returns bad results that the caller might have detected, there are two things you could do.  One, you could fix the caller to handle the error gracefully (often called defensive coding), or two, you could fix the web service to not return bad results.  Most developers I’ve run into will tell you that fixing the web service is “root cause.”  The idea being that if the web service didn’t do something bad, there’d be no need for defensive coding in the caller.
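As a sketch of the first option, here is roughly what defensive coding in the caller might look like.  The service URL, the response fields, and the function itself are hypothetical; the point is only that the caller refuses to pass along results it can tell are bad.

```python
import requests  # assumes the third-party 'requests' library is installed

def get_account_balance(account_id):
    """Call a hypothetical web service and defensively validate what comes back."""
    response = requests.get(f"https://example.com/api/accounts/{account_id}/balance")
    response.raise_for_status()  # don't silently accept an HTTP error

    payload = response.json()
    balance = payload.get("balance")

    # Defensive coding: the caller refuses to propagate obviously bad results,
    # even though the origin of the bad data is the service itself.
    if not isinstance(balance, (int, float)):
        raise ValueError(f"Service returned an unusable balance: {payload!r}")
    return float(balance)
```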

Fair enough, but this isn’t root cause.  Root cause has to go further, a lot further.  If root cause only goes so far as to fix the issue at the point of origination, then all you’ve done is fix one issue.  Instead, you have to be asking the question: “why did we make that mistake in the first place?”  Was the web service wrong because of a coding issue?  Requirements?  Design?

And further, why did you make a requirements, design, or coding error?  What can be done to catch issues of this type in the future?  The bug you fixed is fixed; it isn’t going to be the next bug you deal with, but it is going to be part of a pattern of issues that you are not handling.

When you think about root cause, think beyond fixing the bug at the source.  That’s helpful, but it isn’t exactly the root cause of why the problem was introduced in the first place.  If you intend to stop future issues, you have to go further than just fixing the issue you have now.

Testing is not inversely effective

It’s time we cleared something up about testing.  It is a misconception that I fear is more commonly held than it should be, and it usually begins with “finding defects in testing is good, because we won’t find them in production.”  That statement is true.  However, it implies something that is not true, which is that you can test quality in.

Here’s the thing: testing is about 35-50% effective per test type (unit, functional, performance, etc.).  If the code is good, testing is 35-50% effective.  If the code is bad, testing is STILL only 35-50% effective.  That means that if you find a lot of defects in test, there are even more left to find that will make it to production.

So, for example, let’s say you had two teams code the same functionality and they both delivered code to you.  You run the same set of tests against each delivery and find 100 bugs in team A’s code and 10 bugs in team B’s.  The teams fix all the bugs you found, and you retest to make sure that none of the tests you had subsequently break.  Is it fair to say that team A’s code and team B’s code are now effectively equivalent?

I’ll give you a hint: no.  Given that you had a fixed set of tests to run, and didn’t adjust them based on the quality of team A’s code, your testing does nothing to fix the code you didn’t adequately exercise.  In a real project, the same is true.  If you write a set of test cases in advance of receiving the code and the code quality is poor, unless you devise additional tests to increase coverage, one can assume the unexercised code is of poor quality as well.  Thus, you will let more defects into production.
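To put rough numbers on it, here is an illustrative estimate.  It assumes, purely for the sake of the example, that the fixed test suite catches about 40% of whatever defects are actually in the code.

```python
def estimated_escapes(defects_found_in_test, test_effectiveness=0.40):
    """Illustrative only: infer the total latent defects from what testing found,
    then estimate how many remain to escape into production."""
    estimated_total = defects_found_in_test / test_effectiveness
    return round(estimated_total - defects_found_in_test)

print("Team A escapes:", estimated_escapes(100))  # about 150 defects still in the code
print("Team B escapes:", estimated_escapes(10))   # about 15 defects still in the code
```

Same tests, same effectiveness, very different numbers of escaped defects.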

Testing is not, let me repeat, NOT inversely effective to the quality of the code.  You don’t suddenly get magical results from testing just because you delivered bad code to testing.  You get the same percentage results from testing, and let more bad code through.  This is why you cannot test quality into the system.