Why I’ve soured on Defect Containment Rate

Filed under: DCR,Measurement,Quality — Adam Schwartz - May 9, 2012

At one time, if you had asked me, I would have wholeheartedly agreed with Capers Jones and said the one critical measure you need to have is DCR (Defect Containment Rate).  Now that I’ve made several attempts to make DCR a reality, I’m convinced it’s one of the least useful measures you could have.

Where to begin:

1.  It’s incredibly lagging.  Once you put something into production, it can take weeks or months for all the missed defects to surface.  This is because of the way the system gets used and the potential lack of interest on the part of your users to report the issues.  Pretty much any measure you make will be biased towards optimism.  Optimism is not something you want in your measurement system, because optimism drives inaction.

2.  It’s hard to do.  Knowing what test defects you found is easy.  Knowing whether a production defect was caused by your project takes work.  You have to figure out when it manifested itself in the code, possibly loading multiple versions of the application into development environments to figure out when it actually got created.  And then there are the odd side effects that you can never be quite sure whether you caused – like stability issues that manifest themselves from increased usage of your application now that you have introduced new features.  Did this project cause the issue?  Well, no, not directly, but it potentially contributed to its manifestation.

3.  It doesn’t credit good behavior throughout the lifecycle.  Ideally, DCR should capture all the defects you contain in all the stages of software development.  Do code reviews and find things to fix?  You should count those.  Right?  Well, without running the code, you can’t actually be sure if the thing you found during code review would ever actually result in a defect.  Sometimes it’s obvious that it would (like a NULL pointer) or would not (like a shortage of comments) but oftentimes we don’t know.  We know that we don’t like the way the code was written and that there is a cleaner way to do it.  That hopefully contributes to the long term stability of the application.  In some sense, that means avoiding future defects, but did you actually contain a defect?  No, probably not, but DCR is never going to give you credit for avoiding a future defect.

4.  It tells you something we already know.  We’ve got this thing about not believing the industry.  Though several researchers have found that testing will remove 35-50% of defects per cycle (I’ve seen numbers a bit lower as well), we insist on measuring our own test capability.  Given that reaching the lower end of that goal isn’t hard – divide requirements into tests, write and run the tests, and record the results – do we really need to know how our testing is doing?  Let me give you a hint – it works about the same as everyone else’s testing.  Do 3 cycles of testing and assume you’ll get about 75-80% of the defects out.  Now go measure something that you don’t know much about, like the quality of the software coming into test.

5.  It focuses you on the wrong thing.  Guess what, testing will never be the best way to produce high quality code.  It’s a supporting player, at best.  But if you measure defect containment, you are basically admitting that you are reliant on your test capability to keep bugs out of production.  The best thing you can do to keep bugs out of production is to never write them in the first place, or to catch them well before testing, because by the time code reaches test your odds of success are reduced.  Get good quality code coming into test and the fact that you’ll only get 35% of the defects removed per test cycle won’t matter nearly as much.
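To put rough numbers on points 4 and 5, here is a minimal sketch in Python.  It assumes each test cycle removes the same fraction of whatever defects remain (a simplification of mine, not something the researchers promise), and the incoming defect counts are made up for illustration.

```python
def cumulative_removal(removal_rate, cycles):
    """Fraction of defects removed after `cycles` test cycles, assuming
    each cycle removes `removal_rate` of whatever is still left."""
    return 1 - (1 - removal_rate) ** cycles


def escaped_defects(incoming, removal_rate, cycles):
    """Defects still present after testing, under the same assumption."""
    return incoming * (1 - removal_rate) ** cycles


# The 35-50% per-cycle range quoted above, over three cycles.
for rate in (0.35, 0.50):
    print(f"{rate:.0%} per cycle, 3 cycles: {cumulative_removal(rate, 3):.1%} removed")
# 35% per cycle, 3 cycles: 72.5% removed
# 50% per cycle, 3 cycles: 87.5% removed

# Same testing, different quality coming into test (hypothetical counts).
for incoming in (200, 50):
    print(incoming, "incoming ->", round(escaped_defects(incoming, 0.35, 3), 1), "escape")
# 200 incoming -> 54.9 escape
# 50 incoming -> 13.7 escape
```

Under that assumption three cycles land somewhere between roughly 73% and 88%, which brackets the 75-80% rule of thumb.  The second loop is the real point: with identical testing, cutting the defects coming into test cuts the defects that escape by the same proportion.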

I’m afraid DCR is probably not something I’m going to spend too much time thinking about any more, except perhaps to reiterate why we should be looking for better measures elsewhere.
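For anyone who wants to see the lag from point 1 in numbers, here is a small sketch.  It assumes the usual definition of DCR (defects found before release divided by all defects found before and after), and every count in it is hypothetical.

```python
def defect_containment_rate(found_before_release, found_after_release):
    """DCR as assumed here: share of all known defects caught before release."""
    total = found_before_release + found_after_release
    if total == 0:
        raise ValueError("no defects recorded yet")
    return found_before_release / total


found_in_test = 80  # hypothetical count of defects caught before release

# Hypothetical cumulative production defects attributed to the project,
# keyed by weeks since release.
cumulative_production_defects = {1: 2, 4: 8, 12: 15, 26: 20}

for weeks, escaped in cumulative_production_defects.items():
    dcr = defect_containment_rate(found_in_test, escaped)
    print(f"week {weeks:>2}: DCR = {dcr:.1%}")
# week  1: DCR = 97.6%
# week  4: DCR = 90.9%
# week 12: DCR = 84.2%
# week 26: DCR = 80.0%
```

The number only ever moves down as production defects trickle in, so any early reading flatters you.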

A moment of shameless self promotion

Filed under: Measurement — Adam Schwartz - April 25, 2012

I’m pleased to say that a fun package arrived in the mail today.  It was the conference proceedings from ITNG 2012 (IT Next Generations 2012) 9th Annual Conference.  Generally, conferences are good stuff, since you get to meet other people and be exposed to new ideas, but this one was particularly special for me.  It represents the first conference where I’ve had the pleasure of publishing my own work.

One would think, working in industry, that if I’d discovered a new/great idea that the route ought to be to patent it and protect my intellectual property.  I don’t see things that way.  As you can tell, I’m inclined to freely give away lots of knowledge that I have because I know, when you need specific help with your situation, that I’ve been an open book about exactly what kind and quality of advice and guidance you’ll get from me.  And for that reason, I’m pleased to be able to say that we’re breaking new ground with looking at how to measure the size of a software system.

Sure, there’s always KLOC (thousand lines of code) or FP (function points) but both these systems have major drawbacks.  Capers Jones has gone so far (and I agree) as to call the use of KLOC as a measurement system “professional malpractice.”  Function points make lots of improvements, but can be costly to count.  There appears to be a simpler answer – simply use the test cases as a proxy for the function points delivered and use those as your measurement system.  If you’re so inclined to read further, check out pages 242 – 246 of ITNG 2012.  I hope some new ideas in the software measurement space can help improve the science.

Richness versus Recall

Filed under: Agile,People,Requirements — Adam Schwartz - March 15, 2012

Alistair Cockburn presents an interesting insight in his presentation “I come to bury Agile, not praise it.”  On slide 12, he presents the richness of the communication channel as an important part to getting information across.  Surely, you’ve experienced this yourself with a never ending chain of back and forth emails that were quickly resolved with a single 1 minute phone call to clarify.

Therefore, it makes enormous sense to replace communication of low richness with communication of high richness, right?  Well, I’m not sure it’s that black and white.  In order to use information effectively, you not only have to be able to communicate it, but also to recall it when you need to use it again.

For example, you sit down and have a conversation with the user and then turn around to write some code.  The ability to translate what the user asked for into code depends not only on having the conversation but remembering all the details of the conversation correctly.

So, do you have an eidetic memory?  Probably not.  How long can you accurately recall a conversation?  Long enough to turn it into code faithfully?  Probably not, either.  You can probably remember the nominal case, but what about all the exception handling you discussed?

Now, I’m not saying you should communicate via email or paper only, since that’s clearly silly, but at the other extreme, you probably shouldn’t communicate only orally either.  Indeed, merging the face to face conversation with documentation helps manage both the completeness of the conversation and your ability to recollect the details when you need them.

OCD or just good management?

Filed under: Management Philosophy — Adam Schwartz - March 14, 2012

I’ll freely admit that I’m a bit obsessive compulsive about things, and I’m sure my wife would laugh at the “a bit” modifier I put on that.  But I was thinking about it this morning as I was driving to an early appointment.  See, I had to get somewhere that I hadn’t been before via roads which could have major traffic backups.  I had no experience with the location or with the traffic patterns.

Think of this as trying to run a software project – a known destination with a known but risk prone path to get there.

How you’d handle this situation grants you insight into what kind of project manager you would be.  If you’re a typical person, you’d plan to leave a few extra minutes just in case.  If you’re a true project manager at heart, you’d go a lot further than that, I believe.

I showered the night before.  It’s not that I didn’t intend to shower in the morning, but I saw a risk.  What if my alarm clock didn’t go off for some reason?  Power outage.  I mis-set it.  Random alarm clock failure.  So, I showered and shaved the night before, full well knowing I might do it again in the morning, but guaranteeing that I would definitely not show up having not showered for 24+ hours and unshaven.  At worst, it’d be 8 hours and a slight 5 o’clock shadow (albeit at 9:00am).  Was I just being crazy?  No, in fact, I had past experiences with my alarm clock not going off (who hasn’t?), so it made sense to actively manage a potential risk.

I laid out my clothes the night before.  What if I selected something with a stain and didn’t notice until the last minute?

I selected a driving route that was slightly longer but more likely to have less variability.  A key to being a good project manager is to minimize variability in the outcome – in my mind it’s better to cost slightly more (slightly being a key word) to prevent the possibility of a major overrun.  In traffic terms, a guaranteed extra 5 minutes of driving is better than the risk of being 45 minutes late.  For the overall organization, if you deliver your project over budget, but someone else delivers under, it all comes out in the wash.  However, you only participate in a limited number of projects, so what comes out in the wash for the organization may not play out so well for you as an individual manager.  If you’re not managing risk, you run the risk of being the person with the over budget project.
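If you like to see the trade-off as arithmetic, here is a rough sketch.  The two routes, their travel times and the 10% chance of a backup are all numbers I made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    minutes: float       # travel time if this outcome happens
    probability: float   # chance of this outcome (should sum to 1 per route)


def expected_minutes(outcomes):
    """Probability-weighted average travel time."""
    return sum(o.minutes * o.probability for o in outcomes)


def worst_case(outcomes):
    """Longest travel time among the listed outcomes."""
    return max(o.minutes for o in outcomes)


def chance_of_missing(outcomes, budget_minutes):
    """Probability that the trip takes longer than the time budget."""
    return sum(o.probability for o in outcomes if o.minutes > budget_minutes)


# Hypothetical routes: the short one is usually 30 minutes but has a 10%
# chance of a 45-minute backup; the long one is a steady 35 minutes.
short_route = [Outcome(30, 0.9), Outcome(75, 0.1)]
steady_route = [Outcome(35, 1.0)]

for name, route in (("short", short_route), ("steady", steady_route)):
    print(f"{name}: expected {expected_minutes(route):.1f} min, "
          f"worst case {worst_case(route):.0f} min, "
          f"chance of blowing a 40 min budget {chance_of_missing(route, 40):.0%}")
```

With these made-up numbers the short route is marginally better on average, but it is the only one that can make you 45 minutes late.  That is the case for paying slightly more to take the variability out.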

Lastly, I got multiple estimates.  I used both Bing and Google maps to look at the route.  Inconsistency in estimates is a warning sign that something isn’t right.  Indeed, they disagreed by nearly 15 minutes, so if you think a few extra minutes buffer is going to get you there on time, you might be in serious trouble if you relied on the wrong site.

In the end, my alarm clock went off just fine and I arrived early… despite encountering construction and rush hour traffic I didn’t know about.  You might say “oh, you’re just being OCD, you would’ve showed up on time anyway.”  Possibly so, but that’s no way to manage risk.  If you don’t do anything to mitigate the things you can control, you won’t survive the things you can’t control.

Your change of mind doesn’t excuse my error

Filed under: Management Philosophy,People — Adam Schwartz - March 8, 2012

Who doesn’t appreciate dodging a bullet, right?  You know that your project is going wrong – resource issues, a big requirements misunderstanding, a major design flaw… maybe even despite your best project management, proper risk analysis, etc.  Sometimes things go wrong.  In fact, the Standish Group, who publishes the CHAOS report, indicates that as an industry we’ve approached about 70-75% of projects being +/- 10% budget and schedule, and we don’t seem to be getting a lot better than that.

That means, 25-30% of the time, we’re going to miss by a bigger value than that.  At any rate… let’s say your project is going south and there’s nothing you can do to recover.  You’re going to miss your promised date or budget.

Suddenly, the business changes their mind.  Maybe in a way that’s unrelated to the issues you’re having.  Phew!  You breathe a sigh of relief, since you can now use the business’ change to reset dates, including enough time to fix the issue(s) you’re dealing with and come out smelling like roses.

Not so fast, I say.  Dodging a bullet is great, but if you fail to learn from what would otherwise have been a failure, you’re doing yourself a disservice.

It’s like watching a movie where the only reason the otherwise doomed hero escapes is due to some serendipity.  Sure, it makes for a great movie when a hapless bird flies into the power lines, taking out the power, plunging the enemy into darkness and allowing the hero to sneak off largely unscathed.  But, if that played out in the real world – most of the time the bird would never come along and the hero would be dead.

You can’t count on a random event to save you nearly as often as it happens in the movies.  So, when your potential failure is only alleviated by a lucky turn of events, still take the time to reflect on the failure that could have been and learn from it rather than rejoice it never came to be.

It’s not a recurring issue, until it is one

Filed under: Management Philosophy — Adam Schwartz -

I know, the title doesn’t make much sense, but let me explain.  When faced with a decision as to whether to fix a defect, we often have to make choices between what to fix.  Limited resources require us to pick and choose, particularly where we haven’t invested in quality historically.

In my ideal world, we’d never get into the situation where we were deciding whether to fix defect A or defect B, but it happens.  One prioritization mechanism I’ve seen people use is to focus on high-impact and recurring issues.  Here’s the thing: every issue, if not permanently addressed, will become a recurring issue.  So, if you’re using whether an issue has recurred as the driver of what to fix next, your selection process is totally arbitrary – what you work on just depends on who uses what functionality and reports the next bug.

Instead, focus on potential impact.  Sure, a given bug might be an annoyance to one person, but if it happened to many people would it suddenly become a major problem?  If the answer is yes, the fact that it hasn’t recurred yet should not be the reason you don’t work on it.
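As a sketch of what that might look like in a defect backlog: the fields, the 1-5 severity scale and the scoring rule below are my own hypothetical choices, not an established method.

```python
from dataclasses import dataclass


@dataclass
class Defect:
    name: str
    reports_so_far: int      # how often it has recurred to date
    users_exposed: int       # estimate of users who could plausibly hit it
    severity_per_user: int   # hypothetical scale: 1 = annoyance, 5 = blocks work


def potential_impact(defect):
    """Hypothetical score: how bad it would be if everyone exposed hit it."""
    return defect.users_exposed * defect.severity_per_user


backlog = [
    Defect("typo on report footer", reports_so_far=12, users_exposed=40, severity_per_user=1),
    Defect("rounding error in invoice total", reports_so_far=1, users_exposed=5000, severity_per_user=4),
    Defect("slow search on admin page", reports_so_far=5, users_exposed=25, severity_per_user=2),
]

by_recurrence = sorted(backlog, key=lambda d: d.reports_so_far, reverse=True)
by_impact = sorted(backlog, key=potential_impact, reverse=True)

print("by recurrence:", [d.name for d in by_recurrence])
print("by potential impact:", [d.name for d in by_impact])
```

Sorting by recurrence puts the footer typo first simply because more people happened to report it.  Sorting by potential impact puts the invoice rounding error first even though it has been reported only once.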

Remember, nothing is a recurring issue… until it is.

The cost of everything, but the value of nothing

Filed under: Management Philosophy,Quality,Return on Investment — Adam Schwartz - March 3, 2012

There’s a programming joke that I heard a long time ago that went something like “LISP programmers know the value of everything but the cost of nothing.”  The point being that LISP was a very powerful language that had basically no capability to use system resources efficiently.  When I used to program in LISP, back in college, this certainly seemed to be true.

However, that’s not what I wanted to write about.  It’s just that the joke popped into mind for some reason, and it made me think about a project failure a long time ago.  About 10 years ago, I used to develop software for a Point-Of-Sale system.  Retailers are notoriously cheap, so while technology had advanced quite a bit, some of the systems in the field were still 386 based processors.  As a result, we had to cater to the lowest common denominator in the field, and for reasons I forget, this meant that we had a fairly slow build process.  It took about 2 hours to build our system to get a distribution ready that could be installed into a test environment.  And then, the install process took another half an hour or more, depending on whether you went to floppy disks, or built a distribution package via the deployment tool.  All in all, it was slow.

If you were a decent developer, you didn’t go through this process too much.  You did all the debugging at your desk and only did the full builds when you were ready to test in the lab environment.  For the most part, we could simulate pretty much everything, but it was helpful to go to the lab for some reasons that aren’t worth getting into.

At any rate, one day, we delivered a package to our QA department and it wouldn’t boot.  That was a very strange event, since we couldn’t recall a time when the entire system simply wouldn’t come up.  After some research we figured out what had happened.  The build wasn’t good because it wasn’t complete.  And why wasn’t it complete… well, that comes back to the original joke, except in reverse – “the cost of everything, but the value of nothing.”

See, a developer knew full well what the cost of our build process was – 2 1/2 or more hours of time.  They were in a rush, so instead of running the full build, they grabbed the pieces they thought they needed from their desktop, assembled a package and threw it over the wall to QA.  Had it gone well, we’d be none the wiser it happened.  But it didn’t go well.  Because they had circumvented the build process, we ended up with an invalid build, and instead of losing 2 1/2 hours to the build process, we lost almost a full day trying to figure out what was wrong plus the confidence that QA lost in us.

Developers are a lazy lot – after all, we spend our entire lives automating repetitive tasks.  Saving 2 1/2 hours seems like a good deal.  After all, if we could take an easy quarter of a working day off our delivery timeline, it seems like a no-brainer, right?  Well, as it turned out… not right at all.

Process has value in that it prevents errors.  The build process is but one of many processes we partake in.  When all we see is the cost of something, instead of the value it provides, then not doing the activity makes a lot of sense.  Now, by all means, if the costs outweigh the value, that’s an entirely different story.

But as a point of reference, it’s about 25 times more expensive to fix a bug in test than during coding, and about 100 times more expensive to let that same bug reach production.  So, when you think you’re saving cost by cutting back, consider the possibility that you’re driving up future costs in exchange for a small savings now.  What’s the value of that?
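Here is a small sketch of what those multipliers do to a budget.  The 1 / 25 / 100 ratios come from the paragraph above; the defect counts for the two teams are hypothetical.

```python
# Relative cost to fix one bug, by the stage where it is caught.
RELATIVE_FIX_COST = {"coding": 1, "test": 25, "production": 100}


def total_fix_cost(defects_by_stage):
    """Total fix cost in 'coding-stage bug' units for a given spread of defects."""
    return sum(RELATIVE_FIX_COST[stage] * count
               for stage, count in defects_by_stage.items())


# Two hypothetical teams with the same 20 bugs, caught at different stages.
careful = {"coding": 18, "test": 2, "production": 0}
rushed = {"coding": 10, "test": 8, "production": 2}

print("careful:", total_fix_cost(careful))  # careful: 68
print("rushed:", total_fix_cost(rushed))    # rushed: 410
```

Same number of bugs, about six times the cost, and all of the difference comes from where the bugs were caught.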

Simulation works if you can make the intellectual leap

Filed under: Management Philosophy — Adam Schwartz - February 29, 2012

Some time ago I read a study about playing basketball. The study divided players into three teams. One team shot baskets on day one, went home for a few days and came back and shot baskets again. Team two shot baskets on day one, practiced on the court for a few days and shot baskets again. Team three shot baskets on day one, and then thought about shooting baskets for a few days and shot baskets again.  Not surprisingly, the control (the team who didn’t do anything in between the two tests) didn’t change at all.  What was surprising is that the team who thought about shooting baskets did as well as the team who actually practiced!

Recently, I got to be involved in a simulation of a software development process.  A room full of people was divided up into teams, given roles and told to build a fairly difficult Lego kit.  If you’ve played with Legos before, you’ll know that it can take one person quite a while to build one of these things.  Lots of tiny parts, sometimes the directions make it hard to tell what you’re supposed to do, etc.

At any rate, add on top of that a business sponsor who keeps bugging you, status reports to provide, rules about who can touch what (just like a real project, QA people don’t write code), and demands to send some of your development work offshore and you’ve got something that feels and acts much like a real development project in a two hour time frame.

What really interested me wasn’t the exercise itself.  Knowing some of the players, I noticed that they acted in the game like they acted on real projects.  If they were bad project managers in real life, they tended to be bad project managers in the game.  And the teams with the weakest managers got the least far in the game.  In fact, I had a similar experience years ago when we played the same game as a team building exercise.  The individual who struggled the most at work also struggled the most with the game.

In the end, at the report out that always gets done, the same people who struggle with the game struggle to connect it to their real lives.  They don’t see the problem, or dismiss the game as silly and unrealistic, when in fact the reason they struggled with the game is the same reason they struggle in their jobs.

It’s an unfortunate conundrum – those who would benefit most from learning from the game cannot learn from it.  That doesn’t mean these simulations aren’t useful, they’re just useful in a different manner.  As the leader of the organization, what you can learn from watching your employees play games like this is much faster and far less expensive than figuring that out in real life.

Just like the basketball players, those who can do something that isn’t the actual task, but can extend it to the actual task stand to benefit.  Though building Legos isn’t running a project, as a manager you can learn a lot about how your projects are going to go.  Make the most of simulations – those who can learn from the game will transfer knowledge into their daily habits, those who can’t learn from the game will show you that so you can learn about the players.

Independence matters

Filed under: Measurement,Quality — Adam Schwartz - February 16, 2012

A few years ago, my wife, daughter and I were out to dinner. My daughter was about three at the time. My daughter, who has always been a daddy’s girl, looked up at me mid-meal and said “sit with you?” To be fair, my daughter has me wrapped around her finger, but I said “not until you are all done.”

So, she looked down at her pizza and declared “I all done.” Smart girl, I thought. Since I had allowed her control of the definition of done, she could simply declare herself successful and get what she wanted. So, I clarified, “not until you finish your pizza.”

“Done” wasn’t something that I could assess really. After all, who was I to say whether she was satiated or not. All I could see was that there was pizza on the plate. What does this have to do with software, you might ask.

Well, the critical difference between “done” and “finish your pizza” is who was the one with the ability to assess success. In software measurement, many measurements have a denominator that serves to represent the amount of work. You might use function points, lines of code, story points, etc. But, the distinction you should be making in choosing your metric is “is this measure independent from the people doing the work?”

Lines of code and story points are not. Function points are. If you don’t establish the ability to independently assess the amount of work, then you are putting the definition of success entirely in the hands of the developers. That’s not to say that developers want to do a bad job, but when push comes to shove, it’s easier to fudge a metric you control than actually make a change for the better.

Process change demands communication

Filed under: Change Management — Adam Schwartz - January 23, 2012

I don’t travel by air all that often – still, it’s often enough that I belong to the rewards program at one of the major car rental companies. Part of their features is that your car will be ready and waiting for you when you arrive. This is very convenient, since it can mean the difference between standing in line in the office for twenty minutes, or being twenty minutes into your drive.

So, when I landed the other day, I was ready to hop in my rental and drive off. I got into my car and immediately reached for the hang tag which is usually dangling from the rear view mirror. You need the contents – namely a contract – in order to get off the lot. At least, up until this visit you did. I assumed they forgot it, so I got out of the car and walked over to the office to get it. The office was quite busy, so I stood there about ten minutes before it was my turn.

Then, I said “there’s no hang tag in my car” to the attendant who said “oh, we don’t do that anymore. We just print the contract at the exit.”

“Oh, ok,” I replied. I was somewhat annoyed that, despite having my email address and phone number, they had not bothered to tell me this and that I stood in line for ten minutes waiting to find out there was nothing wrong. But what really surprised me was that three other people who were in line behind me heard my conversation with the attendant and exclaimed “oh!” as well.

Now, I’m not actually sure this process change was any improvement. At the exit, the printer was inside the little guard house, while the guard was standing outside. Each time he’d have to walk back in, get the contract that just printed and come back out. The line was longer to get out than it usually is, mostly because of the motion and waiting waste this change created. As a customer, not a good experience.

But, first and foremost, that little bit of waiting I did at the exit is nothing compared to the ten minutes I waited just to find out that the process had been changed. If this was the one and only time that the process will ever change, it wouldn’t be so bad… But, of course, it won’t be. Communicating the change is important as well. Don’t overlook its value.
