A moment of shameless self-promotion

I’m pleased to say that a fun package arrived in the mail today.  It was the conference proceedings from ITNG 2012, the 9th annual International Conference on Information Technology: New Generations.  Generally, conferences are good stuff, since you get to meet other people and be exposed to new ideas, but this one was particularly special for me.  It represents the first conference where I’ve had the pleasure of publishing my own work.

One would think that, working in industry, if I’d discovered a great new idea, the route ought to be to patent it and protect my intellectual property.  I don’t see things that way.  As you can tell, I’m inclined to freely give away much of the knowledge I have, because I know that when you need specific help with your situation, I’ve been an open book about exactly what kind and quality of advice and guidance you’ll get from me.  And for that reason, I’m pleased to be able to say that we’re breaking new ground in how to measure the size of a software system.

Sure, there’s always KLOC (thousand lines of code) or FP (function points), but both of these systems have major drawbacks.  Capers Jones has gone so far as to call the use of KLOC as a measurement system “professional malpractice,” and I agree.  Function points make lots of improvements, but they can be costly.  There appears to be a simpler answer – simply use the test cases as a proxy for the function points delivered and use those as your measurement system.  If you’re so inclined to read further, check out pages 242 – 246 of ITNG 2012.  I hope some new ideas in the software measurement space can help improve the science.

Independence matters

A few years ago, my wife, daughter and I were out to dinner. My daughter was about three at the time. She has always been a daddy’s girl, and she looked up at me mid-meal and said, “sit with you?” To be fair, my daughter has me wrapped around her finger, but I said, “not until you are all done.”

So, she looked down at her pizza and declared “I all done.” Smart girl, I thought. Since I had allowed her control of the definition of done, she could simply declare herself successful and get what she wanted. So, I clarified, “not until you finish your pizza.”

“Done” wasn’t something that I could really assess. After all, who was I to say whether she was satiated or not? All I could see was that there was pizza on the plate. What does this have to do with software, you might ask.

Well, the critical difference between “done” and “finish your pizza” is who has the ability to assess success. In software measurement, many measurements have a denominator that serves to represent the amount of work. You might use function points, lines of code, story points, etc. But the distinction you should be making in choosing your metric is “is this measure independent of the people doing the work?”

Lines of code and story points are not. Function points are. If you don’t establish the ability to independently assess the amount of work, then you are putting the definition of success entirely in the hands of the developers. That’s not to say that developers want to do a bad job, but when push comes to shove, it’s easier to fudge a metric you control than to actually make a change for the better.

Never teach a man to fish

The old saying goes “give a man a fish and he eats for a day. Teach a man to fish, and he eats for a lifetime.” Or something like that…

Sometimes, it’s not quite so in software. The idea of making someone self-sufficient can sometimes create more problems than it solves. If someone comes asking you for data, it seems like it would be easier to teach them how to write and run a SQL query, right? I mean, after all, if they can fish, they can learn how to extract new data elements, do interesting joins, and perhaps even discover something you didn’t think of.

But what happens when the lake dries up… Er, I mean, what happens when someone moves the data? Now, instead of having to just repoint an ODBC connection or two, you get emails from dozens of unhappy people who want to know where all the fish have gone. And they don’t just want you to send them data; they now have dozens or hundreds of their own poorly written queries (because they’re amateur coders) built on the way the data used to be.

The problem with teaching someone to fish is that you really can’t just teach him to fish. You have to teach him how to repair his fishing pole when it breaks, how to find bait, how to scout fishing locations… things that are related to, but not exactly, fishing, if you want a truly self-sufficient person. Otherwise, what you have is a problem waiting to happen, one which may be more dire. Certainly one that’s more annoying to fix.

So, when someone asks for data, you should at least ponder for a minute if they’re really equipped to be a fisherman or if you should just hand over the fish.

Don’t destroy your data

In technology terms, the world changes quickly.  New hardware, software, and tools appear all the time.  And as companies, we want to take advantage of these new things to hopefully gain a competitive edge.  The problem is, we often don’t know if the promises offered will ever be delivered.

A lack of data in the first place often gets us.  We find we’re immature, as a scientific practice, about gathering data on what works, what doesn’t, and how effective things are.  But even when we have data, for some reason we often choose to destroy it.  And with its destruction goes our ability to examine whether future changes made a difference.

For example, one type of tool commonly used by organizations is some sort of incident management/problem management system.  Something like Serena Teamtrack, MKS Integrity Manager, HP’s ServiceCenter, and others… it’s one of the most valuable tools in terms of collecting data we can use to improve future project performance.  But these vendors are always offering upgrades, and they’re sometimes significant.

More than one organization I’ve worked with sees the upgrade coming, plans it out, and as part of that process decides not to move over all the old, closed incident data.  Rather than re-import it, they just throw it away.  And with it, they throw away an enormous amount of data about how the organization functions.  Suddenly, a “simple” software upgrade gets you back to square one…

Not only do you have no data at all, but you also need to wait a year or more to get enough data to understand things like seasonal trends, the long-term impacts of process decisions, etc.  And while you’re waiting to build up that history again, the vendor is going to release another major feature set!  And you’re going to upgrade again!  And you’re going to throw your data away… again!

Don’t destroy your data.  The technology world at large may move quickly, but discarding your entire history means you can never learn from it.

Brewster’s Millions

The other day, I was talking to a friend in a finance organization, and they were telling me about the millions of dollars their business had set aside for software development.  That’s when “Brewster’s Millions” came to mind.  In 1985, a movie came out with a ridiculous concept (although most Hollywood movies meet this criterion) – in order to inherit $300 million, Brewster must waste $30 million.

Sure, how much you have to spend matters, but spending more doesn’t necessarily mean getting more.  Just as Brewster could waste $30 million, software development can waste a lot of money as well.  Indeed, if the only goal is to spend the cash, one can be incredibly wasteful – people sitting around twiddling their thumbs, taking long lunches and whatnot because they are way underutilized but happily billing every hour.

For this reason, although watching your budget vs. actuals is important, it shouldn’t be done in a vacuum.  In addition, you should be looking at what progress is being made.  In Agile, you can look at story points completed per dollar spent.  In a plan-driven project, you can look at function points (FP) per dollar spent.  FP can be used to measure progress through analysis, design, coding, and testing, so they’re useful throughout the lifecycle.  One could argue that only working software has value, but I would counter that since the act of translating requirements into functionality is the activity we truly want, the process of thinking through the design does have value.
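To make “progress per dollar” concrete, here’s a minimal sketch with entirely hypothetical figures; the function name and the numbers are illustrations, not data from any real project.

```python
# A minimal sketch (hypothetical numbers) of tracking progress per dollar
# rather than spend alone. Both projects below have burned the same budget;
# only one is actually delivering.

def progress_per_dollar(units_delivered: float, dollars_spent: float) -> float:
    """Units of work (story points or function points) delivered per dollar spent."""
    return units_delivered / dollars_spent

# Same actuals, very different value delivered.
project_a = progress_per_dollar(units_delivered=120, dollars_spent=500_000)
project_b = progress_per_dollar(units_delivered=30, dollars_spent=500_000)

print(f"Project A: {project_a * 1000:.2f} FP per $1,000 spent")  # 0.24
print(f"Project B: {project_b * 1000:.2f} FP per $1,000 spent")  # 0.06
```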

Either way, we need to measure the earned value to understand where we really are.  Otherwise, our project could simply be the sequel to Brewster’s Millions.

Velocity and standard work

Lean has a concept called standard work time, which is an expectation of how long some task should take.  For example, it should take you 1 minute to mill part X from a block of steel.  Having expectations of how long something should take is important in manufacturing, because it helps you assess whether an individual or team is meeting those goals.  The measure for standard work shouldn’t come from some desire for a certain output; it should be based on how long those things really take.

In software, lots of measures of standard work have been generated over time, but we don’t seem to use them much.  I recall, from my earliest days of programming, that a programmer was expected to create about 5000 lines of code per year.  I’ve also heard estimates for debugged lines of code per day (10-25 or so, if I recall correctly).  Capers Jones has expressed measures of productivity in FP delivered per day.  We seem to have some understanding of what a typical productivity expectation is.
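As a quick sanity check, those two lines-of-code rules of thumb are roughly consistent with each other.  Here’s a minimal sketch, assuming about 220 working days per year (my assumption, not a figure from the original sources):

```python
# A minimal sketch (assumed working-day count) checking that the two quoted
# rules of thumb line up with each other.

LOC_PER_YEAR = 5000          # the old "lines of code per programmer-year" figure
WORKING_DAYS_PER_YEAR = 220  # assumption: roughly 220 working days in a year

loc_per_day = LOC_PER_YEAR / WORKING_DAYS_PER_YEAR
print(f"{loc_per_day:.0f} LOC per day")  # ~23, within the 10-25 debugged-LOC/day range
```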

And then we ignore it.  For some crazy reason we decide that the “laws of physics” don’t apply to us and that we need a measurement that is unique just to us.  In Agile, they call it velocity.  It’s a measure of the team’s productivity based on story points per day.  What’s a story point?  Well, that’s a relative sizing mechanism that the team itself defines.

The problems with velocity are numerous:

  1. It’s only good for the exact team you’re currently on.  Sadly, team members come and go.  As anyone can tell you, people are not resources you can lock in place.  Even if you don’t want them to quit, they will.  And when they do quit, your measure of productivity walks out the door with them.  The new person will need ramp-up time.  The new person may be more or less productive.  The new person might not gel with the team.  And until things settle, your expectations of productivity are gone.
  2. It’s only good within the team.  You can’t compare cross team productivity if everyone’s got their own way to measure it.  I’m positive that I’ve argued before that comparing yourself to other companies isn’t necessarily a worthwhile activity, but being able to compare yourself to yourself is.  How do you know you’re getting better (or worse) if the ruler keeps changing size?
  3. It can entrench lower productivity.  Sure, even if you accept #2 above, that you don’t want to compare yourself to your competition, you have no gauge as to whether better is even achievable.  Software is a relatively slow process, so you don’t get lots of data points out of it.  If you have to wait for your team to build up a measure of productivity, you could be accepting far lower productivity than is likely achievable.  Being able to compare yourself to a typical expectation of productivity gives you a measurement it would take far longer to build up on your own.

I understand that there are people out there who believe software is more of an art than a science; that software will be practiced by small teams doing great things.  I happen not to agree with that.  It may be true some of the time, but certainly not all or even most of the time.  In most cases, software is more likely to be like building houses – unique but largely assembled from similar pieces.  And with that in mind, software is more predictable than we like to admit.  But predictability means you can have expectations of standard work times, and that you needn’t allow each team to define productivity in their own vision.

The downfall of Pluto

To quote from a Christine Lavin song I was listening to the other day:

In Arizona at the turn of this century
astromathematician Percival Lowell
was searching for what he called "Planet X"
because he knew deep in his soul
that an unseen gravitational presence
meant a new planet spinning in the air
joining the other eight already known
circling our sun up there

Now, if you’ve paid attention to the outcome of Christine’s silly song, you know that Pluto is no longer considered a planet.  The first verse of her song (along with other bits elsewhere) gets at an important point.  It really doesn’t matter what we call Pluto – planet, dwarf planet, etc.  That’s our naming system.  The universe really doesn’t give a darn.  Pluto has mass, exerts gravity, revolves around the sun… those things are facts.  Whether it’s a planet or not is subject to the vagaries of operational definitions.

Why bring it up?  What’s it got to do with software development?  Well, what’s the difference between a defect and an enhancement?  In either case, we know that the user wants something changed.  They don’t like how the system currently works.  Those are the facts.  Some things seem clearly like bugs, like when the system crashes; other things seem clearly like enhancements, like when the user wants a new web page added.  But there are things in the middle, lots and lots of things, for which most people could make a reasonable argument either way – bug or enhancement.

Frankly, it doesn’t matter.  What’s important are the facts – that the system does something and the user doesn’t like how it does it.  Therefore, it needs to be changed.  Now, one might argue about whether it’s economically worthwhile to change it, or that there are other things of higher priority to do, but to lob whatever it is back and forth between bug and enhancement is not a worthwhile activity.

Sometimes we spend far too much time trying to argue that it’s an enhancement in order to avoid the shame of having created a bug, but to be fair, the user still doesn’t like it.  Call it what you will, the result is no good.  Instead of focusing on what it’s called, learn why you made the choices that got you there and figure out whether there’s a pattern of behavior that could be changed to avoid the same type of mistake again.  That’s a far better use of your time.

Pluto, for whatever we call it, will still be spinning out there, doing its thing, long after we’re gone and long after anyone’s around to call it a planet or a dwarf planet.  In the end, it won’t matter.

Testing is not inversely effective

It’s time we cleared something up about testing.  There’s a misconception that I fear is more commonly held than it should be, and it usually begins with “finding defects in testing is good, because we won’t find them in production.”  That statement is true.  However, it implies something that is not true: that you can test in quality.

Here’s the thing: testing is about 35-50% effective per test type (unit, functional, performance, etc.).  If the code is good, testing is 35-50% effective.  If the code is bad, testing is STILL only 35-50% effective.  That means that if you find a lot of defects in test, there are even more left that will make it to production.

So, for example, let’s say you had two teams code the same functionality and they both delivered code to you.  You run the same set of tests against each delivery and find 100 bugs in team A’s code and 10 bugs in team B’s.  The teams fix all the bugs you found, and you retest to make sure that none of your tests are subsequently broken.  Is it fair to say that team A’s code and team B’s code are now effectively equivalent?

I’ll give you a hint: no.  Given that you had a fixed set of tests to run, and didn’t adjust it based on the quality of team A’s code, your testing does nothing for the code you didn’t adequately exercise.  In a real project, the same is true.  If you write a set of test cases in advance of receiving the code, and the code quality is poor, then unless you devise additional tests to increase coverage, one can assume the un-exercised code is of poor quality as well.  Thus, you will let more defects into production.
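To see roughly how that plays out, here’s a minimal back-of-the-envelope sketch.  The 40% figure is just the middle of the 35-50% range above, and the simple “found = effectiveness × total” model is my own simplification for illustration:

```python
# A minimal sketch (hypothetical model) of why equal test results don't mean
# equal quality. Assume a single test pass catches roughly 40% of the defects
# present (the middle of the 35-50% range).

def estimate_escaped_defects(defects_found: int, effectiveness: float = 0.40) -> float:
    """Estimate defects escaping to production, given those found in test.

    If testing catches `effectiveness` of all defects, then
    total defects ~= defects_found / effectiveness, and the rest escape.
    """
    total_defects = defects_found / effectiveness
    return total_defects - defects_found

print(estimate_escaped_defects(100))  # team A: ~150 defects still waiting in production
print(estimate_escaped_defects(10))   # team B: ~15 defects still waiting in production
```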

Testing is not, let me repeat, NOT inversely effective to the quality of the code.  You don’t suddenly get magical results from testing just because you delivered bad code to testing.  You get the same percentage results from testing, and let more bad code through.  This is why you cannot test quality into the system.

Half will be below average

The funny thing about averages is that, in order to have them, some stuff must be above average, and some stuff must be below average.  Assuming a normal distribution, 50% of everything is below average.  That’s just the way it is.  We tend to be offended by being “below average,” but something has to be, or else you can’t have an average.
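If you want to convince yourself, here’s a minimal simulation using only Python’s standard library; the specific mean and spread are arbitrary:

```python
# A minimal sketch: for a symmetric distribution like the normal, roughly half
# of the observations fall below the mean.

import random

samples = [random.gauss(100, 15) for _ in range(100_000)]
mean = sum(samples) / len(samples)
below = sum(1 for s in samples if s < mean) / len(samples)
print(f"{below:.1%} of samples fall below the mean")  # ~50.0%
```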

In the world of software quality assurance, that means there’s an unfortunate truth.  Half the time, you have to inform your project team that the quality you are seeing is below average.  Guaranteed.  That’s what average means, after all – it’s a measure of the central tendency of the data, and for a symmetric distribution it sits right in the MIDDLE.  One half will be better and one half will be worse.

In the pursuit of perfection, half the time you are going to have to tell people to do better.  If you don’t, your competition likely will, and then even your best 50% of projects won’t be good enough.  We don’t tend to think that’s what average means.  Instead, we equate average to mean not good, or at best so-so, like “an average dinner out.”  When we say that, we mean we didn’t like it that much.  But, when it comes to software (and dinner), average is the reality.  You can’t avoid it.

So, yes, we may not be happy with being on a below-average project, but no matter how good everything gets, that distribution still exists and there is still opportunity for improvement until everything is perfect.  And why shouldn’t you pursue perfection in all aspects – cost, time, and quality?  It does mean you’ll always, always be telling some team that they’re below average, but that’s just the way things are.

First Pass Yield in Testing

In the industry, a typical measure of quality is Defect Density, which is simply the number of defects divided by the units of work delivered (function points, LOC, or even test cases). The other day, someone proposed that, instead of (or at least in conjunction with) defect density, organizations ought to measure first pass yield in testing. After a bit of discussion and thought, I don’t think this is a good measure.

  1. First pass yield (FPY) ignores vicious rework loops when test cases fail. A test that doesn’t pass on the first run falls into the “failed” bucket, but no matter how many subsequent failed attempts and new bugs pile up against that test case, all of that information is lost (the sketch after this list shows the difference). FPY won’t measure it.
  2. FPY is redundant with defect density. Even if FPY didn’t have other shortcomings, why measure the same thing twice?
  3. FPY doesn’t measure collateral damage. In software, it’s quite easy to fix one thing and then break another piece of functionality. So, a test that passed on round one of testing may fail on round two. FPY would say this test is good when, in fact, it was broken by a related change.
  4. FPY encourages the wrong behavior – to run test cases that you know will pass rather than to run high-value test cases. If I were encouraged to get a high first pass yield, as a developer I would be guiding the test team as to what to test and when, so that they never tested anything I thought was suspect. In the same vein, if QA were pushed (and it would be horrific if they were) to drive up FPY, they’d write worse test cases. The goal isn’t to have test cases pass; the goal is to fully appraise the system and, if it is of poor quality, to report that.
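To make the contrast concrete, here’s a minimal sketch with entirely made-up test-run records. Notice that defect density picks up every defect logged during the rework loop on the second test case, while first pass yield only ever looks at run one:

```python
# A minimal sketch (hypothetical data) contrasting defect density with first
# pass yield. Each record is (test_case, run_number, passed, defects_logged).

runs = [
    ("TC-1", 1, True,  0),
    ("TC-2", 1, False, 1),
    ("TC-2", 2, False, 2),   # rework loop: more failures, more defects...
    ("TC-2", 3, True,  0),   # ...but FPY already counted TC-2 once, as one failure
    ("TC-3", 1, True,  0),
]

test_cases = {case for case, _, _, _ in runs}
total_defects = sum(defects for _, _, _, defects in runs)

# Defect density: defects per unit of work (here, test cases as the proxy).
defect_density = total_defects / len(test_cases)

# First pass yield: share of test cases that passed on their first run.
first_pass = sum(1 for _, run, passed, _ in runs if run == 1 and passed)
fpy = first_pass / len(test_cases)

print(f"Defect density:   {defect_density:.2f} defects per test case")  # 1.00
print(f"First pass yield: {fpy:.0%}")                                   # 67%
```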

No, I think we should stick with defect density and forgo FPY when it comes to testing.