Total Productive Maintenance is a piece of the Lean arsenal designed to keep your machinery up and running. As I was taught it, when a machine breaks down, we notice it. This is sporadic loss.
However, when a machine is “quirky” (it requires lots of little adjustments to keep running, has minor but annoying downtime, can’t be run at full speed, etc.), that’s chronic loss. Left ignored, chronic loss simply becomes part of the background noise of manufacturing. We just learn to live with the lack of availability in little bits and pieces.
Software can be very much the same way. If you find yourself with a large support staff but you can’t put your finger on exactly what they’re doing, maybe it’s chronic loss. Are you fielding the same phone call over and over and over? Are you restarting a batch job every time it fails because you know that if you just try it again, it’ll work? These are the chronic losses of a software environment.
And these losses become all too easy to live with. We measure our support folks in terms of first call resolution, which is nice if you’re going to have an issue, but we never talk about how not to receive the phone call in the first place. Support people like to be employed, so where’s the impetus for them to make fixes permanent? The entire job of support thrives on the failure of development to make a truly robust system that “just works.”
The clues that you put too much focus on sporadic loss are easy to see:
- You rate your production incidents by severity and then only discuss the Critical or High severity incidents. The medium and low incidents, despite their quantity, get little to no attention.
- Your quality metrics (if you have them) put extra weight on higher severity incidents. Yes, a complete software outage for 10 minutes does affect all your customers, but what about that 30-second annoyance that happens to every customer several times a day? Which one actually adds up to more time lost?
- Someone on your team has said “we have too many low severity incidents to classify them all.” It’s likely these incidents are the same repeat offenders over and over again.
- You have an inordinately large support staff compared to the size of your development team.
- Your support staff has written scripts for how to deal with the issues they get calls for. If it happens so often that you can write a script for it, you can fix the defect that causes it.
- You’ve closed a defect as “routine resolution.” Defects should not be “routine.”
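
To put rough numbers on the outage-versus-annoyance question from the list above, here is a back-of-the-envelope sketch in Python. Every figure in it is a made-up assumption for illustration (1,000 customers, a single 10-minute outage in a 20-workday month, a 30-second annoyance hitting each customer three times a day), and `customer_minutes_lost` is a hypothetical helper, not something from a real tool.

```python
def customer_minutes_lost(duration_seconds, occurrences_per_customer, customers):
    """Total time lost across all customers, in minutes."""
    return duration_seconds * occurrences_per_customer * customers / 60


# All of these numbers are assumptions chosen for illustration.
CUSTOMERS = 1000   # customers affected by each kind of loss
WORKDAYS = 20      # working days in the month being compared

# Sporadic loss: one complete 10-minute outage during the month.
sporadic = customer_minutes_lost(
    duration_seconds=10 * 60,
    occurrences_per_customer=1,
    customers=CUSTOMERS,
)

# Chronic loss: a 30-second annoyance, three times a day, every workday.
chronic = customer_minutes_lost(
    duration_seconds=30,
    occurrences_per_customer=3 * WORKDAYS,
    customers=CUSTOMERS,
)

print(f"Sporadic outage:   {sporadic:,.0f} customer-minutes")
print(f"Chronic annoyance: {chronic:,.0f} customer-minutes")
print(f"Chronic / sporadic ratio: {chronic / sporadic:.1f}")
```

With these assumed numbers, the chronic annoyance costs three times as much customer time as the outage. Plug in your own figures; the point is to do the multiplication rather than assume the high severity incident is the expensive one.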
I think you get the idea. Sporadic losses may be highly visible and painful, but chronic loss drags on your organization forever and builds upon itself over time.