Good at heart attacks, bad at cancer

I was watching a video today from IBM that included insights from many world leaders, famous figures, and so on. Sometimes someone passes along a video and I like it. Most of the time it doesn’t resonate with me, but occasionally there’s a single quote that sticks with me.

In this case, Fareed Zakaria said “we are very good at heart attacks. We are very bad at cancer.” He wasn’t referring to medicine, however. He was talking about companies. His point was that companies react well to sudden traumatic events and very, very poorly to things that eat away at them slowly over time. It resonated with me not because I didn’t know the concept, but because it so elegantly states the difference between sporadic and chronic loss.

Sporadic loss is a heart attack. In software it’s the production outage. It’s all hands on deck. Everyone comes together for a few minutes or hours and rescues the system. Then we go back to the projects we were working on – crisis averted.

Chronic loss is a cancer. It starts developing and you don’t even notice it. By the time cancer has become a lump you can feel, it’s frequently too late to do anything about it. The prognosis for many late-stage cancers is not good. It is very similar for chronic loss in organizations. At first, it’s a defect that generates a bunch of calls to the call center, but there’s a workaround and you deem the fix too expensive to cost-justify. So, you live with the workaround. And then there’s another, and another, and another. Over time, you allow hundreds or thousands of small failures to erode the quality of your product. Individually, none of them is an issue. Collectively, you have a mountain of a problem and a few hours of heroism won’t help you. The problem got there over years of deferred maintenance and it isn’t going to go away easily.

Chronic loss looks like a bloated production support organization. Chronic loss is when you have so many production incidents you can’t even fathom taking the time to attribute them to the projects that introduced the issues. Chronic loss is when you only talk about critical, high, and medium severity defects because there are so many low-severity defects they would drown out the others. Chronic loss is when you justify that measurement system as ‘focusing on the big issues’ – the heart attacks are all you look at. Chronic loss is when you pay someone to look over the shoulder of someone else doing the work to make sure it’s done right, rather than figuring out how to error-proof it. Chronic loss is batch abends that you just restart every month, or week, or night, or several times a day without ever figuring out why they failed.

Being good at heart attacks isn’t going to save you from cancer. But preventative care of your software will protect you against both risks.
