Measuring Risk

A while back, I was in a planning meeting where a whole development organization was trying to squeeze a huge and non-negotiable volume of work into a predetermined and non-negotiable release schedule. It was basically the antithesis of everything I ever advocate for. I was mostly trying not to just yell at people. It wasn't those people's fault, and it wouldn't have helped anyway. And then one of the QA leads said something that caught my attention: they would do risk-based testing.

Everyone just nodded and agreed. But I had to wonder, how would you actually establish those risk levels? I know for a fact that what actually happened was the leads made some mostly arbitrary assessment of testing priorities based on their experience. What should have happened was a formal analysis of the release to measure risk in a way that produces actual numbers. But then, what would that analysis consist of? How do you even measure risk?

Risk

Lucky me, the concept of risk is actually rigorously understood in economic terms. And in the end all of these decisions are economic ones, so that works for us. In economics, risk has two components: Exposure and Uncertainty. I'm sure there are clever ways to manipulate them to get very nuanced results, but that's a lot of work. All I need is to be able to rank one feature vs another to determine how to allocate my limited resources. So for my purpose I'll use a simple formula: Risk = Exposure × Uncertainty. Great! But, how to quantify those things?
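
Before digging into either component, here's a minimal sketch of what the ranking itself could look like once you have the two numbers. The feature names and figures below are made up purely for illustration:

    # Minimal sketch: rank features by risk = exposure * uncertainty.
    # Feature names and numbers are hypothetical, for illustration only.
    features = [
        {"name": "checkout",  "exposure": 50_000,  "uncertainty": 30},
        {"name": "reporting", "exposure": 5_000,   "uncertainty": 80},
        {"name": "login",     "exposure": 100_000, "uncertainty": 10},
    ]

    for f in features:
        f["risk"] = f["exposure"] * f["uncertainty"]

    # Highest risk first -- that's where the testing effort goes.
    for f in sorted(features, key=lambda f: f["risk"], reverse=True):
        print(f"{f['name']:>10}: {f['risk']:,}")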

Quantifying Uncertainty

If you're going to do algebra on uncertainty, it needs to be a number. To get that number you first have to find all of the sources that can contribute to it. Uncertainty is a lack of confidence. You gain confidence by having repeated experiences from which you learn and improve. Which is to say through practice. You can also gain confidence by making conscious assessments of your work or progress. AKA, through testing. Now, it's not practical (and likely not possible) to account for every source of uncertainty. But there are many that you can account for. Here is a non-exhaustive list to get you started:

  • Scope of changes to a module
  • Scope of changes to a module's dependencies
  • Inverse test coverage (more coverage == less uncertainty)
  • Historical defect density
  • Unmanaged or absent elements of the review/build/config/deploy process
  • Scope of unplanned work (work that was added mid-cycle)

Except for the review/build/etc. process item, you may notice these are all about the scope of work or change. That's not a coincidence. Your immediate takeaway should be that doing big releases increases uncertainty. It's also convenient, because it means they all share the same unit. That unit is whatever you use to estimate work. For the sake of simplicity, let's say it's story points, but it could be work days, or developer hours, or a variety of other things. So you can just add them all up. The exception is the parts of your process that you simply skip; that's really bad, so I say double the number for everything you know you're not doing. Test coverage is similar: if your 40-point feature has 0% test coverage, then you actually have 80 points of uncertainty.
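
To make that concrete, here is one way the scoring could look in code. It's a sketch of the rules above: sum the point-based sources, scale up for missing test coverage, and double for each process step you know you're skipping. The function and parameter names are mine, not any standard.

    def uncertainty_score(
        changed_points: float,            # scope of changes to the module
        dependency_points: float,         # scope of changes to its dependencies
        unplanned_points: float,          # work added mid-cycle
        historical_defect_points: float,  # however you weight past defect density
        test_coverage: float,             # 0.0 .. 1.0 for this feature
        skipped_process_steps: int,       # review/build/config/deploy steps not done
    ) -> float:
        """Sketch: sum the point-based sources, then scale up for
        missing coverage and for every skipped process step."""
        total = (changed_points + dependency_points
                 + unplanned_points + historical_defect_points)
        # 0% coverage doubles the number; full coverage leaves it alone.
        total *= 1 + (1 - test_coverage)
        # Double again for each part of the process you know you're skipping.
        total *= 2 ** skipped_process_steps
        return total

    # The 40-point feature with no tests from the text: 40 -> 80.
    print(uncertainty_score(40, 0, 0, 0, test_coverage=0.0, skipped_process_steps=0))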

There are of course other sources of uncertainty that are hard to measure and/or account for. They also tend to be things you don't actually know about. My best advice here is to be aware there might be things you don't know, deal with them as they arise, and account for them in the future.

Quantifying Exposure

This one is easier. Or at least it should be. Exposure is what you could lose if something goes wrong. Since you probably work for a profit-motivated and budget-constrained business, that means money. And that means the unit of risk is also money. Of course our formula is set up to let us compare the magnitude of one risk vs another, not to actually analyze potential real losses. But it's still attention grabbing, and it should make things easy for executives to understand. So again, here is an incomplete list of sources of exposure. Your product owner should be able to get you these numbers, or at least tell you who can.

  • Costs of missing SLAs
  • Costs of additional support
  • Costs of bad press
  • Legal expenses and penalties
  • Lost customers or sales
  • Time until a fix can be delivered

Again, there is that one factor that isn't the same kind as the others. Everything in the list is money, except for the time to a fix. I would again propose that this should be treated as a multiplier. And again, I think the immediate conclusion should be obvious: more frequent releases reduce risk. Combined with what we learned from measuring uncertainty, it seems blatantly obvious that small, frequent releases are a very effective way to mitigate risk in software development.
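
In the same spirit, here is a sketch of an exposure score, assuming you've managed to get a dollar figure for each source and you treat time-to-fix as a simple multiplier. The names and the exact scaling are my own choices, not a standard formula.

    def exposure_score(
        sla_penalties: float,   # costs of missing SLAs
        support_costs: float,   # costs of additional support
        press_costs: float,     # costs of bad press
        legal_costs: float,     # legal expenses and penalties
        lost_revenue: float,    # lost customers or sales
        days_to_fix: float,     # how long until a fix can be delivered
    ) -> float:
        """Sketch: sum the dollar amounts, then scale by how long
        the problem would stay in the wild before a fix ships."""
        money = (sla_penalties + support_costs + press_costs
                 + legal_costs + lost_revenue)
        # Treat time-to-fix as a multiplier: a problem that lives twice as
        # long costs roughly twice as much. Pick a scaling that fits your
        # release cadence.
        return money * max(days_to_fix, 1)

    # Hypothetical numbers, purely for illustration.
    print(exposure_score(10_000, 2_000, 0, 0, 25_000, days_to_fix=14))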

Finally

So there it is. Risk = Exposure × Uncertainty. You've got a methodology for determining both components. Go forth and prioritize based on risk. A formal and well-defined risk. And then work to minimize your risks. As a developer, you can probably only affect the amount of uncertainty that's present. So write more tests. Automate more things. Use these numbers to show why it's valuable to do that, and why you should be given the time and resources to make it happen. While you're at it, you can also use these numbers to convince your stakeholders that working and releasing in smaller increments is safer.


Cover photo by Yeshi Kangrang on Unsplash