A zero bug policy - how to get there? (2/n)

In the previous post, I discussed the reasons for a zero bug policy using a real example policy. I used a similar approach in teams developing mobile applications, embedded software for public transport, or administrative tools for a utility provider.

This post covers the journey toward a zero bug policy. Introducing a zero bug policy is not always easy, and the start might be a bumpy road.

Quality is never an accident. It is always the result of intelligent effort. - John Ruskin

Phase 1: Creating an appetite

The first phase in introducing something like this is creating the appetite for change. You cannot change people by force. You can only invite them to walk with you to experiment and learn.

Below I go through some challenges and what you can try in such cases to create the appetite.

Slow down product development

The cost of quality is the expense of doing things wrong. - Phil Crosby

Managers or product owners are often scared that fixing all issues will slow down product development and prefer focussing on new features (something they understand better). Concentrating on fixing the bugs might temporarily slow feature output, but the team probably needs to slow down. It does not make sense for your customer to get new features if the existing ones fail to work. And we know that problems lead to more problems, which slows you down in the long run.

Try to visualize the impact for the customers and the amount of time and energy the team has to spend on fixing issues that could be better spent on new features.

For example, look at customer feedback and ratings in the app store (for mobile apps), collect statistics on crashes during a sales process, or keep a simple log of the time spent fixing problems.
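The simple time log mentioned above does not need special tooling. As a minimal sketch, assuming each entry records a date, a kind of work ("bug" or "feature"), and the hours spent (the entries and function names here are hypothetical), a few lines of Python are enough to show how much time goes to fixing problems:

```python
from collections import defaultdict

# Hypothetical log entries: (date, kind of work, hours spent).
WORK_LOG = [
    ("2024-03-04", "bug", 3.0),
    ("2024-03-04", "feature", 5.0),
    ("2024-03-05", "bug", 6.0),
    ("2024-03-05", "feature", 2.0),
]


def hours_by_kind(log):
    """Sum the logged hours per kind of work."""
    totals = defaultdict(float)
    for _date, kind, hours in log:
        totals[kind] += hours
    return dict(totals)


def bug_share(log):
    """Return the fraction of logged time spent on bugs (0.0 for an empty log)."""
    totals = hours_by_kind(log)
    total = sum(totals.values())
    return totals.get("bug", 0.0) / total if total else 0.0


if __name__ == "__main__":
    print(hours_by_kind(WORK_LOG))
    print(f"Share of time spent on bugs: {bug_share(WORK_LOG):.0%}")
```

A single percentage like this is often easier to discuss with managers or product owners than a long list of individual bugs.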

Understand the problem

Team members often do not consciously realize the sheer number of problems present in their software. They are focused on adding features and solving problems ASAP and lose the overview.

Try visualizing the problems and encourage reflecting on what is going on.

Some example techniques are to put all the bugs on top of the scrum board, report about the number of bugs discovered and fixed in your reviews, and hold retrospectives about which bugs you had in the last quarter and how you could prevent them.

See a way out: make time for structural improvements and learning

Once people realize the number of problems, they do not see how they could get out of the mess. Once you are in firefighting mode, the bugs will probably keep on coming. People need the perspective of improvement before they embark on unsure experiments.

Try to inspire people by showing that it is possible, through discipline, to improve technical quality by improving the design, improving the testing, and shortening the release cycle. This means they must create time for improvements in all these areas.

Quality comes not from inspection, but from improvement of the production process. - W. Edwards Deming

Some example techniques to apply here:

  • Consistently apply the “boy scout rule”, namely always leave any place you touch cleaner than when you found it.
  • Regularly analyze what causes the bugs. A retrospective where you explore fixed bugs and how to prevent them can be very useful for this. I intend to write a blog post about this, but what we do is inspired by root cause analysis retrospectives.
  • Reserve between 10 and 30% of the time for improvements proposed by the team. Structural modifications are often needed to prevent bugs. Otherwise, they will keep on coming. Structural improvement should happen continuously (e.g., one or two stories every sprint).
  • Consider (small) rewrites. Rewriting your whole application from scratch has many disadvantages (e.g., Joel Spolsky on “Things you should never do”). But isolating a piece of functionality (refactoring), covering it with tests, and rewriting it can be an excellent strategy to quickly solve many bugs in a problematic area. Keep the pieces that you rewrite small enough so you can get them in production ASAP.
  • Improve your testing and ensure your tests provide fast feedback to your users. If too many bugs manage to escape testing and get into production, it probably means you are not testing correctly. Look into the test pyramid and make sure you have the appropriate tests. But also, get creative and think out of the box. Some examples beyond regular testing that made a massive difference for us are: building a test robot for automatically testing the handling of smart cards, building a smart card simulator to test smart cards a lot faster, running tests on vast numbers of device types in labs, and running tests with random input values.
  • Automate releases to production. If it is easy to release, you will reduce the feedback cycle of fixing bugs.
  • Spend time on learning. Do katas to discover XP practices like TDD, better design, and architecture. Organize knowledge shares where people can show what they learned.
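To make the "random input values" idea from the list above more concrete, here is a minimal sketch using only the Python standard library. The function under test is hypothetical (packing a card balance into four bytes); it is an assumption for illustration, not the code from our project. The test feeds many random values through an encode/decode round trip and checks that nothing is lost:

```python
import random


def encode_balance(cents):
    """Hypothetical function under test: pack a balance into 4 big-endian bytes."""
    return cents.to_bytes(4, "big")


def decode_balance(raw):
    """Hypothetical counterpart: read the balance back from the 4 bytes."""
    return int.from_bytes(raw, "big")


def random_round_trip_test(runs=1000, seed=42):
    """Check that decoding an encoded random balance returns the original value."""
    rng = random.Random(seed)  # fixed seed so a failure can be reproduced
    for _ in range(runs):
        cents = rng.randint(0, 2**32 - 1)
        assert decode_balance(encode_balance(cents)) == cents, cents
    return runs


if __name__ == "__main__":
    print(f"{random_round_trip_test()} random round trips passed")
```

Using a fixed seed keeps the test repeatable; a failing value is included in the assertion message so it can be turned into a regular regression test.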

But I want to work on new features, not these stupid bugs!

The statement above is often heard. It is actually the most common objection from team members against a zero bug policy.

When you dig a bit deeper into the concerns, it is not that these people do not want to have fewer bugs, but mostly that they feel they are not in control of changing this. Examples of this are:

  • In their point of view, the bugs are not caused by them (personally). The cause might be someone else, another team, or the system that never prioritizes improvements in general. Why should they do the hard work of cleaning up someone else’s mess?
  • Improvement ideas are hardly discussed and always deprioritized.
  • Developers are judged mainly on their output of features.
  • Quality is a concern that is only considered at the end, typically by testers.
  • There is a hero culture.

Turning this around asks for changes to the way of working and priorities such as:

  • Make clear and regularly repeat (also from higher in the hierarchy) that quality is a crucial concern and more critical than new features.
  • Listen to improvement suggestions and make small structural improvements every week.
  • Give more responsibility to the team to select their improvements. People who receive responsibility feel responsible.
  • Move away from individual responsibility. The whole team is responsible for anything that someone on the team produces.

Often, the people who are the most critical voices at the start become your biggest advocates later.

Phase 2: Clean up the legacy of bugs (if needed)

A recurring concern when working on a zero bug policy is the sheer number of open bugs. I ended up in a team with thousands of open bugs, with the remark that we would not be able to develop anything new for the next year if we used a zero bug policy.

This situation sounds bleak, but it is the perfect opportunity for a zero bug policy! The key is to realize you will have to clean up and alter the policy slightly to talk about new bugs. Below we describe how we approached the situation.

  1. We removed all bugs not updated in the last four months. We reasoned that if they had not been updated recently, they were probably not important or relevant anymore. Removing old bugs was supported by the management and team once everyone understood how much time it would take to verify them. It was cheaper to lose a bit of time rediscovering the few relevant bugs we had closed.
  2. We kept one person per sprint out of the planned capacity, the bug fixer, and this role rotated every sprint. The bug fixer did intakes for all new bugs immediately (starting with a zero new bug policy) and spent the rest of the time working through the backlog of bugs that had been updated in the last four months.
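The first step above can be sketched in a few lines. This is a hedged illustration, not the script we used: it assumes the bugs were exported from the tracker as dictionaries with a hypothetical "updated" timestamp, and it approximates four months as 120 days.

```python
from datetime import datetime, timedelta

# Assumption: "four months" is approximated as 120 days.
STALE_AFTER = timedelta(days=120)


def split_stale(bugs, now):
    """Split bugs into (stale, active) based on their last update time."""
    cutoff = now - STALE_AFTER
    stale = [bug for bug in bugs if bug["updated"] < cutoff]
    active = [bug for bug in bugs if bug["updated"] >= cutoff]
    return stale, active


if __name__ == "__main__":
    # Hypothetical export from a bug tracker.
    bugs = [
        {"id": 1, "updated": datetime(2023, 12, 1)},
        {"id": 2, "updated": datetime(2024, 5, 10)},
    ]
    stale, active = split_stale(bugs, now=datetime(2024, 6, 1))
    print("close:", [bug["id"] for bug in stale])
    print("keep:", [bug["id"] for bug in active])
```

The stale list is what gets closed in bulk; the active list becomes the backlog for the rotating bug fixer.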

After a few months, the bug backlog was gone. Bugs became a lot less common, and we could drop the bug-fixer role.

Phase 3: Do, or Do Not, There is No Try

Introducing the Policy

An essential part of introducing this policy is for everyone to give their consent to introducing it. At least the policy should be written down and be known to everyone.

Things you can try:

  • Use consent decision-making to get the policy accepted. I often use a (lightweight) form of consent decision-making to collect concerns and objections (https://thedecider.app/consent-decision-making).
  • Post the policy around the working area, i.e., on posters, wikis, etc.
  • Refer back to the policy when discussing bugs during a standup. Especially in the beginning, people need reminding.
  • Lead by example and pair to fix any bug that appears. I often do this for weeks until the habit is ingrained.
  • Ensure the policy reflects reality. It is better to have an imperfect policy that everyone respects and follows than a perfect one that everyone ignores. If you read the previous post, you might have noticed some concessions to make it work in reality.

And then it is about maintaining discipline. The team needs to make time for improvements, fix the bugs fast, and prioritize quality over new delivery.

How long will it take to reach “the good place”?

Improve quality, you automatically improve productivity. - W. Edwards Deming

How long it takes before you reach a stable point depends on the size of the codebase, the code structure, and the number of team members.

  • For small teams with a relatively fresh codebase, it can be a few weeks.
  • More realistically, it will take you 3 to 6 months to make a dent in the number of issues and implement structural changes to prevent the bugs.
  • For a multi-team setting with a large and old codebase, it might even take more than a year or require rewriting parts of the codebase to reach this point.
  • The more customers you have and the more types of environments (like browsers or mobile devices) you support, the harder it becomes to reach and keep zero bugs. But this should not keep you from trying to reach that point. In such cases, it can help to provide guidelines for when an issue is up for fixing. We used a guideline that if an issue appears fewer than X times in the last two weeks, for 500k sessions a day, we do not yet start to investigate. The goal is to keep on lowering the X until you reach a satisfactory point.
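The guideline from the last bullet can be written down as a small rule. The sketch below is an assumption about how such a check could look, not our actual tooling: the function name, the list of occurrence dates, and the example values for X are all hypothetical.

```python
from datetime import date, timedelta


def should_investigate(occurrences, today, threshold, window_days=14):
    """Return True when an issue occurred at least `threshold` times in the window.

    `occurrences` is a list of dates on which the issue was seen;
    `threshold` plays the role of X in the guideline.
    """
    window_start = today - timedelta(days=window_days)
    recent = [day for day in occurrences if window_start < day <= today]
    return len(recent) >= threshold


if __name__ == "__main__":
    # Hypothetical occurrences of one issue.
    seen_on = [date(2024, 6, 14), date(2024, 6, 10), date(2024, 5, 1)]
    today = date(2024, 6, 15)
    print(should_investigate(seen_on, today, threshold=3))
    print(should_investigate(seen_on, today, threshold=2))
```

Lowering the threshold over time, as described above, then becomes a one-line change that the team can agree on explicitly.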

Is this a silver bullet?

Using a zero bug policy is not a silver bullet. If this is the only change you make, it will most probably not work at all.

That is why applying good engineering practices and focussing on creating a learning organization are essential. I already added a few suggestions in the section "See a way out: make time for structural improvements and learning".

Summary

Introducing a zero-bug policy is a bumpy road. To make it successful, you need to:

  • create an appetite
  • clean up to start with a clean slate
  • make the policy explicit and maintain the discipline
  • keep on searching for structural improvements to prevent bugs

Let me finish with another quote:

The secret of getting ahead is getting started - Mark Twain

Many thanks to Matteo Pierro, Saket Kulkarni and Stéphane Genicot for the help on improving this article.

