Principles for designing process

Why we create more problems by trying to solve one

Mar 27, 2024

Recently, I went to donate blood for the first time. The process of becoming a donor is a bit complicated. During the first intake, you fill in a health questionnaire and a doctor checks it. Then they measure your blood pressure, sugar level, and some other things. After that, they take blood samples to test for blood-borne diseases.

Then you wait to hear back, and if all is good, you can make another appointment for the actual donation.

I thought the intake process was overcomplicated and that many people must drop out of it.

While waiting, I was chatting with one of the workers there (they give free coffee and cookies):

Me: "I wonder, how many people who want to donate actually schedule a second appointment for the donation?"

Office worker: "Indeed, this is an issue. Many people drop out after the first step. Though we send emails, and even call them, sometimes people never get back to us."

Me: "How often do the blood tests you take reveal that a person cannot be a donor?"

Office worker: "It's rare, less than one percent."

Me: "Why don't you take blood at the first appointment and throw it away if the tests reveal a problem?"

She looked at me as if talking to an idiot, then went on: "We prioritize the safety of donors and people who will receive their blood. There are many ways we can get more people on board, but we cannot compromise safety."

Well, I was naive. In my job it's a reasonable trade-off: going faster, accepting a low risk of some work being thrown away. Apparently, it's not the case for the blood bank.

Nevertheless, this story reminded me of the reasons we design processes and policies, and how important it is to have a clear goal behind them.

In this text, I use the words "process" and "policy" interchangeably. They are two different things. However, for this piece the difference is not significant and explaining it here would blur the focus.

Companies in early stages often create processes because they need a way of doing things. We hired 5 engineers from among friends and former coworkers. Now we need to hire 50. We need to define the way we hire people.

The company used to give credit cards to employees to buy the things they needed. That's not possible with the current headcount. We need a way to handle procurement and reimburse expenses.

These early decisions, which originate from the need to control chaos, are assumed to be temporary. However, most of them tend to stick and become limiting factors as the company grows.

Teams can also make ad hoc agreements, driven by the same need for predictable ways of doing things. System X is difficult to deploy, so we'll deploy it once a week on Tuesday. The on-call person on that day will handle the rollout.

Have you seen something like that? Sometimes, we invent a policy because it's the simplest (though not necessarily the best) way to address the issue.

This leads to the first principle of process design.

Assess if a policy is the right solution

Let's consider the example of difficult rollouts. Let's assume we made a policy for an on-call person to handle rollouts once a week. Now, if we fast-forward one year, will the situation be better or worse? Three years from now? Probably, there will be fewer people who know the system well. This process is unlikely to improve the team's satisfaction or make on-call less stressful.

Such a decision solves the problem now, but it creates more complex issues down the road. It might be a reasonable approach for a limited period of time. However, the situation won't be resolved without an action plan for improving things and a commitment to execute that plan. For example, extract the parts that change often into separate services, invest in automation, and build reliable deployment pipelines.

It might be dangerous to invent a policy that defends the status quo when a transformation is required. In my example, the policy about rollouts establishes painful rollouts as a given and encourages building workarounds while the root cause stays unaddressed. It takes agency and accountability away from the team and creates an environment for learned helplessness [1]: these rollouts are painful, nothing can be done about it, it's not even worth trying to improve.

In certain situations, a solid change management plan is what is needed instead of a policy. Looking at the long-term effects of the decision at hand might help to understand the difference.

Have a way to measure the efficiency of the process

Language is extremely flexible. The same thing can be interpreted as "good" or "bad" depending on the words we use.

They are stubborn; you're persistent. They give up too quickly; you're flexible on priorities. Your team is working hard; the other team delivers slowly.

Words suggest interpretations, but they might obscure facts. It's crucial to look at numbers. Numbers give a reference point that you can compare with.

Once I joined a new team, and in my first weeks I noticed tension between product managers and engineers. PMs were claiming that everything took forever to deliver, while engineers claimed they worked as hard as they could. It was difficult to tell who was right; probably both. Every time I hear labels such as "hard" or "slow" I ask myself, "Compared to what?"

I spent a few days collecting data on how long it took to get code changes live. The numbers were bad: it took 18 working days from the moment an engineer started working on something to the moment their code ran in production. Yes! Engineers were working as hard as they could, and PMs were right: it was slow.

As we worked with the team, this metric, called "cycle time", became a litmus test for the efficiency of planning, decomposition, and requirements analysis.

Team processes were optimized to improve this metric, and regular measurements provided feedback on whether changes made a difference.
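To make the measurement concrete, here is a minimal sketch of how cycle time in working days could be computed. It is not the tooling I used back then; the function name, the sample dates, and the choice to ignore public holidays are all assumptions made for illustration.

```python
from datetime import date, timedelta
from statistics import median


def working_days_between(start: date, end: date) -> int:
    """Count Monday-Friday days after `start`, up to and including `end`.

    Assumption: public holidays are ignored, only weekends are skipped.
    """
    days = 0
    current = start
    while current < end:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday-Friday
            days += 1
    return days


# Hypothetical sample data: (work started, code running in production).
changes = [
    (date(2024, 3, 1), date(2024, 3, 27)),
    (date(2024, 3, 4), date(2024, 3, 22)),
    (date(2024, 3, 11), date(2024, 3, 15)),
]

cycle_times = [working_days_between(started, live) for started, live in changes]
median_cycle_time = median(cycle_times)

print("Cycle times (working days):", cycle_times)
print("Median cycle time:", median_cycle_time)
```

Tracking the median rather than the average keeps one unusually long change from hiding the typical experience, though either gives you a reference point to compare against after each process change.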

Balance forces at play

Sometimes, managers try to address an ongoing problem with one big intervention. For example, quality is an issue: too many customer-facing bugs, deployments often aborted due to errors in the new code, production incidents keep engineers awake at night. Engineering leadership enforces a requirement for all services to have 80% test coverage and puts a strict timeline to reach the target.

I've rarely seen a one-big-intervention approach yield results. For one, it typically goes on top of other commitments that teams have. In other words, it increases the workload of the teams.

Second, it might ignore other factors that significantly contribute to the issue. Let's take the quality example: there are many reasons why quality might suffer. Engineers might not be familiar with testing best practices. Test suites might not cover the actual use cases. The team might be under pressure and rushing to deliver on their commitments. Sometimes services are assigned to teams for illogical reasons, and a team might own a service that has little to do with their area.

Whenever you see that a desired behavior is not happening, think of the forces that maintain the current state of affairs. Tools like causal loop diagrams [2] or "five whys" are great ways to understand the factors at play. In my experience, a series of small changes is more likely to shift the balance towards the desired behavior.

What we do every day is more important than what we do once in a while.

It's way more beneficial to brush your teeth every day for two minutes than to do it for one hour once a month.

Do not optimize for exceptions

Sometimes leaders come up with processes because they had a bad experience with a particular topic. The most common example involves money. A small company used to give money to employees to buy the things they needed to work from home. One day, the bosses discovered that one person had misused their budget and bought expensive sound equipment instead. Leadership reacted disproportionately and became more suspicious. From then on, employees first had to buy what they needed, then submit a reimbursement request and attach a photo of their home office. What happened afterwards was that new joiners were reluctant to ask for things that would make their work more comfortable, and overall satisfaction went down. I'm not judging that decision, yet I bet that misuse was the exception rather than the norm.

It's another story for big companies where exceptions happen often enough to be addressed. Furthermore, public companies do many things to stay compliant with regulations.

Empower autonomy

I strongly believe that growing the autonomy of teams and people is key to scaling an organization. When designing processes, it's important to consider whether they give autonomy to the teams or take it away. I'm going to compare traffic lights and roundabouts to illustrate my point. Both regulate traffic at a crossroads. However, traffic lights are externally controlled. As a driver, you don't know when the light will turn green or red. Moreover, when something goes wrong with the traffic light, the crossroads becomes a mess.

It's a different story with roundabouts: by knowing the rules, drivers regulate themselves. There is no external party supervising the traffic, a roundabout requires no power supply, and it's safer [3].

We often put up traffic lights because they give a feeling of control, but that doesn’t necessarily mean they solve the problem in the best way.

I worked at a company where each team was responsible for building a feature for the next big release. Each team built code in their own branch, and as the release approached, they started to merge their features into the mainline. With a few teams, it worked. As the company grew, this process became more difficult. With a code base of a few million lines and dozens of teams working on their features, merges were extremely difficult. After a while, there were release managers, who maintained a merge schedule and coordinated testing activities. Release managers literally were a traffic control team.

A “roundabout” alternative for this case is trunk-based development [4]. It’s not easy to implement. It requires a massive investment in automated testing and a fast and reliable CI pipeline. In return, it allows scaling the number of teams working on the code base without adding synchronization overhead.

Conclusion

Every process comes with trade-offs, and it's important to make sure these trade-offs are aligned with the goals of the organization. Policies define the rules the organization operates on, so it's crucial for the organization's success not to let these rules become constraints in the future.

[1] https://en.wikipedia.org/wiki/Learned_helplessness

[2] https://thesystemsthinker.com/causal-loop-construction-the-basics/

[3] https://www.weforum.org/agenda/2021/12/roundabouts-save-more-lives-than-traffic-lights/

[4] https://trunkbaseddevelopment.com/
