Introduction
After the Second World War, Japan had lost its industrial advantage and had a lot of catching up to do. One industry that took a big hit was the automobile industry, where American manufacturers were far ahead of their Japanese counterparts. In response, Japanese companies started applying the idea of “Kaizen”. The word is commonly translated as continuous improvement: the idea is to keep making small improvements to a process and to standardize each one, so that the gains compound over time.
In 1943, Taiichi Ohno joined Toyota and set out to make it the best manufacturing company in the world. He devised several methodologies to achieve that goal. Even though his domain was mainly the optimization of the inner workings of a factory assembly line, some of these methodologies have found their way into software engineering.
In this article, we will talk about how some of these ideas can help you improve the performance of your team whilst also improving the overall satisfaction of your team members.
Methods
Quite a lot of different ideas and methodologies came out of Gemba Kaizen. We will not give you a comprehensive list but rather a subset which we believe will have an immediate effect on your day-to-day work as you get started with the methodology.
Gemba Walk
“Gemba”, in Japanese, means “the place”: the place where the work happens. For a programmer, Gemba could be the office in which all programmers sit next to each other, or one's own home when working remotely. In a factory context, Gemba would be where the machines are. Gemba Kaizen holds that everyone in the company should understand very well where the actual work is done and stay as close to it as possible. The work does not get done in a manager's office, so a manager who sits in their room all day will not be able to come up with good ideas for improving the company. In Gemba Kaizen, the role of the manager is to increase the efficiency of the workplace so that the people doing the work can do their job without any friction whatsoever.
At IKEA, every new joiner, regardless of title or discipline, has to spend a day in the factory preparing products for shipment. This allows every IKEA employee to experience the business first hand. Such experience will not always help those employees in their day-to-day work. However, if even one of them notices a small thing that can be improved in the process, it can have a huge impact. This is an opportunity no company should miss.
A well-known example in the software industry is Blizzard's “CS for a day” [1], where Michael Morhaime would spend a day shadowing a customer support agent. During this process he would take notes so that, once he was back at his post the following day, he could immediately call other departments to ensure that the inefficiencies he had seen were addressed as soon as possible.
Have you ever watched one of your engineers struggling to understand which monitors to look at when an alert is triggered? Or how they struggle to find the revert button whenever a deployment goes wrong? These are all small inefficiencies that can be easily fixed if you spend a day with your team and observe how they work.
Red Button
Ohno was aware that there were small inefficiencies on the floor and that these inefficiencies had cascading effects on each other. Imagine an assembly line where the first worker places the door into its socket, the second worker inserts the screws and the third one tightens them.
In this scenario, if the second worker ran out of screws, the whole line would literally halt. People tend to work around such problems. In this example, they might slow down the belt and deliver at a lower rate, or they might use a different type of screw, which could endanger the lives of customers.
Ohno’s idea was to place a big red button at every station. If, for whatever reason, an employee was not able to fulfill their task, they were to press the big red button. Pressing it would literally stop the whole pipeline. Yes, you read that correctly: the whole company. This would cause everyone to pay attention to the reason why the whole company had halted. Once the root cause was found, they would remedy the issue in such a way that it would never occur again.
Doing so of course meant that there would be quite a few presses in the first couple of weeks. Over time, however, the rate at which the button is pressed diminishes and the organization as a whole becomes more and more optimized.
In software, our continuous integration pipelines act as our red button. When a build fails, the entire team is impacted, providing a moment to reflect on what went wrong and how to prevent similar issues in the future. Rather than bypassing the failing test or applying a temporary workaround, we must take the time to resolve the root cause.
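As a minimal sketch of the red-button idea, assume a pipeline is simply an ordered list of named stages, each a function that returns True on success. The names `run_pipeline`, `StopTheLine` and the example stages are hypothetical and do not belong to any real CI system.

```python
from typing import Callable, List, Tuple

# A stage is a (name, function) pair; the function returns True on success.
Stage = Tuple[str, Callable[[], bool]]


class StopTheLine(Exception):
    """Raised when a stage fails: the software equivalent of the red button."""

    def __init__(self, stage_name: str) -> None:
        super().__init__(f"line stopped at stage: {stage_name}")
        self.stage_name = stage_name


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order and halt at the first failure.

    Returns the names of the completed stages when everything passes.
    Stages after a failure never run, so the problem cannot be
    quietly worked around further down the line.
    """
    completed: List[str] = []
    for name, stage in stages:
        if not stage():
            raise StopTheLine(name)
        completed.append(name)
    return completed


if __name__ == "__main__":
    # Hypothetical stages: the test stage fails, so deploy must not run.
    stages = [
        ("build", lambda: True),
        ("test", lambda: False),
        ("deploy", lambda: True),
    ]
    try:
        run_pipeline(stages)
    except StopTheLine as stop:
        print(f"Find the root cause before restarting: {stop.stage_name}")
```

The design choice worth noting is that failure is an exception rather than a return value that a caller could ignore, which mirrors the button stopping the whole line instead of a single station.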
A similar idea, also employed by Michael Morhaime, is “Email Mike” [2], where employees could raise concerns with him directly and he would then act on them immediately.
Root Cause Analysis
Quite often, in manufacturing, one of the best improvements one can make is to prevent defects. In order to do so, one first needs to understand the root cause of such defects.
Let's take an example. Imagine that in your company, on average 15 bugs are reported every day. Each bug gets triaged in a couple of days and then needs to be mitigated, which tends to take another couple of days. This is quite a dangerous position to be in: we are dealing with a lot of waste (known as Muda in the Gemba Kaizen world), so instead of spending our time adding value to the company we are losing precious time.
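To get a feel for the size of this waste, here is a rough back-of-the-envelope sketch. It assumes that "a couple of days" means two days for triage and two for mitigation, and that bugs arrive at a steady rate, so that Little's law (items in progress = arrival rate × time in the system) applies. The function name is hypothetical.

```python
def open_bugs(arrivals_per_day: float, triage_days: float, mitigation_days: float) -> float:
    """Estimate the average number of bugs in flight using Little's law."""
    return arrivals_per_day * (triage_days + mitigation_days)


if __name__ == "__main__":
    # 15 bugs per day, each spending 2 + 2 days in the system.
    print(open_bugs(15, 2, 2))  # prints 60
```

Under these assumptions the team is carrying around 60 open bugs at any given moment, which is work in progress that adds no value.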
Seeing this, companies tend to come up with ideas to improve their quality. They might consider improving the bug triaging process to bring the two-day triage time down to something more reasonable. They might hire Quality Assurance engineers to help reproduce these bugs and offload some work from the engineers. All of these improvements, useful as they might seem, target the symptom and not the root cause.
Gemba Kaizen recommends that once such a bug is reported (a defect has been found), the issue is mitigated quickly so that the company can continue. The most logical next step, however, is to do a Root Cause Analysis. One such methodology is called the Five Whys: we keep asking “why?” again and again until we really understand where the bug is coming from.
Let's apply this to a concrete bug report. At first glance, it appears that user interaction gets disabled when the user double clicks the pay button. Once users fall into this state they are unable to get out of it, which breaks their most important flow: purchasing the product. Without a proper root cause analysis, you might go ahead and add a timer to the page which re-enables user interaction every minute. This would indeed mitigate the issue, but the actual reason would still be unknown. So we ask ourselves:
Why does the UI get disabled?
Because we don't want to double charge the user
Why would we double charge the user?
Because the backend allows us to send multiple POST requests one after another
Why does the backend allow that?
Because the endpoint is not idempotent: it wasn't implemented using a PUT method
Once we have followed this exercise down to the root cause, we know that instead of hacking a timer into the UI to re-enable user interaction, we can properly implement our backend so that we don't even need to lock the UI in the first place.
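A minimal sketch of what such a backend could look like, assuming the client generates a unique payment ID and sends it with every request. `PaymentService`, `put_payment` and the in-memory dictionary are hypothetical stand-ins for a real HTTP handler and database.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PaymentService:
    """Toy payment backend with PUT-like, idempotent semantics."""

    # Maps the client-chosen payment ID to the charged amount in cents.
    payments: Dict[str, int] = field(default_factory=dict)
    charges_made: int = 0

    def put_payment(self, payment_id: str, amount_cents: int) -> int:
        """Create the payment if it is new; otherwise return the stored one.

        Repeating the same request has the same effect as sending it once,
        so a double click cannot charge the user twice.
        """
        if payment_id not in self.payments:
            self.payments[payment_id] = amount_cents
            self.charges_made += 1  # the card is charged only here
        return self.payments[payment_id]


if __name__ == "__main__":
    service = PaymentService()
    # A double click sends the same request twice with the same ID.
    service.put_payment("order-42", 1999)
    service.put_payment("order-42", 1999)
    print(service.charges_made)  # prints 1
```

Because the ID comes from the client, the backend can recognize a repeated request on its own, and the UI no longer has to guard against double clicks.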
Even though the methodology is called the Five Whys, in the example above we reached our conclusion in fewer than five steps. There might also be situations where it takes more than five steps to reach the root cause. All is well as long as you find it.
Once the root cause is found, we should not only fix the problem but also put the necessary processes in place to reduce the chances of it recurring. In this case, we might add a pre-commit hook which asks developers who have implemented a POST endpoint whether they have thought of using PUT instead.
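The core of such a hook could be sketched as follows. It only scans the text of a unified diff for newly added POST endpoints and builds a reminder. The patterns it looks for (a `.post(` decorator or `methods=["POST"]`) are an assumption about how endpoints are declared in your codebase, and the function names are hypothetical.

```python
import re
from typing import List

# Assumed endpoint styles, e.g. `@app.post("/pay")` or `methods=["POST"]`.
POST_PATTERN = re.compile(
    r"""@\w+\.post\(|methods\s*=\s*\[[^\]]*["']POST["']"""
)


def find_new_post_endpoints(diff_text: str) -> List[str]:
    """Return added lines in a unified diff that appear to declare a POST endpoint."""
    hits: List[str] = []
    for line in diff_text.splitlines():
        # Added lines start with "+"; "+++" marks a file header, not content.
        if line.startswith("+") and not line.startswith("+++"):
            if POST_PATTERN.search(line):
                hits.append(line[1:].strip())
    return hits


def reminder(diff_text: str) -> str:
    """Build the message shown to the developer, or an empty string if none is needed."""
    hits = find_new_post_endpoints(diff_text)
    if not hits:
        return ""
    listed = "\n".join(f"  {hit}" for hit in hits)
    return f"New POST endpoints found:\n{listed}\nHave you considered PUT instead?"
```

A real hook would pass in the output of `git diff --cached` and decide whether to merely print the reminder or to exit with a non-zero status and block the commit.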
Kanban
Another methodology which comes out of the Gemba Kaizen school is "Kanban". Unlike the other methodologies we have talked about, Kanban is quite well known in the software industry. It is commonly adopted by teams which do not work on a product or do not have a fixed delivery cycle. A Site Reliability Engineering (SRE) team might opt to use Kanban since they tend to handle issues as they occur, instead of pre-planning weekly sprints in which they deliver new features for a product.
Conclusion
There are a lot of interesting techniques which fall under the umbrella of Gemba Kaizen. One of the best places to start learning more is the book 'Gemba Kaizen: A Commonsense Approach to a Continuous Improvement Strategy' by Masaaki Imai, in which he explains these methodologies and how they can be applied to help you increase your productivity.