Your system is fine. Your users aren't

Why you need business SLOs

Feb 25, 2026


Why technical SLOs are not enough

Your infrastructure is humming along perfectly. Every metric is green. Response times are excellent. The database is stable. And yet your business is losing money or users. This is the fundamental problem with technical SLOs: they measure whether your system is working, not whether your system delivers what users need.

Consider Uber during rush hour. A rider opens the app. Your backend responds in 150ms. Your database is stable. Your error rate is zero. But the map shows no available drivers close enough to request a ride.

Everything your dashboards celebrate is fine. The only thing that matters to the rider is not.

What a business SLO looks like

A better approach is to define an SLO around the actual outcome riders care about:

99.5% of rider requests have at least 3 available cars within 2 km

This SLO captures what the user actually needs: available transportation options. It doesn’t describe the system. It describes the outcome.

The “3 available cars” threshold and the “2 km” radius aren’t arbitrary technical choices; they’re product decisions. The business has decided that riders need at least 3 options within 2 km to feel confident they can get a ride.

Where business and engineering meet

A common concern is: “If I start tracking business SLOs, do I stop caring about technical SLOs?”

No. You need both.

Technical SLOs are guardrails. They ensure your infrastructure can support your business outcome. If your API latency is 2 seconds, you can’t offer a good ride request experience. If your database is down, you can’t compute your business SLI reliably.

But technical SLOs should be set to support business goals, not the other way around. A technical SLO like “99.99% availability” is a means to an end. A business SLO like “99.5% of rider requests have 3+ cars within 2 km” is the end itself.

Start with the business outcome. Then work backward to decide what technical reliability you need to make that outcome consistently true.

Turning a fuzzy goal into an SLI

The SLO we defined is a business goal, but it’s not immediately measurable. We need to translate it into a concrete metric: a Service Level Indicator (SLI).

A good starting SLI could be: Percentage of rider sessions where at least N available cars are present within R km at the time of request.
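As a minimal sketch, this SLI can be computed in Python, assuming each ride request has been recorded with the distances (in km) from the rider to every available car at request time. The `RideRequest` type, the function names, and the sample data are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class RideRequest:
    # Hypothetical record: distances (km) from the rider to each available car
    # at the moment the ride was requested.
    rider_id: str
    car_distances_km: Sequence[float]


def is_good_request(req: RideRequest, min_cars: int, radius_km: float) -> bool:
    """A request is 'good' if at least `min_cars` cars were within `radius_km`."""
    nearby = sum(1 for d in req.car_distances_km if d <= radius_km)
    return nearby >= min_cars


def availability_sli(requests: Sequence[RideRequest], min_cars: int = 3,
                     radius_km: float = 2.0) -> float:
    """Fraction of good requests; this is the number compared with the SLO target."""
    if not requests:
        raise ValueError("no requests in window; SLI is undefined")
    good = sum(is_good_request(r, min_cars, radius_km) for r in requests)
    return good / len(requests)


# Hypothetical sample data.
requests = [
    RideRequest("r1", [0.4, 1.1, 1.9, 3.5]),  # 3 cars within 2 km: good
    RideRequest("r2", [0.8, 2.6]),            # 1 car within 2 km: bad
    RideRequest("r3", [0.2, 0.5, 0.9, 1.2]),  # 4 cars within 2 km: good
    RideRequest("r4", []),                    # no cars at all: bad
]

sli = availability_sli(requests)
print(f"SLI = {sli:.1%}, SLO met: {sli >= 0.995}")
```

The four sample requests give an SLI of 50%, far below a 99.5% target. Raising an error on an empty window is a deliberate choice in this sketch: with no requests, the SLI is undefined rather than 0% or 100%.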

But there’s no single right answer. Several dimensions affect how you measure success:

N (cars): 1, 3, 5
R (radius): 1 km, 2 km, dynamic by city
Time window: At request time, or within 30 seconds
Scope: Per city, per geo-cell, per time of day
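To see why these dimensions are business decisions, the sketch below recomputes the same SLI for a few values of N and R, sliced per city, over a small invented sample. The city names, distances, and helper names are hypothetical, and the time-window dimension is not modeled; the point is only that the same raw data yields very different numbers depending on the definition chosen.

```python
from collections import defaultdict
from itertools import product

# Hypothetical sample: (city, distances in km from the rider to each available car).
requests = [
    ("Paris", [0.3, 0.7, 1.5, 1.8]),
    ("Paris", [0.9, 1.4, 2.4]),
    ("Paris", [0.6, 1.1, 1.3, 1.9, 2.0]),
    ("Lyon", [1.2, 2.8]),
    ("Lyon", [0.5, 1.7, 2.2, 3.0]),
]


def sli(distance_lists, n_cars, radius_km):
    """Share of requests with at least n_cars available cars within radius_km."""
    good = sum(1 for ds in distance_lists if sum(d <= radius_km for d in ds) >= n_cars)
    return good / len(distance_lists)


# Scope dimension: slice the data per city.
by_city = defaultdict(list)
for city, distances in requests:
    by_city[city].append(distances)

# N and R dimensions: same data, different definitions of success.
for city in sorted(by_city):
    for n, r in product([1, 3, 5], [1.0, 2.0]):
        print(f"{city:<5} N={n} R={r} km -> {sli(by_city[city], n, r):.0%}")
```

In this sample, Paris scores 67% with N=3 and R=2 km but 0% with N=3 and R=1 km, so choosing the definition effectively chooses how reliable the service looks.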

These dimensions represent trade-offs that should be made by your business, not your engineering team. Do you measure at the exact moment of request or allow 30 seconds? Does the target vary by city density? Should you track this globally or by geography?

Once the business has answered these questions, engineering has a clear target to optimize toward.

Where the data really comes from

Here’s where business SLOs force a shift in how you think about monitoring.

Technical SLOs are built on infrastructure metrics: request latency from your load balancer, error rates from your API gateway, uptime from your health checks. These are easy to measure because they’re generated by your systems automatically.

Business SLOs require domain events. In our hypothetical Uber example, the SLI requires knowing the state of available drivers at the moment a rider requests a ride. This isn’t something your infrastructure metrics track. A healthy API and database tell you nothing about driver supply.

So where does this data come from? On every ride request, the Uber app queries the backend for available cars nearby. At that moment, the system knows how many cars are available, how far away they are, and which drivers accepted which requests. This is the raw material for computing the SLI.
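Counting the cars within R km at the moment of the request is the core computation. Here is a minimal sketch, assuming rider and car positions are available as (latitude, longitude) pairs in degrees; it uses the haversine great-circle formula and a plain linear scan. A production system would more likely query a geospatial index, and the function names here are illustrative, not part of any real Uber API.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def cars_within(rider: tuple[float, float], cars: list[tuple[float, float]],
                radius_km: float) -> int:
    """Count available cars within radius_km of the rider (linear scan)."""
    return sum(1 for lat, lon in cars
               if haversine_km(rider[0], rider[1], lat, lon) <= radius_km)


# Hypothetical positions: a rider in central Paris and three available cars.
rider = (48.8566, 2.3522)
cars = [
    (48.8566, 2.3522),  # same spot, 0 km
    (48.8666, 2.3522),  # 0.01 degrees north, about 1.1 km
    (48.9066, 2.3522),  # 0.05 degrees north, about 5.6 km
]
print(cars_within(rider, cars, radius_km=2.0))
```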

The key is instrumenting your domain, not just your infrastructure. You need to log or emit events at the business level: “Rider X requested a ride in zone Y at time Z, and N drivers were available within R km.”
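A business-level event like the one quoted above might look like the following sketch. The `RideRequestedEvent` schema, its field names, and the `emit` helper are assumptions for illustration; here `emit` only serializes the event to a JSON line, where a real system would write it to a log stream or message queue.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class RideRequestedEvent:
    # Hypothetical schema for the business-level event described in the text.
    rider_id: str
    zone: str
    requested_at: str    # ISO-8601 timestamp in UTC
    radius_km: float     # R used when counting cars
    cars_available: int  # cars within radius_km at request time


def emit(event: RideRequestedEvent) -> str:
    """Serialize the event as one JSON line (shipping it is out of scope here)."""
    return json.dumps({"type": "ride_requested", **asdict(event)}, sort_keys=True)


event = RideRequestedEvent(
    rider_id="X",
    zone="Y",
    requested_at=datetime(2026, 2, 25, 8, 30, tzinfo=timezone.utc).isoformat(),
    radius_km=2.0,
    cars_available=4,
)
print(emit(event))
```

Recording `radius_km` alongside the count keeps each event self-describing. If the business may later revisit R, storing the raw distances instead of a single count would let you recompute the SLI for a new radius without re-instrumenting.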

Instrumenting the domain is harder than passively collecting infrastructure metrics because it requires intentional design. In practice, it often looks like:

Adding logging/events in the request flow (capture supply at the moment of intent)
Shipping those events into a pipeline you can aggregate (warehouse or time-series)
Computing the SLI over a time window (e.g., hourly, daily) and slicing by city/zone/time of day
Alerting when the SLO is not met so you can respond
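The aggregation and alerting steps can be sketched in a few lines of Python, assuming events have already landed in a store as records with a city, a timestamp, and the number of cars available within the chosen radius. The field names, the hourly window, and the `hourly_sli` and `breaches` helpers are hypothetical; in practice a warehouse query or stream processor would usually do this work.

```python
from collections import defaultdict
from datetime import datetime

SLO_TARGET = 0.995  # 99.5% of requests must be "good"
MIN_CARS = 3        # N: cars required within the chosen radius


def hourly_sli(events):
    """Group events into (city, hour) windows and compute the SLI for each."""
    windows = defaultdict(lambda: [0, 0])  # (city, hour) -> [good, total]
    for e in events:
        hour = e["ts"].replace(minute=0, second=0, microsecond=0)
        counts = windows[(e["city"], hour)]
        counts[0] += e["cars_available"] >= MIN_CARS  # True counts as 1
        counts[1] += 1
    return {key: good / total for key, (good, total) in windows.items()}


def breaches(slis, target=SLO_TARGET):
    """Return the (city, hour) windows whose SLI fell below the target."""
    return sorted(key for key, value in slis.items() if value < target)


# Hypothetical events, as they might come out of the pipeline.
events = [
    {"city": "Paris", "ts": datetime(2026, 2, 25, 8, 5), "cars_available": 4},
    {"city": "Paris", "ts": datetime(2026, 2, 25, 8, 40), "cars_available": 1},
    {"city": "Paris", "ts": datetime(2026, 2, 25, 9, 10), "cars_available": 5},
    {"city": "Lyon", "ts": datetime(2026, 2, 25, 8, 15), "cars_available": 3},
]

slis = hourly_sli(events)
for city, hour in breaches(slis):
    # A real setup would page someone or open an incident instead of printing.
    print(f"ALERT {city} {hour:%Y-%m-%d %H}:00 "
          f"SLI={slis[(city, hour)]:.1%} < {SLO_TARGET:.1%}")
```

One consequence of a 99.5% target is worth checking before wiring up alerts: in any window with fewer than 200 requests, a single bad request already drops the SLI below target (199 of 200 is exactly 99.5%, but 198 of 199 is about 99.497%). Small cities or quiet hours may need longer windows or a minimum request count before alerting.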

But it’s essential. Without this data, you’re flying blind about whether you’re actually delivering value.

Why this matters

When you measure business SLOs, teams naturally align around a shared outcome. Product stops asking, “How fast is the response time?” and starts asking, “Can riders get a ride?” Engineering stops optimizing for abstract numbers and starts optimizing for a real user experience.

It also makes incidents more actionable. When a technical SLO is breached, the response can be ambiguous: scale, refactor, or wait? When a business SLO is breached (e.g., riders can’t request a ride), the problem is unmistakable and the response is concrete.

Finally, it forces honesty. It makes you measure what matters, not just what’s easy to measure.

Getting started

Pick one user journey that matters most to your business. Define what success looks like from the user’s perspective, not the system’s. Instrument that flow to capture the data you need. Start measuring.

You don’t need to boil the ocean. One well-chosen business SLO will teach you more than a dozen technical metrics ever could.

Incremental forgetting
