What is per-customer observability?
Most software monitoring systems treat all traffic as a single stream. They compute aggregate metrics -- overall error rates, global P99 latencies, total request throughput -- and alert when those numbers cross a threshold. This works well for consumer products where millions of users generate roughly similar traffic patterns. But for B2B SaaS companies, where a handful of customers may each represent a significant share of revenue and use the product in fundamentally different ways, aggregate metrics can be dangerously misleading.
Per-customer observability is the practice of monitoring and analyzing system behavior at the individual customer level. Instead of asking "what is our overall error rate?", it asks "what is the error rate for each of our customers, and which ones are experiencing degraded service right now?" This shift from aggregate to per-customer visibility changes how engineering teams detect, triage, and resolve production issues -- and it changes which issues they discover at all.
The concept is straightforward, but the technical implementation is not. Adding a customer dimension to every metric, log line, and trace creates a high-cardinality data problem that traditional observability tools were not designed to handle. Understanding why B2B companies need this capability, what makes it technically difficult, and how modern architectures are beginning to solve it requires examining each layer of the problem.
Why do B2B SaaS companies need per-customer monitoring?
The fundamental issue with aggregate monitoring in a B2B context is that it treats all customers as interchangeable. They are not. A B2B SaaS company might have one customer responsible for 20% of its annual recurring revenue. That customer could be experiencing elevated error rates, increased latency, or intermittent failures -- and none of it would register in aggregate dashboards because the hundreds of other customers are running fine. The problematic traffic disappears into the statistical noise.
This is not a theoretical concern. One AI inference platform discovered that its enterprise customers were experiencing failures that were completely invisible in aggregate metrics. When your overall request volume is in the hundreds of millions, even thousands of failing requests per day from your largest customer round to 0.01% or less globally. The aggregate error rate stays green. The P99 latency stays within bounds. Meanwhile, the customer is filing support tickets and questioning whether to renew their contract.
The pattern repeats across the industry. One usage-based billing platform found that its diverse client base, where different customers use the same API endpoints but with completely heterogeneous data shapes, meant that backend performance profiles varied dramatically from customer to customer. Aggregate latency metrics were meaningless because what was normal for one customer could represent a severe degradation for another. Another developer tools company found that an LLM-based analysis of customer traffic revealed that their customers were being attacked far more frequently than aggregate metrics suggested -- the attack traffic was hiding in the noise of overall request volume.
The gap between what aggregate metrics show and what individual customers experience creates a reactive pattern that most B2B SaaS teams know well: they learn about customer issues from support ticket spikes, Slack escalations, or renewal conversations rather than from their monitoring systems. One collaboration tool company described their entire approach to customer issues as relying on Slack channels and internal escalations -- their monitoring stack told them plenty about system health in aggregate but almost nothing about individual customer experiences.
There is also a conceptual mismatch between the dominant monitoring philosophy and B2B needs. Google's Site Reliability Engineering model, which heavily influenced modern observability practices, was designed for consumer-scale services with global SLOs. One infrastructure company found that this consumer SRE model was insufficient for their B2B enterprise customers. When you have enterprise customers paying six or seven figures annually, "99.9% of all requests succeed" is a meaningless guarantee if the 0.1% that fail are concentrated on your most important accounts.
Per-customer observability reframes the problem. Instead of a single global SLO, each customer effectively gets their own SLO. Instead of alerting when aggregate error rates exceed a threshold, the system alerts when any individual customer's experience degrades beyond acceptable bounds. This is the difference between knowing that your platform is healthy and knowing that each of your customers is healthy -- a distinction that matters enormously for retention, expansion, and trust.
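The per-customer reframing can be made concrete with a small sketch. The example below uses entirely hypothetical tenants, traffic numbers, and error budget; the point is only the shape of the check -- the aggregate error rate looks healthy while one customer is badly degraded:

```python
from collections import defaultdict

# Hypothetical raw request records: (customer_id, succeeded).
requests = (
    [("acme", False)] * 40 + [("acme", True)] * 60   # acme: 40% error rate
    + [("globex", True)] * 5000                      # other tenants healthy
    + [("initech", True)] * 4900
)

ERROR_BUDGET = 0.01  # hypothetical 99%-success per-customer SLO

totals, errors = defaultdict(int), defaultdict(int)
for customer, ok in requests:
    totals[customer] += 1
    if not ok:
        errors[customer] += 1

aggregate_rate = sum(errors.values()) / sum(totals.values())
breaching = {c: errors[c] / totals[c] for c in totals
             if errors[c] / totals[c] > ERROR_BUDGET}

print(f"aggregate error rate: {aggregate_rate:.2%}")   # 0.40% -- looks fine
print(f"customers breaching SLO: {sorted(breaching)}") # ['acme']
```

A global threshold on `aggregate_rate` would never fire here; the per-customer loop flags the degraded account immediately.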
What technical challenges does per-customer observability create?
The reason most teams do not already monitor at the per-customer level is not a lack of desire. It is a set of concrete technical barriers rooted in how traditional observability tools store and query data.
The core challenge is high-cardinality data. In observability, cardinality refers to the number of unique values a dimension can take. HTTP status codes have low cardinality -- only a handful of values (200, 404, 500, and a few others) account for nearly all traffic. Customer IDs have high cardinality -- a successful B2B SaaS company might have hundreds or thousands of active customers, each with a unique identifier. Adding customer_id as a tag or label on every metric immediately multiplies the number of unique time series the system must track.
Traditional metrics systems like Prometheus, Graphite, and the backends used by commercial observability platforms store data as pre-aggregated time series. Each unique combination of metric name and tag values creates a distinct time series. If you have a metric like http_request_duration with tags for service, endpoint, status_code, and customer_id, the total number of time series is, in the worst case, the product of the unique value counts across those dimensions. Adding a customer dimension with 1,000 unique values multiplies the series count by 1,000. Adding Kubernetes orchestration tags -- pod, node, container, namespace -- can silently multiply it again by thousands, because each pod/node/container combination generates its own set of series.
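The multiplication is easy to see with illustrative numbers (the per-dimension counts below are made up; real systems only materialize combinations that actually occur, so the product is a worst-case bound):

```python
from math import prod

# Hypothetical unique-value counts per dimension on http_request_duration.
cardinality = {
    "service": 20,
    "endpoint": 50,
    "status_code": 5,
    "customer_id": 1_000,  # the new high-cardinality dimension
}

without_customer = prod(v for k, v in cardinality.items() if k != "customer_id")
worst_case = prod(cardinality.values())

print(f"{without_customer:,} series before, up to {worst_case:,} after")
# 5,000 series before, up to 5,000,000 after
```

One new tag turns a modest 5,000-series workload into a potential five-million-series one, which is exactly the explosion that per-series pricing punishes.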
This combinatorial explosion is not just a storage problem. It is a cost problem. Most commercial observability vendors price by the number of custom metrics or unique time series ingested. Datadog's custom metrics pricing, for example, charges per unique time series, which means adding a customer_id tag can multiply costs by the number of active customers. One platform engineering team discovered that their metrics vendor imposed a 35-attribute limit on any single metric, which directly prevented them from adding customer-specific dimensions to their most important measurements. Another team was advised by their metrics vendor to avoid high-cardinality tagging entirely -- the vendor's recommended approach was to pre-aggregate metrics at the application layer, forcing engineers to understand metric types and aggregation semantics before emitting any data. This turns instrumentation from a simple act of recording what happened into a complex engineering decision about what questions you might want to ask later.
The cost implications are severe. Observability platforms that charge per custom metric or per time series make high-cardinality monitoring prohibitively expensive. One company found that Kubernetes orchestration tags alone were silently generating thousands of time series they did not intend to create. The resulting bill made the team afraid to add new tags -- the same action that once brought clarity now brought dread, because the last time someone added a dimension it caused a cardinality explosion that blew up their monthly costs.
There are vendor-provided workarounds. Features marketed as "Metrics Without Limits" or similar allow you to ingest high-cardinality data but only query pre-selected aggregations. These are partial solutions at best: you still pay for ingestion of the full-cardinality data, and you lose the ability to explore dimensions you did not anticipate needing. They bolt query-time controls onto the same write-time aggregation model rather than fixing the underlying storage and cost model.
The net effect is that most B2B SaaS companies end up in one of two positions: they either pay enormous observability bills to maintain customer-level visibility, or they give up on per-customer monitoring and rely on support tickets and manual investigation to detect customer-specific issues. Neither outcome is acceptable for companies whose revenue depends on the experience of individual accounts.
How are modern platforms solving per-customer observability?
The limitations of traditional observability architectures for per-customer monitoring are well understood. A new generation of approaches is emerging that rethinks the data storage and query model from the ground up, treating high cardinality as a design requirement rather than an obstacle to manage around.
The foundational shift is from pre-aggregated time series to raw data stored in columnar formats on object storage. Instead of computing aggregations at write time and storing the results as individual time series, modern systems store raw telemetry data points -- including all their high-cardinality tags -- in columnar file formats like Apache Parquet, organized using table formats like Apache Iceberg on commodity object storage such as Amazon S3.
This architectural change has several important consequences. First, it decouples storage cost from cardinality. Object storage costs roughly $0.023 per gigabyte per month regardless of how many unique tag combinations the data contains. There is no per-series charge. Adding a customer_id dimension does not multiply costs the way it would in a traditional metrics backend. Second, columnar storage is inherently efficient for selective reads. A query that filters by a specific customer ID only needs to read the relevant column, not scan every data point. Third, the open table format means the data is accessible to any compatible query engine -- it is not locked inside a proprietary system.
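The selective-read property can be illustrated without any Parquet tooling. In a columnar layout each field lives in its own contiguous array, so a per-customer query only touches the columns it references. A toy stand-in with hypothetical data (real Parquet readers additionally skip whole row groups using per-column statistics):

```python
# Toy column store: each field lives in its own array, as in Parquet,
# so a per-customer query touches only the columns it references.
table = {
    "customer_id": ["acme", "globex", "acme", "initech", "acme"],
    "endpoint":    ["/v1/a", "/v1/b", "/v1/a", "/v1/c", "/v1/b"],
    "latency_ms":  [120, 35, 840, 50, 410],
}

def worst_latency(customer):
    # Reads customer_id and latency_ms; the endpoint column is never touched.
    return max(lat for cust, lat in zip(table["customer_id"], table["latency_ms"])
               if cust == customer)

print(worst_latency("acme"))  # 840
```

In a row-oriented store the same query would deserialize every field of every record; here the endpoint column is simply never read, which is why high-cardinality filters stay cheap.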
The query model also changes fundamentally. Instead of pre-aggregating at write time and limiting what questions can be asked later, computation happens at query time. Analytical query engines designed for columnar data -- systems in the family of DuckDB, Trino, or purpose-built observability query layers -- can scan billions of rows efficiently, filtering and aggregating on the fly. This means you do not need to decide in advance which customer dimensions to track or which aggregations to compute. You store the raw data and ask whatever question you need when you need to ask it.
This is particularly powerful for per-customer observability because the questions are inherently exploratory. When you detect that a customer is having a bad experience, you need to drill down by endpoint, by region, by request type -- often across dimensions you did not anticipate being relevant. Write-time aggregation systems force you to have predicted the right dimensions in advance. Query-time computation systems let you explore freely after the fact.
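Because the raw events keep every tag, a dimension nobody planned for can still be aggregated after the fact. A stdlib sketch of query-time aggregation over hypothetical events -- the "group by region" question is invented at investigation time, not at instrumentation time:

```python
from collections import Counter

# Raw events retain every tag; nothing was aggregated at write time.
events = [
    {"customer_id": "acme", "region": "eu-west", "status": 500},
    {"customer_id": "acme", "region": "eu-west", "status": 500},
    {"customer_id": "acme", "region": "us-east", "status": 200},
    {"customer_id": "acme", "region": "eu-west", "status": 200},
]

# A question decided at query time: where are acme's failures concentrated?
errors_by_region = Counter(
    e["region"] for e in events
    if e["customer_id"] == "acme" and e["status"] >= 500
)
print(errors_by_region.most_common())  # [('eu-west', 2)]
```

Under write-time aggregation, this answer would exist only if someone had pre-registered a region breakdown; over raw data, any tag can become the pivot.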
The third piece of the modern approach is the use of AI agents to automate the monitoring and investigation process. Per-customer observability generates far more signals than a human team can watch. If you have 500 customers and each has their own set of health indicators, you cannot realistically expect a team of engineers to watch 500 dashboards. Instead, automated agents can continuously evaluate customer health, detect anomalies, and initiate investigations. Firetiger's agents can scope monitoring to individual customers, tracking per-customer SLOs without requiring pre-aggregated metrics, because the underlying data lake stores raw high-cardinality data. An agent can run hundreds of exploratory queries in parallel -- comparing time windows, pivoting across dimensions, checking whether a pattern is real or coincidental -- work that would take a human investigator hours to do manually.
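The fan-out pattern an agent uses -- many narrow queries over the same raw data, each testing one hypothesis -- can be sketched in a few lines. The events, slices, and thresholds here are all hypothetical; the point is that each hypothesis is just a tag filter, so checks parallelize trivially:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical raw events with full tag sets.
events = [
    {"customer_id": "acme",   "region": "eu-west", "endpoint": "/v1/export", "status": 500},
    {"customer_id": "acme",   "region": "us-east", "endpoint": "/v1/list",   "status": 200},
    {"customer_id": "globex", "region": "eu-west", "endpoint": "/v1/list",   "status": 200},
]

def error_rate(filters):
    # One exploratory query: error rate within an arbitrary slice of tags.
    hits = [e for e in events if all(e[k] == v for k, v in filters.items())]
    return sum(e["status"] >= 500 for e in hits) / len(hits) if hits else 0.0

# Fan out several hypotheses at once, the way an agent pivots across dimensions.
hypotheses = [
    {"customer_id": "acme"},
    {"customer_id": "acme", "region": "eu-west"},
    {"customer_id": "acme", "endpoint": "/v1/list"},
]
with ThreadPoolExecutor() as pool:
    rates = list(pool.map(error_rate, hypotheses))

print(rates)  # [0.5, 1.0, 0.0]
```

Comparing the slices localizes the problem: acme's failures sit in eu-west on the export endpoint, not in its traffic generally.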
One real-world example illustrates the full stack in action. A web development platform with nearly 20,000 endpoints, a monolithic application, and only four SREs supporting 27 engineering teams faced a classic observability gap: they had inconsistent SLO adoption, no telemetry standardization, and their small team could not possibly monitor customer experience manually. The solution was not to hire more SREs or build more dashboards. It was to deploy an agent-based system that connects to both the codebase and telemetry data, proactively identifying and suggesting fixes for production issues on a per-customer basis.
Another example comes from a cloud browser platform that identified two critical per-customer metrics: request instance latency (how fast browsers are pulled from warm pools for each customer) and deployment creation errors (where 500 errors definitively indicate platform problems for a specific customer). By monitoring these at the individual customer level rather than in aggregate, the team could expose high-fidelity alerts directly to customers -- turning observability from an internal operational tool into a product differentiator that drives engagement and retention.
The combination of cheap columnar storage, query-time computation, and agent-driven analysis represents a fundamental shift in what is practical for B2B SaaS monitoring. Per-customer observability is no longer a luxury reserved for companies with the engineering resources to build custom solutions. It is becoming an accessible capability built on open data formats, commodity infrastructure, and intelligent automation -- the kind of capability that will increasingly separate companies that truly understand their customers' experience from those that are guessing.
Where to start
- Add customer_id to your telemetry: Ensure logs, metrics, and traces include a customer or tenant identifier so you can filter and aggregate per customer.
- Check if your tooling supports high cardinality: Verify that adding customer-level dimensions won't cause cost explosions or performance degradation in your current observability platform.
- Define per-customer SLOs for your top 5 accounts: Start with your largest or most strategic customers and track their experience independently from global metrics.
- Deploy per-customer monitoring agents: Use a system like Firetiger that scopes monitoring to individual customers without requiring pre-aggregated metrics.
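For the first step above, the tagging itself can be as simple as attaching the tenant identifier to every structured event. A minimal stdlib sketch (field names and values are illustrative; in a tracing setup the same identifier would go on span attributes instead of log lines):

```python
import json
import logging

logging.basicConfig(format="%(message)s", level=logging.INFO)
log = logging.getLogger("api")

def request_event(customer_id, endpoint, status, duration_ms):
    # Every emitted event carries the tenant identifier, so any later
    # query can filter or aggregate by customer.
    return {
        "event": "http_request",
        "customer_id": customer_id,
        "endpoint": endpoint,
        "status": status,
        "duration_ms": duration_ms,
    }

log.info(json.dumps(request_event("acme", "/v1/export", 200, 118)))
```

Once every log line, metric point, and span carries this field consistently, the rest of the list -- cardinality checks, per-customer SLOs, automated monitoring -- has something to key on.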