
What are DORA metrics?

DORA metrics are a small set of indicators — deployment frequency, lead time for changes, change failure rate, mean time to recovery, and a newer reliability metric — that Google's DevOps Research and Assessment program identified as predictive of software delivery performance. They work best as a diagnostic surface that points teams toward real bottlenecks, not as targets to optimize directly.

DORA stands for DevOps Research and Assessment, a research program now housed within Google Cloud. The team behind it surveys thousands of software engineers and engineering leaders each year through the State of DevOps Report. Across more than a decade of data, the same handful of metrics keep separating high-performing organizations from low-performing ones, regardless of company size, industry, or stack.

The original four metrics break down into two pairs. Deployment frequency and lead time for changes capture velocity: how often you ship and how long a change takes to reach production after a developer starts working on it. Change failure rate and mean time to recovery capture stability: how often deployments cause problems and how quickly you recover when they do. A more recent fifth metric, reliability, captures the user-experienced consequence of those four — roughly, the share of recent deploys that were unplanned fixes for user-facing bugs.

DORA's central insight is that velocity and stability are not in tension. Elite organizations ship more often and recover faster and break things less often than low performers. The metrics are most useful when read together: a team with high deployment frequency and a 0% change failure rate is probably under-counting failures; a team with very low CFR and very low deployment frequency is probably batching risk into large, infrequent releases.

The five metrics, briefly

  • Deployment frequency — how often the team ships a production change. Elite performers deploy on demand (multiple times per day); low performers deploy less than once per month.
  • Lead time for changes — the time from a code change being authored to it running in production. Different teams measure from different start points (first commit, PR opened, PR merged), and the choice changes what the metric tells you.
  • Change failure rate — the percentage of deployments that cause a degradation requiring remediation. Elite performers sit between 0% and 15%; low performers can be ten times higher.
  • Mean time to recovery — the average time from the onset of a production incident to the restoration of service. Elite performers recover in under an hour.
  • Reliability (the fifth DORA metric) — added in more recent State of DevOps reports, this metric asks roughly what percentage of recent deployments were unplanned fixes for user-facing bugs. It ties the other four to user experience.

DORA as a diagnostic, not a destination

The most common way teams misuse DORA is to treat the metrics as targets. Goodhart's law applies forcefully here: when a metric becomes a target, it stops being a useful measure. Teams that are evaluated on deployment frequency learn to split a single change across many trivial PRs. Teams evaluated on change failure rate learn to avoid filing incidents. Teams evaluated on MTTR learn to close tickets fast.

A more durable framing, and one we hear consistently from engineering leaders, is to use DORA as a diagnostic surface that helps a team find where the real bottlenecks are. If lead time is dominated by build-and-test duration, the work is in CI infrastructure. If change failure rate is high during specific deploy windows, the work is in deploy verification. If MTTR is long because investigation takes hours, the work is in observability. The metrics themselves are not the goal; they point to the work.
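
As a concrete illustration of that diagnostic reading, the sketch below splits lead time into stages using per-change timestamps. The data structure and field names are hypothetical stand-ins for whatever your VCS, CI, and deploy tooling actually record; the point is that the median per stage, not the headline number, is what tells you where the work is.

```python
from datetime import datetime
from statistics import median

# Illustrative sketch only. Assumes you already have per-change timestamps
# from your VCS, CI, and deploy tooling; the field names are hypothetical.
changes = [
    {"first_commit": datetime(2024, 5, 1, 9, 0),
     "pr_merged":    datetime(2024, 5, 1, 15, 0),
     "ci_finished":  datetime(2024, 5, 1, 17, 30),
     "deployed":     datetime(2024, 5, 2, 10, 0)},
    {"first_commit": datetime(2024, 5, 2, 11, 0),
     "pr_merged":    datetime(2024, 5, 3, 9, 0),
     "ci_finished":  datetime(2024, 5, 3, 12, 0),
     "deployed":     datetime(2024, 5, 3, 16, 0)},
]

def hours(start, end):
    return (end - start).total_seconds() / 3600

# Split lead time into stages so the bottleneck is visible.
stages = {
    "code_and_review": [hours(c["first_commit"], c["pr_merged"]) for c in changes],
    "build_and_test":  [hours(c["pr_merged"], c["ci_finished"]) for c in changes],
    "wait_for_deploy": [hours(c["ci_finished"], c["deployed"]) for c in changes],
}

for name, values in stages.items():
    print(f"{name}: median {median(values):.1f}h")
```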

Engineering leaders also commonly observe that DORA approximates what they actually care about rather than measuring it directly. The deeper goal is usually framed as enabling a large, fast-moving team to ship safely without coordination overhead. DORA is the most widely accepted approximation of that goal, but it is still an approximation.

Why the way you measure DORA matters

There are many vendors offering DORA dashboards: LinearB, Jellyfish, Swarmia, Sleuth, DX, Datadog DORA, and others. Most of them compute the metrics in similar ways, and most of those ways depend on tickets, surveys, or pattern-matching against git history. That is the industry default — but it is not the only option, and it has well-known limitations.

Change failure rate is typically computed in one of three ways: pattern-matching git history for "revert"/"hotfix" commits, correlating deployments with incidents opened in PagerDuty within a time window, or counting tickets labeled "production incident" in Jira. Each of these depends on human discipline (filing the right tickets, applying the right labels) and each misses categories of real failures — most notably the silent regressions that never trigger a page.
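
For illustration, here is roughly what the pattern-matching variant looks like in practice. The deploy records and commit subjects below are made up, and a real implementation would read them from git history or a deploy log; the blind spot noted at the end is the important part.

```python
import re

# Illustrative sketch of the pattern-matching approach described above.
# Inputs are hypothetical: a list of deploys with the commit subjects each one shipped.
deploys = [
    {"id": "d1", "commit_subjects": ["Add checkout retries", "Bump deps"]},
    {"id": "d2", "commit_subjects": ['Revert "Add checkout retries"']},
    {"id": "d3", "commit_subjects": ["hotfix: null check in pricing"]},
    {"id": "d4", "commit_subjects": ["Refactor billing worker"]},
]

FAILURE_PATTERN = re.compile(r"\b(revert|hotfix|rollback)\b", re.IGNORECASE)

def looks_like_remediation(deploy):
    return any(FAILURE_PATTERN.search(subject) for subject in deploy["commit_subjects"])

failures = sum(looks_like_remediation(d) for d in deploys)
cfr = failures / len(deploys)
print(f"change failure rate (pattern-matched): {cfr:.0%}")

# The blind spot: a regression fixed forward under an ordinary commit message,
# or never noticed at all, contributes nothing to this number.
```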

Mean time to recovery is almost universally measured as the lifecycle of an incident ticket: open-to-close. This rewards teams who update Jira quickly and penalizes teams who fix things faster than they file paperwork.
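
A minimal sketch of that calculation, with hypothetical ticket timestamps standing in for a Jira or PagerDuty export:

```python
from datetime import datetime
from statistics import mean

# Ticket-lifecycle MTTR as typically computed: open-to-close on incident tickets.
# The timestamps are hypothetical; real ones would come from the ticketing system.
incidents = [
    {"opened": datetime(2024, 6, 1, 10, 0),  "closed": datetime(2024, 6, 1, 10, 40)},
    {"opened": datetime(2024, 6, 7, 22, 15), "closed": datetime(2024, 6, 8, 3, 5)},
]

mttr_minutes = mean((i["closed"] - i["opened"]).total_seconds() / 60 for i in incidents)
print(f"MTTR (ticket lifecycle): {mttr_minutes:.0f} minutes")

# This measures paperwork latency as much as recovery: a fix shipped in ten
# minutes but closed out the next morning records an overnight "recovery".
```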

Deployment frequency and lead time depend on instrumenting the deploy pipeline and joining deploy events with commit metadata. The work is conceptually straightforward, but teams consistently report that the data is painful to collect: GitHub API rate limits, varied CI/CD systems, monorepo-to-service mapping, and per-service definitions of "production" all slow things down. It is common to find motivated engineering teams running homebrew ETL pipelines into a warehouse just to get clean DORA inputs flowing.
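
Once deploy events and commit metadata are flowing, the arithmetic itself is small. The sketch below uses hypothetical in-memory records in place of deploy webhooks and a VCS API, and shows where the choice of lead-time start point enters the calculation.

```python
from datetime import datetime
from statistics import median

# Sketch of the join described above, with hypothetical in-memory data standing
# in for deploy webhooks and commit metadata pulled from the VCS API.
deploys = [
    {"service": "checkout", "sha": "a1b2c3", "deployed_at": datetime(2024, 6, 3, 14, 0)},
    {"service": "checkout", "sha": "d4e5f6", "deployed_at": datetime(2024, 6, 5, 9, 30)},
]
commits = {
    "a1b2c3": {"authored_at": datetime(2024, 6, 2, 16, 0)},
    "d4e5f6": {"authored_at": datetime(2024, 6, 4, 11, 0)},
}

# Deployment frequency: deploys per week over the observed window.
window_days = (max(d["deployed_at"] for d in deploys) -
               min(d["deployed_at"] for d in deploys)).days or 1
deploys_per_week = len(deploys) / (window_days / 7)

# Lead time: authored-at to deployed-at, joined on commit SHA. Choosing
# "authored" vs "PR merged" as the start point changes the number materially.
lead_times_h = [
    (d["deployed_at"] - commits[d["sha"]]["authored_at"]).total_seconds() / 3600
    for d in deploys if d["sha"] in commits
]

print(f"deploys/week: {deploys_per_week:.1f}, median lead time: {median(lead_times_h):.1f}h")
```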

A telemetry-grounded approach measures the same metrics from production behavior instead. Deployment frequency comes from deploy webhooks. Lead time comes from joining commit metadata to deploy events. Change failure rate comes from correlating deploys with anomalies observed in production telemetry on the services owning the changed code. MTTR comes from the time between a metric leaving its baseline and the same metric returning to baseline. This approach does not depend on ticket hygiene, label discipline, or extensive per-service configuration.
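
The sketch below is a deliberately simplified version of that idea: a static threshold stands in for real baseline and anomaly detection, and the deploy time and error-rate samples are hypothetical. It shows how a change failure and a recovery window can both be read off telemetry without any ticket being filed.

```python
from datetime import datetime

# Heavily simplified: a static threshold stands in for real baseline/anomaly
# detection, and the deploy time and error-rate samples are hypothetical.
BASELINE_ERROR_RATE = 0.01   # assumed normal error rate for this service
DEVIATION_FACTOR = 2         # "left its baseline" = more than 2x normal

deploy_time = datetime(2024, 6, 10, 12, 0)
samples = [                  # (timestamp, error_rate) observed after the deploy
    (datetime(2024, 6, 10, 12, 5), 0.008),
    (datetime(2024, 6, 10, 12, 10), 0.090),   # regression begins
    (datetime(2024, 6, 10, 12, 25), 0.070),
    (datetime(2024, 6, 10, 12, 40), 0.009),   # back within baseline
]

threshold = BASELINE_ERROR_RATE * DEVIATION_FACTOR
bad = [t for t, rate in samples if t > deploy_time and rate > threshold]

if bad:
    # The deploy counts toward change failure rate, no ticket or label required.
    first_bad = bad[0]
    recovered = next((t for t, rate in samples if t > first_bad and rate <= threshold), None)
    print("change failure: yes")
    if recovered:
        print(f"time to recovery: {recovered - first_bad}")
else:
    print("change failure: no")
```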

For a fuller treatment, see Why ticket-based DORA metrics fall short.

How Firetiger measures DORA

Firetiger reads each PR diff, generates a deployment-specific monitoring plan, watches the deployment across staging, canary, and production, detects regressions, and investigates root cause. The same data that drives change-aware production monitoring also gives Firetiger the inputs for DORA: deploy events from GitHub webhooks, commit-to-deploy correlation for lead time, the Change Monitor's verdict for change failure rate, and telemetry-derived recovery windows for MTTR. Service mapping comes from traces tagged with service.name and service.version rather than customer-supplied YAML, so teams don't stall on the configuration step that typically blocks DORA implementations for months.
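
The snippet below is not Firetiger's implementation; it only illustrates the general idea that service identity can be read from standard OpenTelemetry resource attributes on exported spans rather than from a hand-maintained mapping file. The span records are hypothetical.

```python
# Not Firetiger's implementation — just an illustration that service identity can
# be read from standard OpenTelemetry resource attributes (service.name,
# service.version) instead of customer-supplied YAML. Span records are hypothetical.
spans = [
    {"resource": {"service.name": "checkout", "service.version": "2024.06.05-d4e5f6"}},
    {"resource": {"service.name": "pricing",  "service.version": "2024.06.04-9f8e7d"}},
]

deployed_versions = {}
for span in spans:
    resource = span["resource"]
    deployed_versions[resource["service.name"]] = resource["service.version"]

# {'checkout': '2024.06.05-d4e5f6', 'pricing': '2024.06.04-9f8e7d'}
# With versions that embed (or map to) commit SHAs, deploy events can be tied
# back to the changes they shipped without per-service configuration.
print(deployed_versions)
```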

The point is not "we have a DORA dashboard." The point is that with the right telemetry already in place, DORA falls out as a side effect of measurement rather than as a separate build.

Where to start

  • Pick one metric and measure it honestly for 90 days. Trying to instrument all five at once is how DORA projects stall. Start with deployment frequency (easiest) or change failure rate (most informative).
  • Decide what "production" means per service. This is the definition that quietly breaks every DORA implementation. Write it down before you start measuring.
  • Distinguish the diagnostic from the target. Use the numbers to find bottlenecks; do not set them as goals tied to compensation, and do not publish them as leaderboards.
  • Prefer telemetry-grounded measurement where you can. Ticket-based measurement is the industry default, but it under-counts silent regressions and penalizes teams who fix things faster than they file paperwork.

Firetiger uses AI agents to monitor production, investigate incidents, and optimize infrastructure — autonomously. Learn more about Firetiger, get started free, or install the Firetiger plugin for Claude or Cursor.