What is deployment frequency?
Deployment frequency is the count of successful production deployments per service over a defined window — typically per day or per week. It is the most basic of the DORA four key metrics, capturing how often a team ships a change that reaches users. Elite performers deploy on-demand multiple times per day; low performers deploy less than once per month.
Among the four DORA metrics, deployment frequency is the simplest to define and the one most teams reach for first: count the successful deploys per service over a period. DORA's State of DevOps research uses this measure to stratify performers, which is where the elite and low thresholds above come from.
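At its simplest, the metric is a count over a window. A minimal sketch, assuming a log of (service, date) pairs for successful production deploys (the data is illustrative):

```python
from collections import Counter
from datetime import date

# Hypothetical deploy log: one (service, date) pair per successful
# production deploy in a one-week window.
deploys = [
    ("checkout", date(2024, 5, 6)),
    ("checkout", date(2024, 5, 6)),
    ("checkout", date(2024, 5, 8)),
    ("billing", date(2024, 5, 7)),
]

window_days = 7

# Deployment frequency per service: count of deploys, normalized per day.
freq = Counter(service for service, _ in deploys)
for service, count in freq.items():
    print(f"{service}: {count} deploys, {count / window_days:.2f}/day")
```

Everything after this paragraph is about the decisions hiding inside that count: which events belong in the log at all, and how they are attributed to services.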
The metric matters less as a number to maximize and more as a signal of organizational capacity. Teams that can deploy frequently have, by necessity, solved a set of correlated problems: small batch sizes, automated testing, fast feedback on changes, low coordination overhead between teams, and confidence that a deploy will not cause an outage. A team that could deploy frequently but chooses not to is rare; a team that cannot deploy frequently is usually constrained by one of those underlying problems, and deployment frequency is how the constraint surfaces.
What counts as a "deployment"?
The definition that matters in practice is narrower than it sounds: a successful push of new code or configuration to a production environment for a single service. Each of those words hides a decision.
Successful. A deploy that rolls back automatically because health checks failed is usually counted as a failed deploy (and contributes to change failure rate), not as a successful deployment. A deploy that completes but is immediately reverted by a follow-up deploy may or may not count; either choice is defensible, but the definition should be written down once and applied consistently.
Production. Teams routinely deploy to staging, canary, and pre-production environments dozens of times per day. None of those count toward deployment frequency. The "production" definition is service-specific: for a public API, it might be the customer-facing version; for an internal pipeline, it might be the version that runs scheduled jobs against real data. Getting this definition right per service is one of the quiet hard parts of DORA implementation.
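The per-service production definition can be made explicit as a filter. A minimal sketch; the service and environment names are illustrative, not a real taxonomy:

```python
# One sentence per service about what "production" means, encoded as a
# lookup. All names here are invented for illustration.
PRODUCTION_ENV = {
    "public-api": "prod-customer",     # the customer-facing version
    "etl-pipeline": "prod-scheduled",  # the version running scheduled jobs on real data
}

def is_production_deploy(service: str, environment: str) -> bool:
    """Only deploys to the service's designated production environment count."""
    return PRODUCTION_ENV.get(service) == environment

print(is_production_deploy("public-api", "prod-customer"))  # True
print(is_production_deploy("public-api", "staging"))        # False
```

Encoding the definition this way forces the one-sentence decision per service that the text describes, and makes disagreements visible in review rather than in the dashboard.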
A single service. A monorepo CI job that simultaneously rolls out changes to twelve services is one CI run but twelve deployments. Counting it as one understates the team's deployment cadence; counting it as twelve overstates it if the deploys were not independent decisions. The right answer is usually twelve, because each service has its own change failure rate, its own MTTR, and its own production behavior.
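The fan-out from one CI run to many per-service deploy events can be sketched directly. The record shape below is invented for illustration, not any real system's schema:

```python
# A single monorepo CI run that rolls out one commit to several services.
# Field names are illustrative.
ci_run = {
    "run_id": "run-481",
    "commit": "9f3ab12",
    "services": ["auth", "billing", "search"],
}

# One deploy event per affected service: each service gets its own entry
# in the deployment-frequency count, because each has its own change
# failure rate and its own production behavior.
deploy_events = [
    {"run_id": ci_run["run_id"], "commit": ci_run["commit"], "service": s}
    for s in ci_run["services"]
]

print(len(deploy_events))  # 3: one CI run, three deployments
```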
Gaming behaviors
Like every metric, deployment frequency distorts when it becomes a target. The classic gaming pattern is to split a single coherent change across many small PRs and many small deploys so the number goes up without any underlying improvement in delivery capability. The number is technically true and behaviorally meaningless.
A subtler distortion comes from CI pipelines that "deploy" to production by promoting a previously built artifact when no code has changed — for example, a daily scheduled deploy that re-applies the current production version to refresh expiring credentials. Counting those as deployments inflates the number without representing any change in the system.
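One hedge against counting no-op promotions is to compare each deploy's artifact against what is already running, and only count deploys that change it. A sketch, with illustrative field names:

```python
# Deploy events for one service; the second is a scheduled re-apply of
# the artifact already in production (e.g. a credential refresh).
deploys = [
    {"service": "api", "artifact": "sha256:aaa", "ts": "2024-05-06T09:00"},
    {"service": "api", "artifact": "sha256:aaa", "ts": "2024-05-07T09:00"},  # no-op re-apply
    {"service": "api", "artifact": "sha256:bbb", "ts": "2024-05-07T15:00"},
]

counted = []
current = {}  # service -> artifact digest currently in production
for d in deploys:
    # Only count a deploy if it changes what is running in production.
    if current.get(d["service"]) != d["artifact"]:
        counted.append(d)
    current[d["service"]] = d["artifact"]

print(len(counted))  # 2: the daily re-apply is excluded
```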
The cleanest way to keep the metric honest is to read it alongside the others. Deployment frequency on its own is gameable. Deployment frequency together with change failure rate and reliability is much harder to game without one of the other numbers reflecting the damage.
Data-collection problems that stall measurement
The conceptual definition of deployment frequency is simple. The data plumbing usually is not.
The most common stall point is getting deploy events out of the systems that produce them. CI/CD pipelines emit deploy data in different shapes — GitHub Actions, CircleCI, GitLab CI, Jenkins, Argo Rollouts, Spinnaker, custom Kubernetes operators — and joining them with deploy targets (cluster, environment, region) and commit metadata requires consistent tagging across the pipeline. GitHub's APIs are a widely cited pain point: rate limits make it impractical to back-fill historical deploy data, and the schemas for deployments, check runs, and workflow runs don't line up cleanly. It is common to find engineering teams running homebrew ETL pipelines into a data warehouse just to make this data queryable; many of those projects stall before they produce a number anyone trusts.
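The usual remedy is to normalize each source into one shared record before counting anything. A sketch of the shape of that layer; the payload field names below are invented for illustration and are not the actual schemas of these systems:

```python
# Map source-specific deploy payloads onto one shared record. The raw
# field names here are hypothetical stand-ins, not real CI schemas.
def normalize(source: str, raw: dict) -> dict:
    if source == "github_actions":
        return {"service": raw["svc"], "env": raw["environment"],
                "sha": raw["sha"], "ok": raw["conclusion"] == "success"}
    if source == "jenkins":
        return {"service": raw["job_service"], "env": raw["target_env"],
                "sha": raw["git_commit"], "ok": raw["result"] == "SUCCESS"}
    raise ValueError(f"unknown source: {source}")

event = normalize("jenkins", {"job_service": "api", "target_env": "prod",
                              "git_commit": "9f3ab12", "result": "SUCCESS"})
print(event)
```

The value of the shared record is that everything downstream — per-service counting, the production filter, the success definition — is written once against one schema instead of once per CI system.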
The second stall point is mapping CI jobs to services. A pipeline named deploy-prod might deploy three services or thirty, depending on the monorepo. Without a consistent service taxonomy, the per-service deployment-frequency view is unreliable, and per-service is the view that matters.
The third stall point is defining what success means. A deploy job that exits 0 but leaves a service unhealthy is a successful deploy by CI's definition and a failed one by any reasonable engineering definition. Teams that count CI exit codes report deployment frequencies that overstate the real number; teams that count post-deploy health-check success report a more honest number but have to build the health-check correlation themselves.
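A stricter success definition can be expressed as a conjunction of the CI result and post-deploy health checks. A minimal sketch, assuming the health-check results have already been correlated to the deploy:

```python
# A deploy is successful only if the pipeline succeeded AND the service
# was healthy afterward. CI exit code alone overstates success.
def deploy_succeeded(ci_exit_code: int, health_checks: list[bool]) -> bool:
    if ci_exit_code != 0:
        return False
    # Require at least one post-deploy health check, and all must pass;
    # a deploy with no health data should not be assumed healthy.
    return bool(health_checks) and all(health_checks)

print(deploy_succeeded(0, [True, True]))   # True
print(deploy_succeeded(0, [True, False]))  # False: unhealthy despite exit 0
```

The hard part the text describes is not this predicate but building the correlation that fills in `health_checks` for each deploy.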
How Firetiger computes deployment frequency
Firetiger reads each PR diff, generates a deployment-specific monitoring plan, watches the deployment across staging, canary, and production, detects regressions, and investigates root cause. The same instrumentation that powers change-aware production monitoring also gives Firetiger the inputs for deployment frequency.
Specifically, deploy events arrive via GitHub deployment webhooks rather than polling the GitHub API, which avoids the rate-limit problems that stall warehouse-based implementations. Service identity comes from traces tagged with service.name rather than customer-maintained YAML, so monorepo CI jobs that touch many services correctly produce many per-service deployment-frequency entries. The "successful" definition is anchored in the Change Monitor verdict — a deploy that the monitor judges to have caused a regression is treated as a failed deploy for DORA purposes — which keeps deployment frequency consistent with change failure rate without a separate definition.
Where to start
- Pick a single service to measure first. Per-service deployment frequency for one important service teaches more than aggregate deployment frequency across the org.
- Write down what "production" means for that service. One sentence is enough. Do this before instrumenting anything.
- Use webhooks, not API polling. GitHub deployment webhooks deliver deploy events in real time and avoid the rate-limit problems that consume implementation effort.
- Read deployment frequency together with change failure rate. On its own, the number is easy to game. Together with stability metrics, it is hard to distort without one of the other numbers reflecting the cost.
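The last point can be made concrete: deployment frequency and change failure rate both fall out of the same per-deploy verdict stream, so gaming one tends to show up in the other. The data below is illustrative:

```python
# One verdict per production deploy over a one-week window.
verdicts = ["ok", "ok", "regression", "ok", "ok"]

deploy_frequency = len(verdicts) / 7  # deploys per day
change_failure_rate = verdicts.count("regression") / len(verdicts)

# Splitting one change into many tiny deploys raises deploy_frequency,
# but any resulting regressions raise change_failure_rate from the same
# stream, which is what makes the pair hard to game together.
print(f"{deploy_frequency:.2f}/day, CFR {change_failure_rate:.0%}")
```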