Research, signal design, and decision systems

Our KPI is moving in the “right” direction, but the business outcome it’s supposed to predict (revenue/retention) isn’t improving. How do we debug a broken KPI?

Lucía Ferrer
11 min read

Answer

If your KPI is improving but revenue or retention is not, the metric is either mismeasured, mistimed, or no longer causal. Start by restating what the KPI is supposed to predict, for whom, and over what time delay, then validate that your outcome metric is stable and consistently measured. From there, work systematically from definition and data integrity to segment mix, saturation, and finally incentive-driven gaming.

You are feeling a very specific kind of pain: the dashboard is giving you good news, but the business is not. That is not a “metrics are hard” shrug moment. It is a signal that the KPI contract has broken, and your job is to find where the contract stopped matching reality.

A KPI that only goes up while the business stays flat is like a car dashboard where the fuel gauge is taped to “Full”. Comforting until you stall.

Restate the KPI contract (what it should predict and why)

Before you touch SQL or dashboards, write down the KPI contract in plain language. Many KPI failures are not data problems at all; they are agreement problems.

A useful contract has five parts.

First, is the KPI leading or lagging? A leading indicator should move before the outcome and help you steer, while a lagging indicator confirms what already happened. If you are treating a lagging indicator like a leading one, you will chase noise and declare victory too early. A leading indicator is only valuable if it truly precedes and predicts the business result you care about, rather than merely correlating with it in hindsight.

Second, what outcome is it supposed to predict, exactly? “Revenue” can mean bookings, recognized revenue, gross revenue, net revenue after refunds, or contribution margin. “Retention” can mean logo retention, paid retention, activity retention, or cohort-based retention.

Third, what is the expected delay? Write an explicit expected lag like “a change in KPI should show up in 30-day retention within 6 weeks” or “should affect recognized revenue within 2 billing cycles.”

Fourth, what population does it apply to? New users, paid users, a specific plan tier, a channel, or a geography. A KPI can be valid for one segment and useless for another.

Fifth, what is the causal story? In one sentence: “If we increase X, it should increase Y because Z.” If you cannot tell that story without hand-waving, you are not holding a KPI, you are holding a vibe.

Practical tip: Put the contract at the top of the dashboard. The best dashboards read like a decision memo, not a spreadsheet.
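One way to make the contract hard to skip is to write it down as a small structured record. The sketch below is only an illustration: the class name `KpiContract`, its field names, and the example values are all hypothetical, chosen to mirror the five parts above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KpiContract:
    """Hypothetical record of the five-part KPI contract."""
    kpi_name: str
    indicator_type: str     # "leading" or "lagging"
    outcome: str            # the exact outcome definition
    expected_lag_days: int  # how long before the outcome should move
    population: str         # who the KPI applies to
    causal_story: str       # "because Z" in one sentence

    def __post_init__(self):
        if self.indicator_type not in ("leading", "lagging"):
            raise ValueError("indicator_type must be 'leading' or 'lagging'")
        if self.expected_lag_days < 0:
            raise ValueError("expected_lag_days must be non-negative")

    def summary(self) -> str:
        return (
            f"{self.kpi_name} is a {self.indicator_type} indicator of "
            f"{self.outcome} for {self.population}, with an expected lag of "
            f"{self.expected_lag_days} days, because {self.causal_story}."
        )


# Illustrative values only, not taken from any real dashboard.
contract = KpiContract(
    kpi_name="activation rate",
    indicator_type="leading",
    outcome="30-day paid retention",
    expected_lag_days=42,
    population="new self-serve users",
    causal_story="users who finish setup see value sooner",
)
print(contract.summary())
```

If a field is hard to fill in, that is usually the part of the contract the team never agreed on.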

Confirm the outcome measure is stable and correctly measured

It is surprisingly common for the outcome metric to be the thing that broke. Before you assume your KPI is lying, verify the outcome is comparable over time.

For revenue, confirm revenue recognition rules, refunds and chargebacks, coupon handling, proration, currency conversion, and any changes in finance systems. A small change in how refunds post can flatten “net revenue” even while bookings improve.

For retention, confirm the retention definition and clock. Are you measuring activity retention by calendar week, or cohort day 7 and cohort day 30? Did your “active” definition change due to a product event rename or a bot filter?

Also check whether there were BI changes that affect only the outcome: new accounting categories, a backfill, a different join key, or a late-arriving data stream. When dashboards look healthy but growth stalls, the culprit is often a mismatch in definitions across teams rather than a true business contradiction.

Common mistake: Teams sanity-check the KPI and assume the outcome must be right because “finance owns it.” Do this instead: reconcile outcome totals against an independent source for a short window, like processor payouts, invoices, or a raw ledger export.
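A reconciliation like this can be very small. The following is a minimal sketch, assuming you can export daily totals from both the dashboard and an independent source as simple date-to-amount mappings. The function name, the 1% tolerance, and the sample figures are assumptions for illustration.

```python
def reconcile_totals(reported, independent, tolerance=0.01):
    """Return the days where two sources disagree by more than `tolerance`.

    Both inputs map a day to a total. The tolerance is relative to the
    larger of the two values. A day missing from either source is flagged.
    """
    mismatches = {}
    for day in sorted(set(reported) | set(independent)):
        a = reported.get(day)
        b = independent.get(day)
        if a is None or b is None:
            mismatches[day] = (a, b)
            continue
        baseline = max(abs(a), abs(b))
        if baseline > 0 and abs(a - b) / baseline > tolerance:
            mismatches[day] = (a, b)
    return mismatches


# Made-up numbers: the dashboard and the processor disagree on one day.
dashboard_net_revenue = {
    "2024-03-01": 1000.0,
    "2024-03-02": 1200.0,
    "2024-03-03": 900.0,
}
processor_payouts = {
    "2024-03-01": 1003.0,
    "2024-03-02": 1100.0,
    "2024-03-03": 900.0,
}

mismatches = reconcile_totals(dashboard_net_revenue, processor_payouts)
print(mismatches)
```

Small differences are expected from fees and timing, which is why the check uses a tolerance rather than exact equality.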

Check for timing and attribution mismatches

A KPI can be correct and still look “wrong” if it is being compared to the outcome on the wrong timeline.

Three timing problems show up constantly.

First, cohort versus calendar misalignment. If your KPI is computed on weekly active users but retention is cohort-based, you may be comparing different populations. Align both to the same cohorts when you are testing the KPI relationship.

Second, attribution windows that do not match the buying cycle. Marketing teams often optimize a near-term KPI like trial starts or product-qualified leads, while revenue is booked later and credited through a different window. If you have a 7-day attribution window but a 45-day sales cycle, your KPI will “improve” without a visible revenue lift on the same chart.

Third, delayed instrumentation or ingestion. If revenue lands two days late and the KPI is real time, the chart comparison will look disconnected.

Practical tip: Re-estimate the expected lag with a simple visual before you get fancy. Plot KPI changes and outcome changes over time and shift one series forward by 1, 2, and 4 weeks. You are looking for the alignment that makes the relationship strongest. Once you have it, you can set expectations with stakeholders about when results should show up.
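The same shift-and-compare idea can be sketched numerically. This example assumes two equally spaced weekly series of the same length and uses a plain Pearson correlation on period-over-period changes. The helper names and the synthetic data (where the outcome is built to follow the KPI by two weeks) are illustrative assumptions, and a strong correlation at some lag is still not proof of causation.

```python
from math import sqrt


def changes(series):
    """Period-over-period differences, so a shared trend does not dominate."""
    return [b - a for a, b in zip(series, series[1:])]


def pearson(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def lag_scan(kpi, outcome, max_lag):
    """Correlate KPI changes at period t with outcome changes at t + lag."""
    kpi_d, out_d = changes(kpi), changes(outcome)
    scores = {}
    for lag in range(max_lag + 1):
        xs = kpi_d[: len(kpi_d) - lag]
        ys = out_d[lag:]
        if len(xs) >= 3:  # too few points makes the correlation meaningless
            scores[lag] = pearson(xs, ys)
    return scores


# Synthetic weekly data: the outcome is constructed to trail the KPI by 2 weeks.
kpi = [10, 12, 11, 15, 14, 14, 19, 17, 22, 21, 20, 26]
outcome = [150, 152] + [100 + 5 * k for k in kpi[:-2]]

scores = lag_scan(kpi, outcome, max_lag=4)
best_lag = max(scores, key=scores.get)
print(best_lag, scores)
```

On real data the peak is rarely this clean, so treat the best lag as a starting estimate to confirm on a chart.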

Audit metric definition drift and dashboard logic

If the KPI used to predict outcomes and now does not, suspect definition drift. This is where small “harmless” changes compound.

Audit the metric definition like you would audit a contract. Diff the current calculation against the prior version that stakeholders trusted. Pay special attention to filters, joins, time zones, deduplication rules, and inclusion of internal traffic.

Then split the KPI into its numerator and denominator and validate each one separately over time. A KPI can go up because the denominator fell, not because performance improved.
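A quick worked example shows how a ratio can rise while the thing it counts falls. The function name and the numbers are made up for illustration.

```python
def decompose_ratio(num_before, den_before, num_after, den_after):
    """Report how a ratio KPI and each of its parts moved between two periods."""
    return {
        "ratio_before": num_before / den_before,
        "ratio_after": num_after / den_after,
        "numerator_change": (num_after - num_before) / num_before,
        "denominator_change": (den_after - den_before) / den_before,
    }


# Conversions fell from 100 to 95, but eligible users fell from 1000 to 800.
parts = decompose_ratio(100, 1000, 95, 800)
print(parts)
```

Here the rate goes up even though conversions dropped by 5%, because the denominator dropped by 20%.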

A high-leverage move here is a golden backtest: recompute historical KPI values using the current logic and compare them to the values that were reported at the time. If the past suddenly looks different, you do not have a trend, you have a measurement change.
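A backtest comparison can be as simple as the sketch below. It assumes you have archived the values that were reported at the time and can recompute the same periods with today's logic. The function name, the absolute tolerance, and the weekly values are illustrative assumptions.

```python
def backtest_deltas(reported, recomputed, tolerance=0.005):
    """Return periods where recomputed history departs from what was reported.

    `reported` holds the values stakeholders saw at the time. `recomputed`
    holds the same periods rebuilt with the current logic. A period that
    cannot be recomputed is flagged with None.
    """
    drift = {}
    for period, old_value in reported.items():
        new_value = recomputed.get(period)
        if new_value is None:
            drift[period] = None
        elif abs(new_value - old_value) > tolerance:
            drift[period] = round(new_value - old_value, 4)
    return drift


# Made-up weekly KPI values.
reported = {"2024-W01": 0.210, "2024-W02": 0.215, "2024-W03": 0.220}
recomputed = {"2024-W01": 0.211, "2024-W02": 0.240, "2024-W03": 0.246}

drift = backtest_deltas(reported, recomputed)
print(drift)
```

In this made-up case the history diverges from W02 onward, which points at a logic change around that week rather than a real improvement.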

Instrumentation and event-level integrity checks (data collection)

If the definition is stable, the next suspect is data collection. Event-level problems are often invisible in aggregated charts.

Look for breaks by platform, app version, and release date. If iOS event volume dropped after an SDK upgrade, your KPI might “improve” because the missing events are disproportionately from low-intent users, or because certain actions stopped logging.

Check event-to-session ratios, unique users generating the event, and duplicate events. Inspect identity stitching between anonymous and logged-in users. A change in login flow can cause one real person to appear as two, inflating conversion like magic.

Also consider consent gating and ad blockers. If tracking loss is uneven across segments, your KPI can drift away from reality without any code errors.

A practical way to operationalize this is a small canary set of data quality checks: freshness, completeness, uniqueness, and validity for the events that feed the KPI.
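Here is a minimal sketch of those four checks over a batch of events. The event shape (dictionaries with `event_id`, `user_id`, `event_name`, `timestamp`), the allowed event names, and the six-hour freshness threshold are all assumptions for illustration. Your schema and thresholds will differ.

```python
from datetime import datetime, timedelta

# Hypothetical schema and vocabulary for the events that feed the KPI.
REQUIRED_FIELDS = ("event_id", "user_id", "event_name", "timestamp")
ALLOWED_EVENTS = {"signup", "trial_start", "purchase"}


def check_events(events, now, max_age=timedelta(hours=6)):
    """Run freshness, completeness, uniqueness, and validity checks."""
    ids = [e.get("event_id") for e in events]
    complete = [
        e for e in events
        if all(e.get(field) is not None for field in REQUIRED_FIELDS)
    ]
    timestamps = [e["timestamp"] for e in complete]
    return {
        # Freshness: the newest complete event is recent enough.
        "fresh": bool(timestamps) and now - max(timestamps) <= max_age,
        # Completeness: share of events with every required field present.
        "completeness": len(complete) / len(events) if events else 0.0,
        # Uniqueness: number of repeated event IDs.
        "duplicate_ids": len(ids) - len(set(ids)),
        # Validity: events whose name is outside the known vocabulary.
        "invalid_names": sum(
            1 for e in events if e.get("event_name") not in ALLOWED_EVENTS
        ),
    }


day = datetime(2024, 3, 1)
events = [
    {"event_id": "a", "user_id": "u1", "event_name": "signup",
     "timestamp": day.replace(hour=9)},
    {"event_id": "a", "user_id": "u1", "event_name": "signup",
     "timestamp": day.replace(hour=9)},   # duplicate ID
    {"event_id": "b", "user_id": None, "event_name": "purchase",
     "timestamp": day.replace(hour=10)},  # missing user
    {"event_id": "c", "user_id": "u2", "event_name": "sgnup",
     "timestamp": day.replace(hour=11)},  # misspelled event name
]

report = check_events(events, now=day.replace(hour=12))
print(report)
```

Running checks like these per platform and app version makes it easier to tie a break to a specific release.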

Pipeline and warehouse checks (data processing)

If events are coming in correctly, validate the processing path. Most KPI dashboards are not a direct query of raw events. They are the end of a long chain of jobs, incremental models, and joins.

Look for incremental logic bugs, missing partitions, duplicated loads, and changes in aggregation grain. A one-day lookback window can quietly drop late-arriving events. A join key change can silently turn a one-to-one join into a one-to-many join and inflate counts.

Do a raw-to-modeled reconciliation for a short period. Pick a day, count raw events, count modeled events, and ensure the difference is explained. Then sample a handful of users and trace their event journey through the pipeline to make sure they appear where you expect.
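The reconciliation is worth doing per key and not only on totals, because a dropped event and a duplicated one can cancel out. The sketch below assumes both layers can be exported for one day as rows that share an `event_id` field. The function name and the sample rows are hypothetical.

```python
from collections import Counter


def reconcile_counts(raw_events, modeled_rows, key="event_id"):
    """Compare one day of raw events with the modeled table, key by key."""
    raw = Counter(e[key] for e in raw_events)
    modeled = Counter(r[key] for r in modeled_rows)
    return {
        "raw_total": sum(raw.values()),
        "modeled_total": sum(modeled.values()),
        # Keys present upstream that never reached the modeled table.
        "dropped": sorted(k for k in raw if k not in modeled),
        # Keys that appear more often downstream, e.g. from a one-to-many join.
        "fanned_out": sorted(k for k in modeled if modeled[k] > raw[k]),
    }


raw_events = [{"event_id": "e1"}, {"event_id": "e2"}, {"event_id": "e3"}]
modeled_rows = [
    {"event_id": "e1"},
    {"event_id": "e2"},
    {"event_id": "e2"},  # duplicated by a join
    # e3 is missing, for example because it arrived late
]

result = reconcile_counts(raw_events, modeled_rows)
print(result)
```

In this example the totals match, yet one event was dropped and another was duplicated, which a count-only check would miss.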

At this point you are likely choosing what kind of investigation to run first. Use this quick guide to pick the fastest path to clarity.

Audit Event-Level Data: Use this when you suspect the tracking itself changed.
Review Metric Definition: Use this when people disagree on what the KPI actually means.
Analyze Lag & Attribution: Use this when the KPI should lead the outcome but the timing is unclear.
Compare Current vs. Historical Logic: Use this when the KPI used to work and recently stopped.

Segment mix and composition effects

Now assume measurement is correct and timing is aligned. The next reason a KPI can improve without business lift is composition.

If your KPI is an average across segments, the average can rise simply because you have more of the segment that scores well on the KPI, even if outcomes are flat or down. Or the reverse: the KPI improves inside each segment, but your business outcome falls because the mix shifted toward lower value users.

Decompose the KPI and the outcome by the segments that plausibly changed: acquisition channel, geo, device, plan tier, lifecycle stage, product surface, and sales-assisted versus self-serve.

Then compute a mix-adjusted KPI by holding segment weights constant to a baseline period. If the improvement disappears after mix adjustment, you did not improve the underlying behavior, you changed who showed up.
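A small worked example of mix adjustment, assuming the KPI is a rate that can be computed per segment and blended with user-count weights. The segment names, rates, and counts are made up.

```python
def blended_rate(rates, weights):
    """Weighted average of per-segment rates using the given segment weights."""
    total = sum(weights.values())
    return sum(rates[segment] * weights[segment] for segment in rates) / total


# Per-segment behavior is unchanged between the two periods...
baseline_rates = {"smb": 0.10, "enterprise": 0.30}
current_rates = {"smb": 0.10, "enterprise": 0.30}

# ...but the mix of users shifted toward the higher-scoring segment.
baseline_weights = {"smb": 800, "enterprise": 200}
current_weights = {"smb": 500, "enterprise": 500}

reported_before = blended_rate(baseline_rates, baseline_weights)
reported_now = blended_rate(current_rates, current_weights)

# Mix-adjusted: current behavior, baseline weights.
mix_adjusted_now = blended_rate(current_rates, baseline_weights)

print(reported_before, reported_now, mix_adjusted_now)
```

The reported KPI rises from about 0.14 to about 0.20, but the mix-adjusted value stays at about 0.14, so the whole lift comes from who showed up.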

Practical tip: Do not stop at “channel.” In many businesses, “new user versus returning user” or “enterprise versus SMB” explains more KPI breakage than any marketing tag ever will.

Nonlinearity, threshold effects, and metric saturation

Sometimes the KPI is real and causal, but you are pushing it in a region where it no longer matters.

Plot the outcome versus KPI in bins, like deciles of KPI. You may find a threshold effect where moving from low to medium drives a lot of revenue, but moving from medium to high does almost nothing. That is saturation.
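Before plotting, you can compute the same binned view directly. This sketch assumes one (KPI value, outcome value) pair per user and uses equal-count bins. The synthetic data is built so the outcome stops growing once the KPI passes 5, and four bins are used only to keep the example small.

```python
def outcome_by_kpi_bin(pairs, n_bins=10):
    """Mean outcome per equal-count KPI bin, ordered from lowest KPI to highest.

    `pairs` is a list of (kpi_value, outcome_value) tuples.
    """
    ordered = sorted(pairs)
    n = len(ordered)
    means = []
    for b in range(n_bins):
        chunk = ordered[b * n // n_bins:(b + 1) * n // n_bins]
        if chunk:
            means.append(sum(outcome for _, outcome in chunk) / len(chunk))
    return means


# Synthetic users: outcome grows with the KPI up to 5, then flattens.
pairs = [(kpi, min(kpi, 5) * 10) for kpi in range(1, 21)]

means = outcome_by_kpi_bin(pairs, n_bins=4)
marginal_gain = [later - earlier for earlier, later in zip(means, means[1:])]
print(means, marginal_gain)
```

A marginal gain that collapses toward zero in the upper bins is the numeric signature of the threshold effect described above.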

This is where teams get trapped: the KPI target keeps rising because it is easy to measure, even though the marginal business impact is near zero. Your KPI is not “wrong,” it is just no longer discriminating.

If you find saturation, recalibrate. You might keep the KPI as a guardrail or quality check, but you need a closer-to-value metric for optimization.

Causal validity: correlation vs cannibalization vs substitution

| Option | Best for | What you gain | What you risk | Choose if |
| --- | --- | --- | --- | --- |
| Audit Event-Level Data | Deep dive into raw data, instrumentation issues | Identify granular data collection errors, schema drift | Very time-consuming, requires specialized data skills | Other checks fail, or you suspect fundamental data collection problems |
| Review Metric Definition | Initial investigation, new metrics, or recent changes | Clarity on expected behavior, alignment across teams | Missing data-level issues if definition is assumed correct | The metric is new, recently changed, or stakeholders disagree on its meaning |
| Validate Data Sources & Ingestion | Suspected data pipeline issues, recent system changes | Confidence in data accuracy, identify upstream breaks | Time-consuming if many sources, may not reveal definition errors | There have been recent changes to data sources, ETLs, or data ingestion |
| Analyze Lag & Attribution | Metrics with time-delayed effects (e.g., revenue, retention) | Understand true impact, avoid premature conclusions | Misinterpreting short-term fluctuations as long-term trends | The metric is a lagging indicator or has a known time delay |
| Compare Current vs. Historical Logic | Metrics that previously worked, but are now broken | Pinpoint exact changes in calculation logic | Overlooking external factors if only focusing on logic | The metric was stable and recently became unreliable |
| Create a 'Golden' Backtest | Validating new logic or confirming old logic's integrity | High confidence in metric accuracy, clear deltas | Resource-intensive to build and maintain | You need to confirm a fix or validate a new metric calculation |

The hardest category is when the KPI moves and measurement is correct, but the KPI is not causing net new value.

Three patterns to test.

Correlation: The KPI and revenue moved together historically because both were driven by a third factor, like seasonality or marketing budget. When that factor changes, the relationship breaks.

Cannibalization: You improved a KPI in one part of the funnel by stealing from another. For example, you drove more trial starts by discounting heavily, but you pulled forward purchases that would have happened later, or you shifted customers from high margin plans to low margin plans.

Substitution: You improved a KPI by routing users into a different path that looks good on the KPI but does not monetize, like deflecting support tickets into self-serve flows that increase “help center engagement” while customer satisfaction falls.

The cleanest way to resolve this is with a holdout, a geo split, or a difference-in-differences analysis so you can estimate net lift. Even if you cannot run a perfect experiment, you can add guardrails that make cannibalization obvious: total revenue, contribution margin, refunds, longer-horizon retention cohorts, and customer experience signals.
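For the difference-in-differences option, the core arithmetic is short. This is a minimal sketch with made-up weekly revenue for a treated group and a comparison group. It assumes the two groups would have followed parallel trends without the change, which you need to argue separately, and it gives a point estimate with no confidence interval.

```python
def mean(values):
    return sum(values) / len(values)


def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """Change in the treated group minus the change in the control group."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Made-up weekly revenue per region, before and after the KPI push.
treated_pre, treated_post = [100, 102, 98], [112, 110, 108]
control_pre, control_post = [90, 92, 88], [98, 97, 99]

net_lift = diff_in_diff(treated_pre, treated_post, control_pre, control_post)
print(net_lift)
```

The treated regions grew by 10, but the comparison regions grew by 8 on their own, so the estimated net lift is only 2.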

Gaming / Goodhart’s Law and incentive-driven artifacts

Finally, if people are paid, praised, or promoted for moving the KPI, assume the KPI will be moved, sometimes in ways you did not intend.

This is Goodhart’s Law in practice: once a measure becomes a target, it stops being a good measure. It shows up as spammy prompts, forced clicks, low-quality signups, aggressive discounting, or sales tactics that increase a mid-funnel rate but reduce average deal size or renewal likelihood.

Look for artifacts that line up with incentives and rollout dates. Did the KPI jump right after a new quota, dashboard, or campaign? Did downstream quality metrics degrade at the same time?

Common mistake: Treating gaming as a moral failing. Do this instead: treat it as a design flaw. Add counter metrics and guardrails so doing the right thing is also the easiest way to hit the number.

Two practical tips to keep you out of this trap.

First, pair every primary KPI with at least one quality guardrail that is hard to fake, such as 60-day retention, refund rate, churn, or complaint rate.

Second, separate “learning metrics” from “performance targets.” You can track a noisy proxy for insight without putting it on a bonus plan.
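Guardrail pairing can be checked mechanically. The sketch below flags guardrails that got worse in the same period the primary KPI improved. The function name, the metric names, and the period-over-period deltas are hypothetical, and a real check would also need a noise threshold.

```python
def breached_guardrails(kpi_delta, guardrails):
    """List guardrails that worsened while the primary KPI improved.

    `guardrails` maps a metric name to (delta, higher_is_better).
    """
    if kpi_delta <= 0:
        return []  # the KPI did not improve, so there is nothing to cross-check
    breached = []
    for name, (delta, higher_is_better) in guardrails.items():
        worsened = delta < 0 if higher_is_better else delta > 0
        if worsened:
            breached.append(name)
    return sorted(breached)


# Made-up deltas for one reporting period.
guardrails = {
    "retention_60d": (-0.02, True),     # fell, and higher is better
    "refund_rate": (0.01, False),       # rose, and lower is better
    "complaint_rate": (-0.001, False),  # fell, which is an improvement
}

alerts = breached_guardrails(kpi_delta=0.05, guardrails=guardrails)
print(alerts)
```

A KPI win that arrives with breached guardrails is a prompt to look at how the number was moved, not a verdict on anyone.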

What to do first, and what not to overcomplicate

Start with the contract and timing, then validate the outcome measurement, then run a definition and backtest audit. Those steps catch the majority of broken metrics fast, and they create a shared narrative that keeps the team from arguing in circles.

Do not overcomplicate the fix by inventing five new KPIs. Fix the one you have, add one guardrail, and only then decide whether you need a replacement metric that is closer to revenue or retention.


Last updated: 2026-05-13 | Calypso

Tags

how-to-debug-a-broken-metric