How to Run a Pre-Mortem on Your Metrics Before They Run Your Team Off a Cliff

A meeting-ready workflow for a metrics pre-mortem for support teams: pressure-test KPIs before they become decision-driving, prevent metric gaming, add lightweight governance, and ship decision safe,

Lucía Ferrer

Spot the cliff edge: the three warning signs a metric is about to become ‘truth’

Metrics rarely become dangerous in a dramatic way. They just get more important every week.

First it’s a “quick update” slide. Then it’s a dashboard tile. Then it’s a goal. Then it’s the thing people get praised (or dragged) for. By the time it starts hurting, the metric feels like reality itself.

A metrics pre-mortem for support teams is how you interrupt that progression. You assume the metric already caused a bad call, then work backward to uncover how it failed in real operations. It’s cheaper than the usual alternative: ship the metric, tie it to behavior, and spend the next quarter “debugging” what was actually a predictable design flaw.

The moment a metric moves from reporting to incentives

In support, a metric becomes decision-driving when it shows up in exec reporting, OKRs, quarterly planning, staffing conversations, or performance management. At that point, you’re not “measuring.” You’re steering.

Concrete example: a team promotes First Response Time as the headline KPI. They hit target for six weeks, so leadership freezes hiring. Meanwhile the backlog quietly shifts from “new tickets waiting for first touch” to “half-resolved tickets waiting for real progress,” and median time to resolution climbs. The metric was accurate. The decision was wrong.

That’s the cliff edge: not bad data, but a number that looks like certainty.

The trust failure pattern: disagreement about the number, then about the decision

Support metrics break trust in a consistent sequence.

  1. People argue about whether the number is correct.

  2. Then they argue about what it means.

  3. Then they argue about the decision leadership is making from it.

End state: meeting purgatory, a dashboard nobody believes, and operators spending their energy “explaining the metric” instead of improving the work.

Support is especially vulnerable because complexity hides everywhere: channel differences, severity mix, reopen loops, automation, routing rules, policy changes that redefine “done,” and “helpful” work happening outside the system of record.

What a pre-mortem is (in this context) and what it is not

A pre-mortem for metrics is not a warehouse project, and it’s not a philosophical debate about the nature of truth. It’s a short pressure test you run before a metric becomes decision-driving.

It borrows from the classic project pre-mortem: imagining that the failure has already happened creates psychological safety for naming risks early. (These project versions are a solid reference point: [1] and [2].)

Your goal is simple: predict how the metric will break once the org starts optimizing for it, then add the smallest guardrails that keep it decision-safe.

Run the 45-minute metrics pre-mortem meeting (agenda, prompts, and outputs)

| Control | Where it lives | What to set | What breaks if it’s wrong |
|---|---|---|---|
| Set: Metrics Owner | Definition Card | Single point of contact (Support Ops/Operator) | No accountability for data quality or interpretation |
| Set: Counter-metric | Definition Card | A balancing metric to prevent gaming (e.g., CSAT for response time) | Unintended negative consequences from optimizing one metric |
| Set: Segmentation Plan | Definition Card | Key dimensions for slicing data (e.g., customer segment, channel) | Inability to diagnose issues or identify specific trends |
| Set: Review Cadence | Meeting Agenda | Timebox for each agenda step (e.g., 5 min for 'What could go wrong?') | Meetings run over, key risks are missed, decisions are rushed |
| Set: Decision Rule: Executive/OKR Metrics | Team Process | No metric goes to exec/OKR without owner + counter-metric + segmentation plan | High-level decisions based on flawed or incomplete metrics |
| Set: Stakeholder Alignment | Pre-mortem Meeting | Involve frontline lead, QA/enablement, and exec proxy | Lack of buy-in, resistance to new metrics, missed operational impacts |
| Set: Metric Definition | Definition Card | Name, scope, inclusions/exclusions, calculation, freshness, caveats | Misinterpretation, inconsistent reporting, distrust in data |

These controls look fancy in a table. They’re not. They’re the minimum set of decisions that keep a metric from turning into a weapon or a hallucination.

A good pattern is to make this real through a 45-minute working session whenever a metric is about to be promoted. Not a quarterly “metrics council.” Not a recurring calendar hostage situation. A short meeting with outputs.

If you only enforce one rule, enforce this one:

No metric goes to exec reporting or OKRs without a metrics owner, a counter-metric, and a segmentation plan.

Otherwise you’re shipping a steering wheel without brakes, then acting surprised when the car gets creative.
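One way to make that rule hard to skip is to encode it as a tiny check that runs before a metric is added to an exec deck. The sketch below is illustrative only: `MetricCard` and `missing_for_promotion` are hypothetical names, not part of any real tool, and the fields simply mirror the three requirements in the rule.

```python
from dataclasses import dataclass, field


@dataclass
class MetricCard:
    """Hypothetical definition card; field names are illustrative."""
    name: str
    owner: str = ""  # a named person, not a team
    counter_metrics: list[str] = field(default_factory=list)
    segments: list[str] = field(default_factory=list)


def missing_for_promotion(card: MetricCard) -> list[str]:
    """Return the requirements still missing before exec/OKR use."""
    missing = []
    if not card.owner.strip():
        missing.append("metrics owner")
    if not card.counter_metrics:
        missing.append("counter-metric")
    if not card.segments:
        missing.append("segmentation plan")
    return missing


draft = MetricCard(name="First Response Time", owner="Support Ops lead")
print(missing_for_promotion(draft))  # ['counter-metric', 'segmentation plan']
```

An empty list means the metric clears the bar. Anything else is the agenda for the next pre-mortem.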

Inputs to bring: draft definition, query/source notes, and intended decision

Bring three things, or don’t bother meeting.

1) Draft definition (plain language). What’s included, what’s excluded, and what exactly counts as the event.

2) Source notes (lightweight). Where the data comes from, how often it refreshes, and known blind spots.

3) Intended decision. What decision will change because of this metric.

This is where teams get burned: they bring a dashboard screenshot and call it a “metric review.” A dashboard is display. A definition is governance. If you don’t separate those, you’ll debate formatting while the meaning drifts.

The pre-mortem prompt: ‘It’s 90 days later and this metric caused a bad call—how?’

Open with the prompt:

“It’s 90 days later. This metric is on an exec slide. It caused a bad call. What happened?”

The prompt sounds gloomy. It’s actually practical. It gives people permission to say the quiet part out loud before incentives and reputations get tied to the number.

This same logic shows up in broader “tail risk” thinking: the expensive failures are often the ones you assumed couldn’t happen. [3]

In support, the answers usually cluster into a handful of buckets:

  • Coverage gaps (a queue/channel isn’t counted)
  • Definition drift (the meaning changed quietly)
  • Incentives (speed replaced outcomes)
  • Seasonality (spikes made the metric look like a performance issue)
  • Mix shifts (the average improved while a key segment worsened)

Practical facilitation tip: ask frontline leads first. They know the workarounds people use when pressure hits.

Required outputs: metric card, segments, counter-metrics, owners, review date

You’re aiming for five outputs, all of them small.

Metric definition card. Copy-ready, in plain language: name, scope, inclusions/exclusions, calculation approach (in words), freshness, caveats, and a change log with effective dates.

Segmentation plan. The slices you must always have available so the average can’t lie.

Counter-metric pairing. A balancing measure so optimization doesn’t create collateral damage.

Named owner. A person, not a team name. This is “Set: Metrics Owner” in the table for a reason: when the number breaks, someone needs both authority and obligation to fix the meaning.

Review date/cadence. Put it on the calendar. Put it in the doc. If you don’t schedule the re-check, you’re relying on hope. Hope is not governance.

A worked support example: you want First Response Time on exec reporting.

In the meeting, you decide whether it counts chat and email, whether bot acknowledgements count, whether “first response” means a human touch, and what business-hours rules apply.

Then you lock the decision statement: “We use First Response Time weekly to decide whether on-call coverage needs adjustment.”

Then you set segments so chat and email don’t average together, and sev 1 doesn’t hide under sev 3. Finally, you pair it with a counter-metric such as reopen rate, escalation rate, or time to resolution so speed doesn’t quietly replace outcomes.

If you want an external structure that stays simple, borrow the project pre-mortem format and adapt it to metrics. [1]

Decide what to trust vs what to measure: make the metric answer a single decision

Support teams don’t suffer from lack of data. They suffer from two flavors of pain:

  • Metrics that are easy to count but hard to trust
  • Metrics that are trustworthy but don’t map to a decision

The pre-mortem forces a useful kind of discomfort: what decision is this metric supposed to make easier, and what does “good enough to decide” look like?

Start with the decision: staffing, prioritization, quality risk, or customer impact

Write the decision down in one or two sentences. Keep it blunt.

  • “We will use this metric to decide X weekly/monthly.”
  • “If it moves by Y, we will do Z.”
  • “If we can’t name an action, this metric is informational only.”

The trap is skipping to targets. A target without an explicit decision becomes a stick looking for a back.

Another common failure: one metric gets forced to justify four decisions. Staffing, quality, customer impact, churn risk—someone tries to stuff it all into a single tile. It can’t. Pick the primary decision. List secondary uses as “maybe,” not “promise.”

Signal quality test: coverage, freshness, and controllability

You don’t need a research project to decide whether a metric is decision-grade. You need a few blunt tests.

Coverage. What portion of total contacts/workload does it represent? Ask for two views: by volume and by severity. A metric that covers most volume but barely touches sev 1 can mislead staffing and escalation decisions.

Freshness. Does it update fast enough for the decision cadence? Daily refresh can be fine for monthly planning and terrible for daily staffing.

Definition stability. How often do policies, fields, routing rules, or automation change in ways that shift the meaning? “Stable” doesn’t mean “never changes.” It means changes are logged and interpretable.

Controllability. Can the team influence the metric without doing something customers hate? If the easiest way to hit it is to act like a robot reading from a script, you’re measuring the wrong behavior.

Mix-shift risk. Will channel/tier/product mix swing the metric without any real performance change?
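The coverage test is the easiest of these to put numbers on. Here is a minimal sketch, assuming you can export a contact log with a flag for whether the metric counts each contact. The data, the tuple layout, and the `coverage_by` helper are all made up for illustration.

```python
from collections import Counter

# Hypothetical contact log: (channel, severity, counted_by_metric).
# Phone contacts are assumed to live outside the system of record.
CONTACTS = [
    ("email", "sev3", True),
    ("email", "sev3", True),
    ("chat", "sev3", True),
    ("chat", "sev3", True),
    ("chat", "sev2", True),
    ("email", "sev1", True),
    ("phone", "sev1", False),
    ("phone", "sev1", False),
]


def coverage_by(contacts, field_index):
    """Share of contacts the metric counts, grouped by one field."""
    total, covered = Counter(), Counter()
    for row in contacts:
        key = row[field_index]
        total[key] += 1
        if row[2]:
            covered[key] += 1
    return {key: covered[key] / total[key] for key in total}


overall = sum(1 for row in CONTACTS if row[2]) / len(CONTACTS)
by_severity = coverage_by(CONTACTS, 1)

print(f"overall coverage: {overall:.0%}")  # overall coverage: 75%
for severity, share in sorted(by_severity.items()):
    print(f"{severity}: {share:.0%}")  # sev1: 33%, sev2: 100%, sev3: 100%
```

In this toy data the metric covers three quarters of volume but only a third of sev 1 contacts, which is exactly the gap that misleads staffing and escalation decisions.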

This is also the practical reality of operating with imperfect information: waiting for certainty is still a decision, and it has a cost. [4]

Segmentation defaults: channel, tier, severity, and arrival vs completion views

Simplicity is great until it lies.

A single headline number is only safe when the segments move together most of the time. If they don’t, show segments by default. Don’t make operators “go digging” for the truth.

Two support-life examples where averages mislead:

Channel mix. Combined First Response Time improves because chat is staffed well and chat volume grows. Email gets worse, but the average looks great. Email customers don’t care that chat is thriving.

Severity mix. Time to resolution looks stable overall, but sev 1 cases are taking twice as long because the team is swamped with low-sev work. The average is calm while your risk is on fire.
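The channel-mix case is worth seeing as arithmetic. The sketch below uses invented volumes and a volume-weighted mean (an assumption made for simplicity; a median can hide the same shift) to show a combined number improving while email gets worse.

```python
def combined_mean(segments):
    """Volume-weighted mean FRT across segments: {name: (tickets, mean_minutes)}."""
    total_tickets = sum(tickets for tickets, _ in segments.values())
    total_minutes = sum(tickets * minutes for tickets, minutes in segments.values())
    return total_minutes / total_tickets


# Hypothetical weeks: chat volume triples, email FRT worsens from 60 to 90 minutes.
week1 = {"chat": (100, 2.0), "email": (100, 60.0)}
week2 = {"chat": (300, 2.0), "email": (100, 90.0)}

print(combined_mean(week1))  # 31.0
print(combined_mean(week2))  # 24.0
```

The headline drops from 31 to 24 minutes purely because more tickets arrive in the fast channel. Showing chat and email side by side makes the email regression impossible to miss.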

Also watch arrival vs completion views. “Tickets closed” feels productive. But if arrivals climb faster than closures, your backlog ages even while charts look “busy.” It’s bailing water with enthusiasm while the boat keeps taking on more.

A practical default that prevents a lot of bad calls: include at least one backlog/aging view (even simple bands) alongside your speed KPI.
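A backlog/aging view does not need a BI project. As a rough sketch, assuming you have weekly arrival and closure counts plus the open dates of current tickets, it can be as small as this. The function names, the numbers, and the band edges are illustrative choices, not a standard.

```python
from datetime import date


def backlog_series(start_backlog, arrivals, closures):
    """Open-ticket count at the end of each period."""
    backlog, series = start_backlog, []
    for arrived, closed in zip(arrivals, closures):
        backlog += arrived - closed
        series.append(backlog)
    return series


def aging_bands(opened_dates, today):
    """Count open tickets per age band (band edges are an assumption)."""
    bands = {"0-2 days": 0, "3-7 days": 0, "8+ days": 0}
    for opened in opened_dates:
        age = (today - opened).days
        if age <= 2:
            bands["0-2 days"] += 1
        elif age <= 7:
            bands["3-7 days"] += 1
        else:
            bands["8+ days"] += 1
    return bands


# Closures rise every week, which looks productive in isolation.
arrivals = [120, 130, 145, 160]
closures = [115, 120, 125, 130]
print(backlog_series(50, arrivals, closures))  # [55, 65, 85, 115]

open_since = [date(2024, 3, 14), date(2024, 3, 13), date(2024, 3, 10), date(2024, 3, 1)]
print(aging_bands(open_since, date(2024, 3, 15)))
# {'0-2 days': 2, '3-7 days': 1, '8+ days': 1}
```

Closures climb in every period here, yet the backlog more than doubles. That is the "busy chart" problem in four numbers.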

Failure modes pre-mortem: how your metric will get gamed, drift, or lie by omission

If you treat gaming as a moral failure, you’ll miss the point.

Most metric gaming in support isn’t malicious. It’s rational behavior under pressure. People optimize what is visible, rewarded, and discussed. If a metric becomes a target, you will get behavior shaped around the metric—even if everyone has good intentions.

This is why a pre-mortem is basically a threat model for misinterpretation. The framing that many metric failures are design failures (not “bad people” failures) is worth internalizing. [5]

Coverage failures: missing channels, missing reopen loops, and excluded queues

Coverage is usually the first thing to break, because missing work is invisible until it’s expensive.

Missing queues. The metric excludes a workflow that’s inconvenient or stored elsewhere. Common pattern: escalations sit in a separate tool or process, so they’re “out of scope.” Congratulations: you can now hit targets while escalations explode.

Missing reopen loops. If your resolution metric stops counting after the first “solved,” agents can close quickly, customers reopen, and the metric stays pretty. Customers experience the loop. Your dashboard doesn’t.

Channel leakage. Phone calls, social, community posts, and “quick Slack help” often escape the system of record. The metric becomes an indicator of tool usage, not customer experience.

Minimum guardrail: declare the gaps on the definition card, estimate directional coverage (even if it’s rough), and set a review date to re-check. “Unknown” isn’t shameful, but it is a no for exec/OKR use.
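The reopen-loop gap described above is easy to demonstrate. This sketch assumes each ticket carries a list of "solved" timestamps, expressed as hours since creation. The ticket IDs and numbers are invented, and `SOLVE_HOURS` is a hypothetical structure rather than any helpdesk tool's export format.

```python
# Hypothetical tickets: hours from creation to each "solved" event.
# More than one entry means the ticket was reopened and solved again.
SOLVE_HOURS = {
    "T-1": [4],
    "T-2": [2, 30],
    "T-3": [3, 20, 50],
    "T-4": [6],
}


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# What the metric reports if it stops counting at the first "solved".
first_solve = mean(hours[0] for hours in SOLVE_HOURS.values())
# What the customer experienced: time until the last "solved".
final_solve = mean(hours[-1] for hours in SOLVE_HOURS.values())
reopen_rate = mean(len(hours) > 1 for hours in SOLVE_HOURS.values())

print(f"first-solve mean: {first_solve:.2f} h")  # first-solve mean: 3.75 h
print(f"final-solve mean: {final_solve:.1f} h")  # final-solve mean: 22.5 h
print(f"reopen rate: {reopen_rate:.0%}")         # reopen rate: 50%
```

Both resolution numbers are "correct." Only one of them matches what customers lived through, so the definition card should say which one the metric uses.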

Definition rot: changing fields, policy changes, and silent calculation tweaks

Definition rot is when the metric keeps the same name but changes meaning.

It happens through workflow edits, automation, new SLA rules, routing changes, and “small” dashboard tweaks that nobody logs. You notice it later when someone says, “Wait, why did last month’s number change?” And trust evaporates.

The lesson is the same one you learn from silent operational failures: when change isn’t observable, it doesn’t stay small. [6]

Minimum guardrail:

  • Every definition change has an owner.
  • It’s logged with an effective date.

That’s it. You don’t need a committee. You need a paper trail.
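A paper trail can be this small. The sketch below is one possible shape, assuming an in-memory list stands in for wherever you keep the definition card. `DefinitionChange`, `log_change`, and `definition_on` are hypothetical names, and the dates and summaries are invented.

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DefinitionChange:
    effective: date
    owner: str
    summary: str


def log_change(log, effective, owner, summary):
    """Append a change; refuse entries without a named owner."""
    if not owner.strip():
        raise ValueError("every definition change needs an owner")
    log.append(DefinitionChange(effective, owner, summary))
    log.sort(key=lambda change: change.effective)


def definition_on(log, day):
    """Return the change in effect on `day`, or None if none applied yet."""
    dates = [change.effective for change in log]
    index = bisect_right(dates, day)
    return log[index - 1] if index else None


log = []
log_change(log, date(2024, 1, 1), "owner-a", "First reply of any kind")
log_change(log, date(2024, 3, 1), "owner-a", "Human first reply; bot acknowledgements excluded")

print(definition_on(log, date(2024, 2, 15)).summary)  # First reply of any kind
```

When someone asks why last month's number changed, `definition_on` answers with a date and a name instead of a shrug.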

Gaming and Goodhart effects: speed at the expense of resolution and trust

Three common support behaviors that show up once a speed metric becomes “the goal”:

Fast first touch, slow real help. Agents reply quickly with placeholders to hit First Response Time, then the customer waits for substance.

Premature solves. Tickets are marked solved to improve closure speed and backlog optics, then reopened later.

Routing around complexity. Hard cases get transferred or escalated excessively so individual productivity looks high.

The fix isn’t to scold people. The fix is to make the metric harder to misread.

  • Pair speed metrics with outcome metrics (time to resolution, reopen rate, escalation rate, CSAT trends).
  • Segment so the pain can’t hide.
  • Add a small sampling check so you see quality drift before customers do.
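The sampling check in the last bullet can be a few lines. This is a minimal sketch using Python's standard `random` module. The ticket IDs, the sample size of 30, and the `weekly_sample` helper are assumptions for illustration, and a real version might stratify by tier or channel.

```python
import random


def weekly_sample(ticket_ids, size, seed):
    """Pick a reproducible random sample of ticket IDs for manual review."""
    pool = sorted(ticket_ids)  # stable order so the same seed gives the same sample
    rng = random.Random(seed)  # local generator; leaves global random state alone
    return rng.sample(pool, min(size, len(pool)))


closed_this_week = [f"T-{n}" for n in range(1, 201)]  # hypothetical IDs
sample = weekly_sample(closed_this_week, size=30, seed=12)
print(len(sample), len(set(sample)))  # 30 30
```

Fixing the seed per week means two reviewers pull the same tickets, so a disagreement is about quality rather than about which tickets were picked.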

You’re not trying to predict every edge case. You’re trying to stop acting surprised by the likely ones.

Add the smallest guardrails that keep the number honest (without slowing the team)

Support teams hate governance when it feels like paperwork. They’ll tolerate it—and often appreciate it—when it feels like protection.

The goal isn’t precision theater. It’s decision-safe metrics that don’t whip the team around because a number got interpreted like a horoscope.

Guardrail menu: segmentation, counter-metrics, sampling, and audit checks

Four levers solve most metric problems:

Segmentation prevents averages from hiding a fire.

Counter-metrics prevent one-sided optimization.

Sampling validates quality without measuring everything.

Audit checks catch coverage gaps and definition drift.

A clean way to keep this lightweight: pick one guardrail that improves interpretation (usually segmentation) and one that shapes behavior (usually counter-metrics or sampling).

When to use a range/band instead of a point target

Point targets invite weird behavior near the threshold. You get frantic work to move from “bad” to “good” by one minute, even when the customer impact is basically identical.

Use a band when the metric is noisy, the work mix shifts often, or missing by a little doesn’t materially change outcomes. Weekly First Response Time bands often produce calmer, more honest behavior during spikes.

Use a point target when the outcome is truly binary or tied to a hard promise (like a sev 1 SLA).

If you’re unsure, start with a band until you understand normal volatility.
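To see why a band behaves more calmly than a point target, compare how each one labels two nearly identical weeks. The thresholds below (a 30-minute target and a 25 to 35 minute band) are invented for the example, and both helper functions are hypothetical.

```python
def point_status(value, target):
    """Binary pass/fail against a single threshold (lower is better)."""
    return "met" if value <= target else "missed"


def band_status(value, low, high):
    """Three-way status against an acceptable range."""
    if value < low:
        return "below band"
    if value > high:
        return "above band"
    return "within band"


# Two weeks of median First Response Time, in minutes.
for minutes in (29, 31):
    print(minutes, point_status(minutes, target=30), band_status(minutes, low=25, high=35))
# 29 met within band
# 31 missed within band
```

A two-minute difference flips the point target from "met" to "missed" while the band reports no change, which matches what the customer actually experienced.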

How to phrase the metric on dashboards and exec slides (to prevent misreads)

Most dashboard harm isn’t the number. It’s the caption.

If an exec screenshots the tile, the screenshot should still tell the truth. That means:

  • A short definition line (what’s included/excluded)
  • The counter-metric nearby (the “brake”)
  • The default segments available (so the average can be interrogated)
  • A label for how it should be used (“staffing decision,” “informational only,” etc.)

Two compact examples that work in real dashboards.

Speed package (First Response Time).

Headline: “First Response Time (median), last 7 days.”

Definition line: “Human first reply in chat + email. Bot acknowledgements excluded. Community + phone excluded.”

Guardrail line: “Shown with Reopen rate + Time to resolution.”

Segment line: “Default: channel + severity.”

Quality package (QA or CSAT).

Headline: “QA score (weekly sample).”

Definition line: “Based on 30-ticket sample across tiers. Use for coaching, not compensation.”

Guardrail line: “Shown with backlog aging to avoid ‘quality via slowness.’”

Segment line: “Default: new vs tenured, sev 1 vs others.”
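If your tiles are generated from config, the four caption lines can be made mandatory instead of optional. The sketch below is one hedged way to do that. `render_tile` is a hypothetical helper, not a feature of any dashboard product, and the "Use" label is the usage label from the list above.

```python
def render_tile(headline, definition, guardrail, segments, use_label):
    """Build a tile caption; refuse to render if any line is missing."""
    parts = {
        "Definition": definition,
        "Guardrail": guardrail,
        "Segments": segments,
        "Use": use_label,
    }
    empty = [label for label, text in parts.items() if not text.strip()]
    if not headline.strip() or empty:
        raise ValueError(f"tile is not screenshot-safe; missing: {empty or ['Headline']}")
    lines = [headline] + [f"{label}: {text}" for label, text in parts.items()]
    return "\n".join(lines)


print(render_tile(
    headline="First Response Time (median), last 7 days",
    definition="Human first reply in chat + email. Bot acknowledgements excluded.",
    guardrail="Shown with Reopen rate + Time to resolution",
    segments="Default: channel + severity",
    use_label="staffing decision",
))
```

The point of raising an error is that a tile without its definition or guardrail never reaches a slide in the first place.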

Decision rule without overthinking:

  • Risk is mix shift → add segmentation.
  • Risk is incentive distortion → add a counter-metric.
  • Risk is quality drift → add sampling.
  • Risk is silent change → add an audit check + change log.

Notice what’s missing: “build a bigger dashboard.” Bigger dashboards mostly create bigger arguments.

If you want a reminder that dashboards are products with users, failure modes, and footguns, this internal dashboard perspective is useful. [7]

Ship it safely: the pre-mortem closeout checklist and your 30-day re-check

Metrics don’t become decision-safe because you held a meeting. They become decision-safe because you close the loop, publish the meaning, and verify the behavior change.

Closeout: document, publish, and set the next review date

Closeout should be short and non-negotiable.

  • The definition card is published where viewers can actually find it.
  • The metrics owner is named.
  • Default segments exist wherever the metric is shown.
  • The counter-metric sits next to the headline.
  • The review date is scheduled.

Concrete anchor: put “Next review: [date]” on the dashboard tile. It signals the number is maintained, not abandoned. Abandoned metrics don’t die—they haunt.

30-day re-check: validate coverage, drift, and behavior change

At 30 days you’re not aiming for perfection. You’re aiming to catch cheap failures before they become expensive norms.

Two quick spot checks:

Coverage sanity check. Compare what the team remembers handling with what the metric claims happened. If frontline reality and dashboard reality disagree, dashboard reality loses.

Drift check. Look for changes that quietly altered meaning: routing, automation, SLA policy, field changes.

Then ask frontline leads three questions:

  1. “What behavior changed because this metric is visible?”

  2. “What do you think people are optimizing for now?”

  3. “What work feels pushed into the shadows?”

Those answers will teach you more about how to prevent metric gaming in support than a month of arguing over charts.

How to say ‘this metric is not decision-safe yet’ to stakeholders

This is where most teams cave—social pressure beats technical truth.

Keep a simple phrase bank you can repeat without sounding defensive:

  • “We can report this now, but it’s not ready for OKRs until we confirm coverage and add a counter-metric.”

  • “If this goes on the exec slide, it becomes a target. I want one review cycle to confirm it drives the behavior we want.”

  • “I can share the number with caveats today, or share it later as a decision metric. I don’t recommend skipping the caveat stage.”

A concrete Monday plan that stays lightweight: pick one metric that’s about to become “truth” and run the 45-minute pre-mortem this week.

Lock three things: the decision statement, the definition card (with owner + change log), and the minimum guardrails (counter-metric + key segments).

Production bar: if an exec screenshots the dashboard tile, the screenshot must still tell the truth. If it can’t survive the screenshot, it’s not decision-safe yet.

Sources

  1. asana.com
  2. atlassian.com
  3. foreveryscale.com
  4. abhs.in
  5. tightmargins.substack.com
  6. monitrics.com
  7. dev.to