Research, signal design, and decision systems

Before rolling out a new dashboard or KPI, how can we run a data pre mortem to predict how the metric will be misread?

Lucía Ferrer

Answer

Run a data pre mortem as a short, structured workshop where the team assumes the new KPI failed in the real world and then works backward to explain how it got misread. You predefine the metric contract, list likely misinterpretations, map confounders, predict gaming, and then redesign the dashboard and decision rules so the most common mistakes are harder to make. Done well, it turns “we shipped a number” into “we shipped a number with guardrails, context, and a plan for what to do when it moves.”

Most dashboards do not fail because the math is wrong. They fail because humans are busy, incentives are real, and context evaporates the moment the metric leaves the analyst’s screen. A data pre mortem is how you catch those predictable misreads before a KPI becomes a story people repeat in meetings.

What a “data pre mortem” is and when to use it

A data pre mortem is a risk rehearsal. You gather the people who will build, use, and be judged by a metric and you assume that, three months after launch, the KPI caused confusion or bad decisions. Then you ask a simple question: “What went wrong, and how did we let the metric mislead us?” This is different from data QA. QA asks whether the data is accurate and pipelines run. A pre mortem asks whether the metric will be interpreted correctly and used responsibly, even when the data is technically correct.

Use it when any of these are true.

  1. You are introducing a new KPI or redefining an old one.

  2. A new audience will consume the dashboard, especially executives or frontline managers who were not in the build.

  3. The metric will be linked to performance reviews, bonuses, targets, or budget decisions.

  4. You are making a major pipeline change such as a new event taxonomy, a source system migration, new identity stitching, or backfill logic.

If you only do one of these per quarter, do it for the KPI that could most easily become “analytics without context,” which is basically misinformation wearing a spreadsheet costume (see the context argument in [1]).

Pre work: assemble the metric contract and decision context

The pre mortem works when everyone starts from the same definitions. Your goal is not a long spec. It is a one page “metric contract” plus a clear decision context.

The metric contract should include the pieces that commonly get lost when a KPI is repeated secondhand. A KPI dictionary approach is a good model here [2].

Include these items in plain language.

  1. Metric name and purpose. What problem is it meant to detect or manage?

  2. Operational definition. The exact calculation, including numerator and denominator.

  3. Time window and grain. Daily, weekly, trailing 28 days, cohort based, and so on.

  4. Inclusion and exclusion rules. What counts, what does not, and why.

  5. Default segmentation. Region, channel, product line, customer type, and the required “slices” that prevent misleading rollups.

  6. Data lineage. Source systems, key transformations, and where the “source of truth” lives.

  7. Refresh cadence and data latency. What “today” means.

  8. Known constraints. Missing data, tracking gaps, small sample volatility, or any period where comparisons are unsafe.

  9. Intended decisions and owners. Who is expected to act when it changes, and what decisions it is supposed to inform.

Practical tip: add one sentence titled “Do not use for.” For example: “Do not use this KPI to evaluate individual rep performance,” or “Do not use day-to-day changes to decide pricing.” This single line prevents a lot of creative misuse.

Practical tip: include a minimum sample threshold, even if it is a simple rule like “no segmentation view below 200 events in the period.” It saves you from arguing with noise later.
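One way to keep the contract from drifting is to store it as a small structured record next to the dashboard code. The sketch below is illustrative only: the class name, the field names, and the example values are assumptions, not a standard schema. The 200 event threshold is the example rule from the tip above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricContract:
    """One-page metric contract. All field names are hypothetical."""
    name: str
    purpose: str
    numerator: str
    denominator: str
    time_window: str
    exclusions: tuple[str, ...]
    default_segments: tuple[str, ...]
    source_of_truth: str
    refresh_cadence: str
    known_constraints: tuple[str, ...]
    owner_role: str
    do_not_use_for: str
    min_events_per_view: int = 200  # example threshold, tune per metric

    def can_display(self, event_count: int) -> bool:
        """Return True when a segmented view has enough events to show."""
        return event_count >= self.min_events_per_view


# Hypothetical example values for an ecommerce conversion KPI.
contract = MetricContract(
    name="Checkout conversion rate",
    purpose="Detect friction in the purchase funnel",
    numerator="sessions with a completed order",
    denominator="sessions that reached the cart",
    time_window="trailing 28 days",
    exclusions=("internal test traffic",),
    default_segments=("channel", "new vs returning"),
    source_of_truth="warehouse orders table",
    refresh_cadence="daily, with up to 3 days of late data",
    known_constraints=("tracking gap on older app versions",),
    owner_role="Head of Growth",
    do_not_use_for="evaluating individual rep performance",
)

print(contract.can_display(150))  # False: below the minimum sample threshold
```

Keeping “Do not use for” as a required field means nobody can publish the contract without writing that sentence.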

90 minute workshop agenda (repeatable format)

Invite a small, mixed group. Eight to twelve people is ideal.

Roles to include.

  1. Metric owner. The business owner accountable for what happens when the KPI moves.

  2. Analytics and data. The people who know definitions, caveats, and pipeline behavior.

  3. Domain subject matter experts. The operators who know what is really happening in the field.

  4. Frontline users. The managers or teams who will actually use the dashboard under time pressure.

  5. Incentives partner if relevant. Finance, HR, sales ops, or whoever sets targets and compensation.

Artifacts to bring.

  1. The one page metric contract.

  2. A rough dashboard mock, even if it is a sketch.

  3. A short glossary of terms and segments.

Time boxed agenda.

  1. 0 to 10 minutes. Align on the decision the metric is supposed to improve and the top two risks.

  2. 10 to 25 minutes. Step 1: brainstorm how this will be misread.

  3. 25 to 45 minutes. Step 2: map confounders and alternative explanations.

  4. 45 to 60 minutes. Step 3: pre mortem Goodhart and predict gaming.

  5. 60 to 70 minutes. Step 4: stress test edge cases and pipeline brittleness.

  6. 70 to 80 minutes. Step 5: build the metric bundle.

  7. 80 to 88 minutes. Step 6: make dashboard design changes so misreads are harder.

  8. 88 to 90 minutes. Step 7: agree on decision rules, escalation paths, and owners.

This “assume it failed, then explain why” structure borrows from classic pre mortem thinking ([3] and [4]), adapted to KPI rollouts.

Step 1: Brainstorm “how this will be misread” (failure modes library)

This step is quantity over quality. Ask everyone to write down misreads silently for three minutes, then share and cluster. Silence first matters because it reduces groupthink and status effects, a common issue in brainstorms [5].

Here is a failure modes library you can reuse, organized the way misreads show up in real organizations.

Context loss.

People forget the denominator, base rate, or volume. A percentage moves but the underlying count collapses.

People ignore seasonality or calendar effects. The metric “drops” every holiday week and everyone panics anyway.

People miss the time horizon mismatch. A weekly KPI is used to judge a quarterly initiative.

Definition and measurement ambiguity.

Two teams use slightly different definitions for the same word, like “active,” “qualified,” or “resolved.”

Rounding and formatting make small changes look meaningful.

Aggregation hides the story.

A company wide number looks stable while one region is on fire and another is booming.

Simpson’s paradox shows up. The overall rate improves while every segment worsens, or vice versa.
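A small worked example makes the aggregation trap easier to explain in the workshop. The counts below are invented for illustration. Each segment's conversion rate gets worse between the two periods, yet the blended rate improves because the traffic mix shifts toward the higher converting segment.

```python
# Hypothetical counts per segment: (conversions, visits).
before = {"returning": (50, 100), "new": (90, 900)}
after = {"returning": (225, 500), "new": (40, 500)}


def rate(conversions, visits):
    """Conversion rate for one segment or for a rollup."""
    return conversions / visits


def overall(period):
    """Blended rate across all segments in a period."""
    total_conversions = sum(c for c, _ in period.values())
    total_visits = sum(v for _, v in period.values())
    return rate(total_conversions, total_visits)


# Did each segment get worse, and did the rollup still get better?
segment_worsened = {s: rate(*after[s]) < rate(*before[s]) for s in before}
overall_improved = overall(after) > overall(before)

for segment in before:
    print(segment, f"{rate(*before[segment]):.1%} -> {rate(*after[segment]):.1%}")
print("overall", f"{overall(before):.1%} -> {overall(after):.1%}")
```

This is the argument for checking the default segments before anyone tells a story about the rollup.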

Bias and missingness.

Selection bias. The metric only reflects customers who were measured, not all customers.

Survivorship bias. You only see the users who stuck around.

Missing data is not random. Tracking fails more on older devices, certain geos, or certain workflows.

Timing and latency.

Late arriving data makes the last few days look worse than they are.

Operational lag means the KPI responds weeks after the real world change.
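The late data problem is easy to make visible once the team agrees on how long data takes to settle. The sketch below assumes a fixed settling period in days. The function name and the three day default are illustrative assumptions, and the real value should come from the latency line in the metric contract.

```python
from datetime import date, timedelta


def label_days(days, as_of, settle_days=3):
    """Label each day 'preliminary' if it is within settle_days of as_of.

    settle_days is an assumed value, not a measured one.
    """
    cutoff = as_of - timedelta(days=settle_days)
    return {d: ("preliminary" if d > cutoff else "final") for d in days}


as_of = date(2026, 4, 30)
last_week = [as_of - timedelta(days=i) for i in range(7)]
labels = label_days(last_week, as_of)

for day in sorted(labels):
    print(day.isoformat(), labels[day])
```

A chart that shades the preliminary days gives the user a reason not to read a dip at the right edge as a real decline.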

Qualitative gaps.

The number moves but nobody knows “what changed on the ground.” The KPI becomes a debate about narratives.

Common mistake moment: teams often treat this step as a debate about who is right. Do not litigate the metric yet. Capture misreads first, then rank which ones are most damaging and most likely. You are building a risk register, not trying to win a meeting.

Step 2: Map confounders and alternative explanations (signal vs. noise)

| Option | Best for | What you gain | What you risk | Choose if |
| --- | --- | --- | --- | --- |
| A/B Testing / Quasi-Experiments | Establishing causality with high confidence | Strong evidence for cause-and-effect relationships | High setup cost; ethical considerations; not always feasible | You can randomly assign users/groups and control variables |
| Expert Review / Brainstorming | Leveraging collective knowledge and experience | Identification of domain-specific confounders; diverse perspectives | Groupthink; reliance on anecdotal evidence | You have access to subject matter experts and need qualitative insights |
| Causal Sketching / DAG-lite | Understanding direct and indirect relationships | Visual clarity of assumptions; identification of potential confounders | Oversimplification of complex systems; time investment | You need to map out how variables influence each other before analysis |
| Control Views / Stratification | Isolating the impact of a variable | More precise measurement of specific effects | Over-segmentation leading to small sample sizes | You suspect a specific factor is influencing your primary metric |
| The 'What else changes?' Test | Quickly identifying obvious confounders | Fast, intuitive check for external factors | Missing subtle or non-obvious confounders | You're reviewing a metric or dashboard and need a rapid gut-check |
| Defining 'Counter-Metrics' | Preventing unintended negative consequences | Holistic view of impact; early warning for adverse effects | Increased dashboard complexity; potential for analysis paralysis | Your primary metric has a clear potential downside or tradeoff |

Once you have misreads, you need a lightweight way to separate signal from noise. The core question is: “If this KPI moves, what else could be causing it besides the thing we care about?” This is where organizations most often overreact, because they mistake correlation plus urgency for causation.

Use a simple method.

First, do a causal sketch. Draw the KPI in the middle and add the top drivers around it: demand, supply constraints, product changes, pricing, marketing mix, customer composition, outages, policy changes, and competitor actions. Keep it “DAG lite” in spirit, not an academic exercise.

Second, run the “What else changes?” test. If the KPI rises, what other operational indicators typically move at the same time? If it falls, what else tends to fall?

Third, define control views. Decide which segments you will always check before telling a story. Typical control views are by channel, by cohort age, by region, and by customer type.

Fourth, define counter metrics. For any KPI you want to push up, ask what you might accidentally push down.

To make this concrete, here is a decision oriented menu of controls you can use.

A/B Testing / Quasi-Experiments helps when you truly need cause and effect.

Causal Sketching / DAG-lite forces assumptions into the open so people can disagree productively.

Control Views / Stratification prevents a single blended number from driving the wrong story.

The 'What else changes?' Test is the fastest way to catch obvious alternative explanations.

Practical tip: choose two default “explainers” that always sit next to the KPI. Example: if conversion rate moves, always show traffic mix and page latency. It turns a vague story into a bounded investigation.
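A control view can be as simple as the same rate computed per segment, with low volume segments suppressed instead of plotted. The sketch below is a minimal version under assumed inputs: the row layout, the key names, and the 200 event floor are all hypothetical.

```python
def control_view(rows, segment_key, min_events=200):
    """Compute a rate per segment; rate is None when volume is too low.

    rows: iterable of dicts holding segment_key, 'events', and 'successes'.
    """
    totals = {}
    for row in rows:
        segment = row[segment_key]
        events, successes = totals.get(segment, (0, 0))
        totals[segment] = (events + row["events"], successes + row["successes"])

    view = {}
    for segment, (events, successes) in totals.items():
        enough = events >= min_events and events > 0
        view[segment] = {
            "events": events,
            "rate": successes / events if enough else None,
        }
    return view


# Hypothetical daily rows for a conversion KPI.
rows = [
    {"channel": "paid", "events": 1500, "successes": 45},
    {"channel": "paid", "events": 500, "successes": 15},
    {"channel": "email", "events": 120, "successes": 12},
]
view = control_view(rows, "channel")
print(view)
```

Returning the event count alongside a suppressed rate keeps the segment visible, so users can see that it exists but is too small to judge.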

Step 3: Pre mortem Goodhart: predict gaming and perverse incentives

If the KPI will be used for targets or performance evaluation, assume it will be optimized. Not because people are bad, but because they are human and busy, and incentives are loud.

Use three prompts.

First: “If compensation or praise is tied to this number, how could someone win without creating real value?”

Second: “What are the easiest levers to pull that move the metric quickly?” Quick levers are where gaming tends to appear.

Third: “What process loopholes does the metric create?” For example, reclassifying cases, delaying work until the reporting window resets, or routing easy customers to the measured channel.

Then add guardrails. You can combine several without making the dashboard unreadable.

  1. Pair the KPI with a quality counter metric. If you push for faster resolution, also watch reopen rate or customer satisfaction.

  2. Add audit checks or random sampling. If a category can be manipulated, sample records and verify.

  3. Use caps and floors. If you know a metric is volatile at low volumes, set a volume floor and suppress or gray out any view that falls below it.

  4. Add a lightweight qualitative review. A short “what changed operationally?” note from the owner reduces confirmation bias, which is a real dashboard problem [6].
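The audit guardrail works best when the sample is random and reproducible, so nobody can predict or dispute which records were checked. A minimal sketch, assuming records have sortable IDs and that the seed is logged with each audit; the function name and the example IDs are hypothetical.

```python
import random


def audit_sample(record_ids, sample_size, seed):
    """Draw a reproducible random sample of record IDs for manual review."""
    rng = random.Random(seed)       # dedicated generator, fixed seed
    ids = sorted(record_ids)        # stable order so the seed is meaningful
    k = min(sample_size, len(ids))  # never ask for more than exists
    return rng.sample(ids, k)


# Hypothetical case IDs that were reclassified this week.
reclassified = [f"case-{n:04d}" for n in range(1, 251)]
to_review = audit_sample(reclassified, sample_size=20, seed=20260430)
print(to_review[:5])
```

Sorting before sampling means the same seed yields the same sample even if the IDs arrive in a different order.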

Step 4: Stress test edge cases and data pipeline brittleness

This step is where you protect executives from the “we had an incident but the chart did not mention it” moment.

Walk through these edge cases.

Late arriving data and backfills. Decide whether the most recent days are labeled as preliminary.

Schema and tracking changes. If event names or fields change, what breaks and how will you know?

Outages and partial ingestion. What does the KPI look like during a logging outage, and does the dashboard warn users?

Reclassification. What happens when definitions change, like a new status taxonomy or a new fraud rule?

Small n volatility. If you slice too far, the metric becomes a coin flip with a nice font.

Versioning and change logs. When the definition changes, can a user tell which version they are viewing?

You do not need a deep technical runbook here. You do need a clear agreement on what the dashboard should do during weird weeks, because weird weeks happen.
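To show the group why small slices behave like coin flips, it can help to put an uncertainty range around a rate. The sketch below uses the Wilson score interval, which is one common choice for proportions and not something the workshop format prescribes. The function name and the example counts are illustrative.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# Same observed rate of 50%, very different certainty.
small = wilson_interval(5, 10)
large = wilson_interval(500, 1000)
print(f"n=10:   {small[0]:.1%} to {small[1]:.1%}")
print(f"n=1000: {large[0]:.1%} to {large[1]:.1%}")
```

If the range for a slice is wider than the movement people want to discuss, that slice belongs behind a warning state.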

Step 5: Build the “metric bundle” (primary + counter metrics + context)

A single KPI invites overinterpretation. A metric bundle reduces that risk by making the most important context visible by default. This aligns with the broader point that missing context is often the most important part [7].

Use this template.

Primary KPI. The headline number.

Input or leading indicators. The controllable drivers you expect to move first.

Outcome or lagging indicators. The business result you actually care about.

Quality and safety counter metrics. What you do not want to break while improving the KPI.

Guardrail thresholds. Red lines that trigger investigation.

Default segments. The two or three cuts that prevent misleading averages.

Context notes. Seasonality, known incidents, definition changes.

Example. If your primary KPI is ecommerce conversion rate, the bundle might include traffic quality (new versus returning, channel mix), page performance, average order value, return rate, and customer support contact rate. If conversion rises while return rate spikes, you know you may be “selling” the wrong thing.

Common mistake moment: teams add ten companion charts “just in case” and call it context. What to do instead is pick a small bundle where each companion metric has a job, either explaining movement, protecting quality, or preventing gaming.
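The “each companion metric has a job” rule can be checked mechanically. The sketch below is one possible encoding: the three job labels mirror the jobs named above, while the class name, the job assignments for the ecommerce example, and the cap of six companions are assumptions made for illustration.

```python
from dataclasses import dataclass

# The three jobs a companion metric can have, per the rule above.
JOBS = {"explains_movement", "protects_quality", "prevents_gaming"}


@dataclass(frozen=True)
class Companion:
    name: str
    job: str


def bundle_problems(companions, max_companions=6):
    """Return a list of problems; an empty list means the bundle passes."""
    problems = [
        f"{c.name}: no recognized job" for c in companions if c.job not in JOBS
    ]
    if len(companions) > max_companions:  # assumed cap, adjust to taste
        problems.append(f"{len(companions)} companions exceeds {max_companions}")
    return problems


# Hypothetical job assignments for the conversion rate example.
bundle = [
    Companion("channel mix", "explains_movement"),
    Companion("page latency", "explains_movement"),
    Companion("average order value", "prevents_gaming"),
    Companion("return rate", "protects_quality"),
    Companion("support contact rate", "protects_quality"),
]
print(bundle_problems(bundle))
```

A chart added “just in case” fails the check because it cannot name its job.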

Step 6: Make misreads harder through dashboard design

Now you translate the risk register into design choices. Many misreads are predictable UI problems: missing denominators, unclear time windows, and charts that encourage cherry-picking. Dashboard design guidance consistently warns about these patterns ([8] and [9]).

Design patterns that work in executive settings.

Prominent definitions. Put the one line definition next to the KPI, not hidden in a wiki.

Show denominators and volumes. If you show a rate, show the count.

Baseline comparisons. Always include a sensible comparison like prior period and same period last year when seasonality matters.

Seasonality overlays and annotations. Mark holidays, launches, outages, and policy changes on the chart.

Default segmentation. If a rollup can mislead, do not make it the default view.

Warning states. Gray out or label periods where data is incomplete or backfilled.

Tooltips with provenance. Where the data came from, when it refreshed, and what changed recently.

“Do not use for” note. A small line that prevents the KPI from being used as a blunt instrument.
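Several of these patterns can live in the KPI label itself. The sketch below formats a headline so that the rate never appears without its count, its time window, and a warning when data is incomplete. The function name, the label wording, and the example numbers are hypothetical.

```python
def kpi_label(name, successes, total, window, preliminary=False):
    """Render a rate together with its denominator and time window."""
    if total == 0:
        return f"{name}: n/a (0 events), {window}"
    rate = successes / total
    text = f"{name}: {rate:.1%} ({successes:,} of {total:,}), {window}"
    if preliminary:
        text += " [preliminary, data incomplete]"
    return text


print(kpi_label("Conversion rate", 412, 12870, "trailing 28 days"))
# Conversion rate: 3.2% (412 of 12,870), trailing 28 days
```

Putting the count in the same string as the percentage means a screenshot pasted into a slide still carries the denominator.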

Light humor, because it is true: a dashboard without context is like a smoke alarm that only beeps in Morse code. Technically accurate, emotionally unhelpful.

Step 7: Define decision rules and escalation paths

The final step is where you stop the KPI from becoming a weekly argument. You define what action looks like.

Start with decision rules written as if-then statements.

If the KPI moves more than X percent and volume is above the minimum threshold, then the metric owner opens an investigation within one business day.

If the KPI moves and one of the guardrail counter metrics crosses its threshold, then escalate to the cross-functional owner group.

If data freshness warnings are active, then the KPI is not used for performance decisions until cleared.
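Writing the three rules above as code forces the team to fill in the blanks, such as the value of X and the minimum volume. The sketch below is one reading of those rules, not a definitive policy: the function name, the parameter names, and the example thresholds are assumptions, and “the KPI moves” in the second rule is treated as any nonzero change.

```python
def decide(change_pct, volume, guardrail_breached, freshness_warning,
           threshold_pct, min_volume):
    """Return the actions triggered by the three if-then rules."""
    actions = []
    # Rule 3: stale or incomplete data blocks performance decisions.
    if freshness_warning:
        actions.append("hold")
    # Rule 1: a large move on sufficient volume opens an investigation.
    if abs(change_pct) > threshold_pct and volume >= min_volume:
        actions.append("investigate")
    # Rule 2: any move plus a breached guardrail escalates (assumed reading).
    if change_pct != 0 and guardrail_breached:
        actions.append("escalate")
    return actions


# Hypothetical week: KPI down 12% on healthy volume, guardrail breached.
print(decide(change_pct=-12.0, volume=5000, guardrail_breached=True,
             freshness_warning=False, threshold_pct=10.0, min_volume=200))
```

Each returned action should map to a named role and a response time, which is what the ownership list covers.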

Then assign ownership and timing.

Who acts first. Name a role, not a department.

Investigation SLA. How fast someone has to respond.

Escalation path. When to pull in data engineering, finance, ops, or product.

Pause conditions. When the dashboard should be labeled as unreliable, or the metric temporarily removed from performance conversations.

Documentation location. Put the metric contract, bundle definition, and decision rules where users will actually look, ideally linked from the dashboard itself.

A data pre mortem is not bureaucracy. It is how experienced teams avoid predictable failures: context loss, confounding, Goodhart gaming, and brittle pipelines that turn normal variation into leadership drama. Do the ninety minutes, ship the metric bundle, and make the “what do we do when it moves?” decision before the number starts making decisions for you.

Last updated: 2026-04-30 | Calypso

Sources

  1. building.theatlantic.com
  2. vladimirsiedykh.com
  3. whennotesfly.com
  4. theuncertaintyproject.org
  5. problempop.io
  6. atticusli.com
  7. answerhorizon.com
  8. clicdata.com
  9. bi-academy.org
