Convergent Core Analysis: Extracting What Your Data Truly Says
You've collected your data. You've cleaned it. Now it's time to analyze. But which analysis do you run? Do you winsorize outliers or exclude them? Mean-impute missing values or use listwise deletion? Run a parametric test or its non-parametric equivalent?
Each choice is defensible. Each gives you a slightly different result. And therein lies one of the most under-discussed problems in social science: the garden of forking paths.
The Problem: One Dataset, 200 Results
In 2016, Steegen and colleagues formalized what many researchers had long suspected: for any given dataset and research question, there are dozens or hundreds of defensible analytical choices, and those choices meaningfully change the results. Their response was multiverse analysis: the practice of running every reasonable specification and showing the full distribution of outcomes.
Multiverse analysis was a breakthrough for transparency. Instead of reporting one cherry-picked result, you show all of them. But it left researchers with a new problem: you now have 200 results, and no principled way to summarize them.
You can plot a specification curve and eyeball it. You can report the percentage of specifications that are significant. But neither approach extracts the actual finding — the core claim that is genuinely robust across analytical choices.
The Solution: Convergent Core Analysis
Convergent Core Analysis (CCA) is designed to solve exactly this problem. Given a multiverse of specification results, CCA algorithmically extracts the convergent core — the strongest claim that survives across virtually all specifications.
CCA doesn't just count how many analyses agree. It finds the specific, defensible claim that nearly all analyses support, and identifies the exact conditions under which it might break down.
How it works (plain language)
Imagine you ask 200 experts the same research question. Each analyzes the data their own way. CCA listens to all 200, then reports:
- Direction consensus. Do the vast majority (say, 95%+) agree on the direction of the effect — positive or negative?
- Magnitude convergence. What's the typical size of the effect, and how tightly do the estimates cluster?
- Boundary conditions. Which analytical choices actually matter? If you change how you handle outliers, does the finding flip? If you switch from listwise to pairwise deletion, does the effect size double?
- Robustness cascade. Does the core claim hold at 90% agreement? 95%? 99%? This tells you how strict you can be before the consensus breaks down.
How it works (technical details)
CCA operates on specification-curve output — a matrix where each row is one specification, and columns include the effect size, p-value, and a vector of analytical decisions (e.g., outlier method, missing data strategy, covariate set).
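As a rough illustration of that input, one row of the matrix could be represented as below. This is a minimal sketch using only the Python standard library; the class name `Specification`, its field names, and the three rows of values are hypothetical placeholders, not a format that CCA or PsyStat Nexus is known to require.

```python
from dataclasses import dataclass


@dataclass
class Specification:
    """One row of specification-curve output (field names are illustrative)."""
    effect: float              # effect size for this specification, e.g. Cohen's d
    p_value: float             # p-value from the same specification
    decisions: dict[str, str]  # analytical decision -> level chosen


# Three made-up rows; the numbers are placeholders, not real results.
specs = [
    Specification(-0.45, 0.010, {"outliers": "winsorize", "missing": "listwise"}),
    Specification(-0.38, 0.030, {"outliers": "exclude", "missing": "listwise"}),
    Specification(-0.20, 0.210, {"outliers": "none", "missing": "mean_impute"}),
]

# The decision dimensions are whatever keys appear across the rows.
decision_dimensions = sorted({name for s in specs for name in s.decisions})
print(decision_dimensions)
```

Keeping the decisions as a mapping per row, rather than as fixed columns, lets the same structure hold multiverses with different sets of analytical choices.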
The algorithm computes:
- Direction consistency: the fraction of specifications where sign(effect) matches the modal sign, thresholded at a user-set core_threshold (default: 0.95).
- Magnitude convergence: the IQR of the effect-size distribution, plus the 5%-trimmed mean. A tight IQR means specifications agree not just on direction but on size.
- Boundary analysis: for each decision dimension (e.g., "outlier handling"), CCA computes the median effect within each level of that decision. If the median at any level has the opposite sign to the overall effect, that level is flagged as a boundary condition.
- Robustness cascade: the core threshold is swept from 90% to 99%, reporting for each level whether the claim still holds.
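The four computations above can be sketched in a few lines of standard-library Python. This is an illustrative reading of the description, not the PsyStat Nexus implementation: the function name `convergent_core`, the returned keys, and the demo data are hypothetical, and two details are assumptions (the 5% trim is taken from each tail, and the cascade sweeps whole percentage points).

```python
import statistics
from collections import defaultdict


def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def trimmed_mean(values, proportion=0.05):
    """Mean after dropping `proportion` of the values from each tail (assumed convention)."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)
    return statistics.mean(ordered[k:len(ordered) - k])


def convergent_core(effects, decisions, core_threshold=0.95):
    """effects: list of floats; decisions: list of {dimension: level} dicts, same order."""
    n = len(effects)

    # Direction consistency: share of specifications matching the modal sign.
    signs = [sign(e) for e in effects]
    modal_sign = statistics.mode(signs)
    agree = sum(s == modal_sign for s in signs)
    consistency = agree / n

    # Magnitude convergence: quartiles (for the IQR) and the trimmed mean.
    q1, _, q3 = statistics.quantiles(effects, n=4)

    # Boundary analysis: median effect within each level of each decision,
    # flagged when its sign differs from the modal sign.
    by_level = defaultdict(list)
    for effect, choices in zip(effects, decisions):
        for dimension, level in choices.items():
            by_level[(dimension, level)].append(effect)
    boundaries = []
    for (dimension, level), values in sorted(by_level.items()):
        level_median = statistics.median(values)
        if sign(level_median) != modal_sign:
            boundaries.append((dimension, level, level_median))

    # Robustness cascade: does the direction claim hold at 90%, 91%, ..., 99%?
    # Integer arithmetic avoids floating-point edge cases at exact thresholds.
    cascade = {pct: agree * 100 >= pct * n for pct in range(90, 100)}

    return {
        "modal_sign": modal_sign,
        "direction_consistency": consistency,
        "core_holds": consistency >= core_threshold,
        "iqr": (q1, q3),
        "trimmed_mean": trimmed_mean(effects),
        "boundaries": boundaries,
        "cascade": cascade,
    }


# Made-up demo: 19 negative effects and one positive, with a single decision dimension.
effects = [-0.40 + 0.01 * i for i in range(19)] + [0.05]
decisions = [
    {"outlier_handling": "winsorize" if i % 2 == 0 else "exclude"} for i in range(20)
]
result = convergent_core(effects, decisions)
print(result)
```

In this demo 19 of 20 specifications share the negative sign, so the direction claim passes the cascade up to 95% and fails from 96% upward, and no decision level is flagged because both levels keep a negative median.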
CCA is a genuine extension of multiverse analysis (Steegen et al., 2016). The individual components — direction counting, IQR, boundary detection — are standard techniques. The novelty is in combining them into a structured "convergent core" extraction, giving researchers a principled summary rather than a wall of specification curves.
When Should You Use CCA?
CCA is most useful when:
- You've already run or plan to run a multiverse / specification-curve analysis.
- You need a defensible summary for a paper or presentation — not just "67% of specs were significant" but a structured claim with boundary conditions.
- Reviewers have asked you to demonstrate robustness across analytical choices.
- You want to move beyond binary "significant vs. not" and articulate what your data actually supports.
Example
Suppose you're studying whether a mindfulness intervention reduces anxiety. Your multiverse includes 180 specifications varying by: outlier handling (3 methods), missing data strategy (3 methods), anxiety measure (2 scales), covariate set (5 choices), and analysis type (2: parametric/non-parametric).
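The count of 180 comes from fully crossing the five decisions (3 × 3 × 2 × 5 × 2). A small sketch of that enumeration is shown here; the dictionary keys and most level names are hypothetical placeholders, since only the number of levels per decision is given.

```python
from itertools import product

# Only the number of levels per decision comes from the example;
# the level names (apart from the two STAI scales) are placeholders.
decision_grid = {
    "outlier_handling": ["method_1", "method_2", "method_3"],
    "missing_data": ["method_1", "method_2", "method_3"],
    "anxiety_measure": ["STAI-S", "STAI-T"],
    "covariate_set": ["set_1", "set_2", "set_3", "set_4", "set_5"],
    "analysis_type": ["parametric", "non_parametric"],
}

# One dict per specification: every combination of one level per decision.
specifications = [
    dict(zip(decision_grid, levels)) for levels in product(*decision_grid.values())
]

print(len(specifications))  # 3 * 3 * 2 * 5 * 2 = 180
```

Each dictionary in `specifications` describes one analysis to run; the effect size it produces becomes one row of the specification-curve output.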
CCA might report:
- Core finding: The intervention reduces anxiety (negative effect), with 96.7% direction consistency.
- Magnitude: Trimmed mean d = −0.41, IQR [−0.52, −0.31].
- Boundary: Using the STAI-T (trait measure) rather than STAI-S (state measure) shifts the median effect from −0.45 to −0.22, roughly halving it without changing its sign. No decision flips the sign.
- Cascade: Direction consistency of 96.7% clears thresholds up to 96% but not 97% or higher, so the core holds at the default 95% threshold and breaks before 99%.
That's a much more useful finding than "p = .03 with our preferred specification."
Try It in PsyStat Nexus
CCA is available in the Novel Methods module of PsyStat Nexus. Paste your specification-curve effect sizes and the tool computes the full convergent core with direction consensus, magnitude clustering, boundary conditions, and robustness cascade.