Novel Method

Scale-Persistent Features: A Durability Test for Your Findings

By Moonlit Social Labs · April 3, 2026 · 7 min read

You've run a correlation and found r = 0.35 between self-reported stress and sleep quality in your sample of 80 participants. It's statistically significant (p = .001). You feel confident. But should you?

Statistical significance tells you the relationship probably isn't zero. It doesn't tell you whether the relationship is stable: whether it would look roughly the same if you'd collected a different sample, or if your dataset were half as large, or twice as large.

Scale-Persistent Features (SPF) answers that question by stress-testing your correlations across scales and resamples.

Why Significance Isn't Enough

The replication crisis taught us that many "significant" findings don't replicate. One major reason is that the original effect was inflated by sampling variability. With a sample of 80, a true r of 0.15 can plausibly come out as 0.35 in one lucky sample. The p-value will still be significant, because for this purpose p-values answer the wrong question: they tell you how surprising your data would be if the true correlation were zero, not how stable your estimate is.
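To put a number on "one lucky sample", here is a quick calculation using Fisher's z transformation, a standard normal approximation to the sampling distribution of r. The approximation is our choice for illustration (exact small-sample distributions differ slightly), and the helper names are hypothetical.

```python
import math
from statistics import NormalDist


def fisher_se(n):
    """Approximate standard error of Fisher's z = atanh(r) for sample size n."""
    return 1.0 / math.sqrt(n - 3)


def prob_at_least(r_obs, rho_true, n):
    """Approximate P(sample r >= r_obs) when the population correlation is rho_true."""
    z = (math.atanh(r_obs) - math.atanh(rho_true)) / fisher_se(n)
    return 1.0 - NormalDist().cdf(z)


def approx_p_value(r_obs, n):
    """Approximate two-sided p-value for H0: rho = 0, same normal approximation."""
    z = math.atanh(abs(r_obs)) / fisher_se(n)
    return 2.0 * (1.0 - NormalDist().cdf(z))


n = 80
lucky = prob_at_least(0.35, 0.15, n)
p = approx_p_value(0.35, n)
print(f"P(r >= 0.35 | true r = 0.15, n = {n}) = {lucky:.3f}")
print(f"approximate two-sided p for r = 0.35, n = {n}: {p:.4f}")
```

With these inputs the first probability comes out near 0.03, roughly one study in thirty, and the approximate p-value lands near .001, in line with the opening example. Across a literature made of many small studies, a 1-in-30 overestimate is not a rare event.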

What you really want to know is: if I ran this study again, would I get a similar number? And, more practically: which of my correlations are sturdy enough to build theory on, and which are fragile?

How SPF Works

Plain language

Think of SPF as a durability test for your correlations. It takes each relationship and asks three questions:

Does it survive downsizing? SPF computes the correlation using only 25%, 50%, and 75% of your data (randomly selected, 50 times each). If the relationship barely budges across sample sizes, it's durable. If it fluctuates wildly or flips sign, it's fragile.
Does the direction hold? Across all those subsamples, what fraction of the time does the correlation point the same way as the full-sample result? If it's positive in 98% of subsamples, the direction is stable. If it's positive in only 60%, the direction itself is uncertain.
Does it survive resampling? SPF runs bootstrap resamples (drawing samples with replacement) and checks whether the 95% confidence interval includes zero. If it doesn't, the sign of the relationship holds up under sampling variability.

Each relationship gets a persistence score from 0 to 1, classified as:

Persistent (score ≥ 0.7): This relationship is durable. Build your theory on it.
Moderate (0.4 ≤ score < 0.7): Probably real, but treat with caution. More data would help.
Fragile (score < 0.4): This relationship may be an artifact of your particular sample. Don't build on it.

Technical details

For each variable pair, SPF computes:

Subsample stability: Pearson r on random subsamples of 25%, 50%, and 75% of the data (50 repetitions each, drawn without replacement), yielding median r and SD at each scale, alongside the full-sample (100%) r.
Direction stability: the fraction of subsamples where sign(r) matches the full-sample sign.
Bootstrap CI: B bootstrap resamples with replacement (default B = 1000), yielding median r, SD, and a 95% percentile confidence interval.
Zero-crossing: whether the bootstrap CI includes zero.
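The subsampling and bootstrap steps in the list above can be sketched in plain Python using only the standard library. This is a minimal illustration under our own assumptions, not SPF's actual implementation: the function names are hypothetical, subsamples are drawn without replacement, the seeds are arbitrary, and the demo data are synthetic.

```python
import math
import random
import statistics

# Hypothetical sketch of the steps described above; not SPF's actual code.


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def sign(v):
    return (v > 0) - (v < 0)


def subsample_stability(x, y, fractions=(0.25, 0.5, 0.75), reps=50, seed=0):
    """Median and SD of r at each subsample size, plus direction stability.

    Subsamples are drawn without replacement (an assumption). Direction
    stability is the share of all subsample r values whose sign matches
    the full-sample r.
    """
    rng = random.Random(seed)
    n = len(x)
    r_full = pearson_r(x, y)
    by_scale, matches, total = {}, 0, 0
    for frac in fractions:
        k = max(3, round(frac * n))
        rs = []
        for _ in range(reps):
            idx = rng.sample(range(n), k)
            rs.append(pearson_r([x[i] for i in idx], [y[i] for i in idx]))
        by_scale[frac] = (statistics.median(rs), statistics.stdev(rs))
        matches += sum(sign(r) == sign(r_full) for r in rs)
        total += len(rs)
    return r_full, by_scale, matches / total


def bootstrap_ci(x, y, n_boot=1000, seed=1):
    """Median r, SD, 95% percentile CI, and whether that CI includes zero."""
    rng = random.Random(seed)
    n = len(x)
    rs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample rows with replacement
        rs.append(pearson_r([x[i] for i in idx], [y[i] for i in idx]))
    cuts = statistics.quantiles(rs, n=40)  # cut points at 2.5%, 5%, ..., 97.5%
    lo, hi = cuts[0], cuts[-1]
    return statistics.median(rs), statistics.stdev(rs), (lo, hi), lo <= 0 <= hi


# Synthetic demo data (not from the post): 120 rows, true r of about -0.45.
gen = random.Random(42)
x = [gen.gauss(0, 1) for _ in range(120)]
y = [-0.45 * a + gen.gauss(0, 0.9) for a in x]

r_full, scales, direction_stability = subsample_stability(x, y)
boot_median, boot_sd, (lo, hi), crosses_zero = bootstrap_ci(x, y)
print(f"full-sample r = {r_full:.2f}, direction stability = {direction_stability:.0%}")
for frac, (med, sd) in scales.items():
    print(f"  {frac:.0%} subsamples: median r = {med:.2f}, SD = {sd:.2f}")
print(f"bootstrap 95% CI = [{lo:.2f}, {hi:.2f}], includes zero: {crosses_zero}")
```

`statistics.quantiles(..., n=40)` returns 39 cut points at 2.5% steps, so the first and last are the 2.5th and 97.5th percentiles that bound a 95% percentile interval.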

The persistence score is a weighted composite:

Persistence = 0.4 × direction_stability + 0.3 × subsample_consistency + 0.3 × precision

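Here precision = 1 - (bootstrap_SD / |r|). Because bootstrap_SD can exceed |r| when a correlation is weak, this term can go negative, so it must be floored at 0 if the composite is to stay within the stated 0-to-1 range. The score is then thresholded at 0.7 for "Persistent" and 0.4 for "Moderate."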
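The composite and its thresholds can be written as two small functions. The weights and cutoffs below are the ones stated above. The post does not define subsample_consistency, so the version here (one minus the spread of the median r across subsample scales, relative to |r|) is purely our assumption, as is clipping each term to [0, 1]. Function names are hypothetical.

```python
def clip01(value):
    """Clamp a value to the interval [0, 1]."""
    return min(1.0, max(0.0, value))


def persistence_score(r_full, direction_stability, scale_medians, bootstrap_sd):
    """Weighted composite: 0.4*direction + 0.3*consistency + 0.3*precision."""
    if r_full == 0:
        return 0.0  # assumption: a zero correlation has no direction to persist
    # ASSUMPTION: subsample_consistency is not defined in the post; here it is
    # 1 - (spread of the median r across subsample scales) / |r|, clipped.
    spread = max(scale_medians) - min(scale_medians)
    subsample_consistency = clip01(1 - spread / abs(r_full))
    # precision as defined in the post, floored at 0 (our assumption) so a
    # bootstrap SD larger than |r| cannot push the score below 0.
    precision = clip01(1 - bootstrap_sd / abs(r_full))
    return 0.4 * direction_stability + 0.3 * subsample_consistency + 0.3 * precision


def classify(score):
    """Map a persistence score to the three labels used in the post."""
    if score >= 0.7:
        return "Persistent"
    if score >= 0.4:
        return "Moderate"
    return "Fragile"


# Made-up inputs for illustration only (not the post's data).
strong = persistence_score(-0.42, 1.0, [-0.41, -0.42, -0.42], 0.07)
weak = persistence_score(0.05, 0.6, [0.02, 0.05, 0.08], 0.10)
print(f"{strong:.2f} {classify(strong)}")
print(f"{weak:.2f} {classify(weak)}")
```

With these made-up inputs the first call scores about 0.94 (Persistent). In the second, both the subsample spread and the bootstrap SD exceed |r|, so both 0.3-weighted terms are floored at 0 and the score is 0.24 (Fragile).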

Honest Assessment

SPF is not full topological data analysis or persistent homology, even though its name borrows from those fields. For typical social-science datasets (5–20 variables, 50–500 observations), actual TDA is overkill. SPF keeps the core idea ("what persists across scales?") and implements it with standard bootstrap and subsampling, both well-established statistical techniques. The persistence score is a weighted heuristic, not a formally derived metric; it is meant as a quick triage tool rather than a formal test.

Example

You have 5 variables measured on 120 participants: stress, sleep quality, exercise, social support, and GPA. You compute all 10 pairwise correlations. Seven are significant at p < .05. But which ones should you actually report and interpret?

SPF results:

Stress ↔ Sleep: r = −0.42, persistence = 0.89 (Persistent). Direction stability: 100%. Bootstrap CI: [−0.55, −0.28]. Build on this.
Exercise ↔ GPA: r = 0.21, persistence = 0.38 (Fragile). Direction stability: 74%. Bootstrap CI: [−0.02, 0.41]. This passes p < .05 but is unstable: the bootstrap CI actually includes zero, and the direction flips in 26% of subsamples.
Social Support ↔ Stress: r = −0.33, persistence = 0.71 (Persistent). Solid, but only just above the 0.7 threshold and clearly less durable than Stress–Sleep.
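As a rough consistency check on the Stress ↔ Sleep row, you can back out how much each term of the composite contributes. The post reports only the bootstrap CI, not its SD, so this sketch assumes the bootstrap distribution is close to normal, making the SD roughly the 95% CI width divided by 2 × 1.96. The name `implied_consistency` is hypothetical: it is simply the value the subsample_consistency term would need for the total to reach the reported 0.89.

```python
from statistics import NormalDist


def sd_from_percentile_ci(lo, hi, level=0.95):
    """Approximate a bootstrap SD from a percentile CI, assuming near-normality."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return (hi - lo) / (2 * z)


# Stress <-> Sleep as reported in the example.
r, direction_stability, reported_score = -0.42, 1.00, 0.89
approx_sd = sd_from_percentile_ci(-0.55, -0.28)  # normal approximation (assumption)
precision = 1 - approx_sd / abs(r)
known_part = 0.4 * direction_stability + 0.3 * precision
implied_consistency = (reported_score - known_part) / 0.3
print(f"approx. bootstrap SD = {approx_sd:.3f}, precision = {precision:.2f}")
print(f"direction + precision terms = {known_part:.2f}; implied consistency = {implied_consistency:.2f}")
```

The two recoverable terms contribute about 0.65, so the reported 0.89 fits the formula if subsample consistency is around 0.8.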

Without SPF, you might have reported all seven correlations with equal confidence. With SPF, you know which ones to headline and which to footnote with appropriate caveats.

When to Use SPF

You're doing exploratory analysis and want to know which relationships are stable before building a model around them.
You have a moderate sample size (50–300) and worry about sampling variability inflating your effects.
You're writing a paper and want to preemptively address replicability concerns.
You're comparing correlations across subgroups and need to know which are stable within each group.

Try It

Scale-Persistent Features is in the Novel Methods module. Enter your variable columns (comma-separated values) and see persistence scores, stability plots, and bootstrap CIs for every pair.
