Dialectical Inference: When Frequentists and Bayesians Finally Talk
Statistics has two major philosophical frameworks, and they've been arguing for nearly a century. Frequentists ask: "If there's no real effect, how surprising is this data?" Bayesians ask: "Given what we knew before and what the data show, how confident should we be?"
These are different questions, and they sometimes give different answers. Most researchers pick one framework and ignore the other. Some journals now encourage "dual reporting" — presenting both a p-value and a Bayes factor. But dual reporting still leaves the synthesis to the reader: what do you do when they disagree?
Dialectical Inference automates that synthesis.
The Two Frameworks, Briefly
Frequentist (the p-value)
The p-value answers: "If the null hypothesis were true (no real effect), what's the probability of seeing data at least this extreme?" A small p-value (traditionally < .05) is taken as evidence against the null. But the p-value does not tell you how much evidence you have for the alternative. Under a fixed threshold, a p-value of .049 and a p-value of .00001 both lead to the same decision, "reject the null," even though the strength of evidence is vastly different.
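As a rough illustration of the frequentist arm, the sketch below runs a two-sided independent-samples t-test with SciPy. The helper name `two_sample_p_value` and the scores are made up for this example; the tool's own implementation is not shown in this article and may differ (for instance, in whether it pools variances).

```python
import numpy as np
from scipy import stats


def two_sample_p_value(group_a, group_b):
    """Return (t, df, p) for a two-sided Student's t-test with pooled variance."""
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    result = stats.ttest_ind(a, b, equal_var=True)
    df = a.size + b.size - 2  # degrees of freedom for the pooled-variance test
    return float(result.statistic), df, float(result.pvalue)


# Made-up illustrative scores, not real data
treatment = [12, 15, 11, 14, 13, 16, 12, 15]
control = [14, 17, 15, 18, 16, 15, 17, 19]

t, df, p = two_sample_p_value(treatment, control)
print(f"t({df}) = {t:.2f}, p = {p:.4f}")
```

The p-value returned here is the tail probability of the t distribution beyond the observed statistic, doubled for a two-sided test.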
Bayesian (the Bayes factor)
The Bayes factor (BF₁₀) answers: "How much more likely is the data under the alternative hypothesis than under the null?" A BF₁₀ of 10 means the data are 10 times more likely if there's a real effect than if there isn't. Unlike the p-value, it quantifies evidence on a continuous scale and can support the null (by convention, BF₁₀ < 1/3 is read as evidence for no effect).
The Bayes factor requires a prior — an assumption about the expected size of the effect. This is its greatest strength (you can incorporate prior knowledge) and its most common criticism (different priors give different answers).
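In symbols, the role of the prior can be sketched as follows. The notation is chosen for this illustration only: δ stands for the standardized effect size, H₀ fixes δ = 0, and f(δ) is the prior under H₁, written here as a Cauchy density with scale r.

```latex
\mathrm{BF}_{10} = \frac{p(\text{data} \mid H_1)}{p(\text{data} \mid H_0)},
\qquad
p(\text{data} \mid H_1) = \int p(\text{data} \mid \delta)\, f(\delta)\, d\delta,
\qquad
f(\delta) = \frac{1}{\pi r \left[ 1 + (\delta / r)^2 \right]}
```

Because the numerator averages the likelihood over f(δ), changing the prior scale r changes the average, and with it the Bayes factor.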
The Problem with Dual Reporting
Dual reporting is better than picking one framework. But in practice, researchers present both numbers and then write one paragraph of interpretation, usually deferring to whichever result supports their hypothesis. The synthesis is subjective and informal.
Worse, when the two frameworks fail to line up (which happens more often than you'd think, especially near conventional thresholds), the reader is left to figure out why and what to conclude. That's the gap Dialectical Inference fills.
How Dialectical Inference Works
Plain language
Dialectical Inference runs your data through both frameworks simultaneously, then classifies the outcome:
- Full agreement: Both the p-value and Bayes factor point the same way. If both reject the null (p < .05 and BF > 3), the finding is supported under both frameworks. If the test is non-significant and the Bayes factor favors the null (p ≥ .05 and BF < 1/3), you have positive evidence of no effect, not merely a failure to detect one.
- Partial agreement: One framework is decisive, the other is inconclusive. For example, p = .03 but BF = 1.8 (neither strong evidence for nor against). This means the data lean toward an effect, but the evidence isn't overwhelming.
- Disagreement: The two reach opposite conclusions, for example p < .05 while BF < 1/3. This is rare but important: it usually means the result is sensitive to assumptions about effect size or the prior distribution.
For each outcome, the tool explains why the frameworks agree or disagree, and identifies the safest claim you can make.
Technical details
The frequentist arm runs a classical test (independent t-test, paired t-test, one-sample t-test, or correlation test). The Bayesian arm computes a JZS Bayes factor using the BIC approximation (Wagenmakers, 2007) with a default Cauchy prior (scale r = 0.707).
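For readers who want to see the BIC approximation concretely, here is a minimal sketch for a t-test. It uses BF₀₁ ≈ exp(ΔBIC₁₀ / 2) together with the identity SSE₀ / SSE₁ = 1 + t² / df. The function name `bic_bf10` is hypothetical and not taken from the tool. Note that this formula has no prior-scale parameter, so on its own it cannot express the Cauchy scale r; the tool's actual computation may differ from this sketch.

```python
import math


def bic_bf10(t, n, df):
    """BIC-approximate BF10 from a t statistic.

    n is the total number of observations; df is the residual degrees of
    freedom of the alternative model (n - 2 for two independent groups).
    """
    # Delta BIC (alternative minus null) = -n * ln(1 + t^2 / df) + ln(n),
    # and ln(BF01) is approximately half of that difference.
    log_bf01 = 0.5 * math.log(n) - 0.5 * n * math.log1p(t * t / df)
    return math.exp(-log_bf01)


# Illustrative values only: 40 observations in two groups, t = 2.1
print(round(bic_bf10(2.1, 40, 38), 3))
```

When t = 0 the result reduces to 1 / sqrt(n), so larger samples with no observed difference yield stronger support for the null.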
The synthesis step classifies the outcome using these decision boundaries:
- Reject: p < α AND BF₁₀ > 3
- Retain null: p ≥ α AND BF₁₀ < 1/3
- Inconclusive zone: 1/3 ≤ BF₁₀ ≤ 3 (evidence is ambiguous)
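One way the boundaries above could be turned into code is sketched below. The function name `classify` and the label strings are hypothetical. The handling of a non-significant p-value paired with an ambiguous Bayes factor (labeled "inconclusive" here) is an assumption, since the boundaries above do not spell out that case.

```python
def classify(p, bf10, alpha=0.05, bf_threshold=3.0):
    """Combine a p-value and a Bayes factor into one synthesis label."""
    significant = p < alpha
    if bf10 > bf_threshold:
        bayes = "alternative"
    elif bf10 < 1.0 / bf_threshold:
        bayes = "null"
    else:
        bayes = "ambiguous"  # 1/3 <= BF10 <= 3

    if significant and bayes == "alternative":
        return "full agreement: reject null"
    if not significant and bayes == "null":
        return "full agreement: retain null"
    if bayes == "ambiguous":
        # Assumption: only a significant test counts as "decisive" here.
        return "partial agreement" if significant else "inconclusive"
    # Remaining cases: significant test with BF favoring the null,
    # or non-significant test with BF favoring the alternative.
    return "disagreement"


print(classify(0.03, 1.8))
```

Keeping the thresholds as parameters makes it easy to see how the label changes under a stricter α or a stricter evidence cutoff such as BF > 10.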
Sensitivity analysis: The Bayes factor is recomputed across multiple Cauchy prior scales (r = 0.354, 0.5, 0.707, 1.0, 1.414). If the BF crosses the evidence threshold at some scale, that boundary is flagged — meaning the conclusion depends on your prior assumptions about effect size.
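The sweep over prior scales can be sketched as follows. Because the BIC approximation has no scale parameter, this example swaps in the exact JZS integral from Rouder et al. (2009), evaluated numerically with `scipy.integrate.quad`. The names `jzs_bf10` and `prior_sensitivity` are hypothetical, and this is an illustration of the idea rather than the tool's code.

```python
import math
from scipy import integrate


def jzs_bf10(t, n_eff, df, r=0.707):
    """BF10 for a t statistic under a Cauchy(0, r) prior on effect size.

    n_eff is n for a one-sample or paired test and n1 * n2 / (n1 + n2)
    for two independent groups; df is the test's degrees of freedom.
    """
    def integrand(g):
        if g <= 0.0:
            return 0.0
        scale = 1.0 + n_eff * g
        log_lik = (-0.5 * math.log(scale)
                   - 0.5 * (df + 1) * math.log1p(t * t / (scale * df)))
        # Inverse-gamma(1/2, r^2 / 2) mixing density on g, which gives a
        # Cauchy(0, r) prior on the standardized effect size.
        log_prior = (math.log(r) - 0.5 * math.log(2.0 * math.pi)
                     - 1.5 * math.log(g) - r * r / (2.0 * g))
        return math.exp(log_lik + log_prior)

    marginal_h1, _ = integrate.quad(integrand, 0.0, math.inf)
    marginal_h0 = (1.0 + t * t / df) ** (-0.5 * (df + 1))
    return marginal_h1 / marginal_h0


def prior_sensitivity(t, n_eff, df,
                      scales=(0.354, 0.5, 0.707, 1.0, 1.414), threshold=3.0):
    """Return ({r: BF10}, flagged); flagged is True if the evidence
    category (alternative / ambiguous / null) changes across scales."""
    def category(bf):
        if bf > threshold:
            return "alternative"
        if bf < 1.0 / threshold:
            return "null"
        return "ambiguous"

    bfs = {r: jzs_bf10(t, n_eff, df, r) for r in scales}
    flagged = len({category(bf) for bf in bfs.values()}) > 1
    return bfs, flagged


# Illustrative values only: two groups of 20, t = 2.1
bfs, flagged = prior_sensitivity(t=2.1, n_eff=10.0, df=38)
for r, bf in bfs.items():
    print(f"r = {r}: BF10 = {bf:.2f}")
print("prior-sensitive:", flagged)
```

A flagged result means the verbal conclusion would change depending on which scale a reader considers reasonable, which is the situation the sensitivity analysis is meant to surface.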
The frequentist and Bayesian components are entirely standard; the JZS Bayes factor uses a well-known BIC approximation, not the exact integral. The novel contribution is the synthesis step: algorithmically classifying agreement, disagreement, and boundary conditions rather than leaving it to the reader. The approach builds on existing dual-reporting recommendations but goes further by computing the synthesis.
Why This Matters
Consider a realistic scenario: you compare a treatment group to a control on an anxiety measure. You get p = .041 and BF₁₀ = 1.4.
A frequentist-only analysis says: "Significant, p < .05." A Bayesian-only analysis says: "Inconclusive evidence." A dual report presents both numbers with no resolution.
Dialectical Inference says: "Partial agreement: the frequentist test is marginal while the Bayesian evidence is ambiguous. The data lean toward an effect but are insufficient for a confident claim. Sensitivity analysis shows the BF crosses 3 only with a narrow prior (r = 0.354), suggesting the result depends on expecting a small effect."
That conclusion is more honest and more useful than what either framework provides alone.
When to Use It
- Your p-value is near the .05 threshold and you want a sanity check
- You want to report evidence for the null (a non-significant p-value alone cannot provide it)
- A reviewer asks for Bayesian supplementary analyses
- You want to understand how robust your finding is across statistical philosophies
- You're teaching and want to illustrate when and why the two frameworks disagree
Try It
Dialectical Inference is available in the Novel Methods module. Enter two groups of data (or a single group for one-sample tests) and get the full synthesis in seconds.