Multiplicity in High-Dimensional Data

From Thousands of Findings to a Handful of Evidence

Jul 10, 2025

1 Introduction

This series explores how statistical evidence changes when thousands of biological variables are analyzed simultaneously. All examples are based on TCGA-BRCA RNA-seq data and focus on the practical consequences of multiplicity, uncertainty, and modern inference methods.

Source: TCGA-BRCA (The Cancer Genome Atlas – Breast Cancer)

Data type: RNA-seq gene expression

Analysis subset: Clinical subgroup comparison (Early vs Late disease stage)

Dimensions: 441 patients and 60,660 measured transcript features

Statistical challenge: Tens of thousands of simultaneous hypothesis tests performed on the same dataset.

Multiplicity Changes the Answer

Demonstrate how multiple testing correction transforms the number of apparent discoveries.

Differential expression testing was performed across all measured transcripts. Results were evaluated at three levels of stringency: raw p-values, Benjamini-Hochberg FDR control, and Bonferroni correction.

Results:

Raw significant findings: 8,538,
FDR discoveries: 35,
Bonferroni discoveries: 5.

The same dataset and the same scientific question can produce dramatically different conclusions depending on how multiplicity is handled.
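The gap between the three counts follows directly from how each rule treats the p-values. The sketch below is a minimal pure-Python illustration, not the pipeline used in this series: the function names, the toy p-values, and the 0.05 threshold are all assumptions made for the example.

```python
def bonferroni(pvalues):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (step-up procedure)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def count_discoveries(pvalues, alpha=0.05):
    """Count tests called significant under each rule (alpha is an assumed level)."""
    return {
        "raw": sum(p <= alpha for p in pvalues),
        "bh": sum(q <= alpha for q in benjamini_hochberg(pvalues)),
        "bonferroni": sum(q <= alpha for q in bonferroni(pvalues)),
    }


# Toy p-values invented for illustration; they are not TCGA-BRCA results.
toy_pvalues = [0.0001, 0.004, 0.012, 0.02, 0.04, 0.2, 0.5, 0.9]
counts = count_discoveries(toy_pvalues)
print(counts)
```

Even with eight toy tests the ordering is the same as in the real analysis: the raw count is the largest, Benjamini-Hochberg retains fewer, and Bonferroni retains the fewest.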

Evidence Is Not Binary

Move beyond significant/non-significant thinking.

Rather than treating all retained discoveries equally, findings were ranked according to evidence strength after multiplicity control.

Among the 35 FDR-retained discoveries, evidence strength varied considerably despite all passing the same formal threshold.

A p-value threshold creates a decision boundary, not a natural separation between “true” and “false” findings.

Statistical evidence exists on a continuum. Some findings are supported by substantially stronger evidence than others.
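One simple way to see the continuum is to rank the retained findings on a continuous scale instead of a pass/fail flag. The sketch below assumes the negative log10 of the adjusted p-value as the evidence score and a 0.05 threshold; both choices, the function name, and the gene labels are hypothetical and not taken from the analysis.

```python
import math


def rank_by_evidence(adjusted_pvalues, alpha=0.05):
    """Return retained findings sorted from strongest to weakest evidence.

    The score is -log10(adjusted p-value), an assumed measure of evidence
    strength; a larger score means a smaller adjusted p-value.
    """
    retained = {gene: q for gene, q in adjusted_pvalues.items() if q <= alpha}
    scored = [(gene, -math.log10(q)) for gene, q in retained.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical adjusted p-values for four placeholder genes.
toy_adjusted = {"gene_a": 0.049, "gene_b": 1e-6, "gene_c": 0.003, "gene_d": 0.2}
ranking = rank_by_evidence(toy_adjusted)
for gene, score in ranking:
    print(f"{gene}: {score:.2f}")
```

In this toy input, gene_a and gene_b both pass the threshold, yet their scores differ by several orders of magnitude on the p-value scale, which is the distinction a binary label hides.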

60,660 Tests. One Model.

Introduce empirical Bayes thinking.

Independent gene-wise testing was compared with limma moderated statistics, which borrow information across all genes when estimating variability.

Independent testing discoveries: 35,
Empirical Bayes discoveries: 610.

The dataset did not change. The statistical model changed. More stable variance estimation allowed substantially more signals to survive multiplicity control.
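The core of the moderation step can be sketched in a few lines. In limma, each gene's sample variance is replaced by a weighted average of that variance and a prior variance shared across genes, with the degrees of freedom acting as weights. The Python below only illustrates that formula: the function name and the toy numbers are assumptions, and the prior variance and prior degrees of freedom are fixed by hand here, whereas limma estimates them from all genes.

```python
def moderated_variance(sample_var, df_residual, prior_var, prior_df):
    """Shrink a gene-wise variance toward a shared prior variance.

    Weighted average with degrees of freedom as weights:
    (prior_df * prior_var + df_residual * sample_var) / (prior_df + df_residual)
    """
    return (prior_df * prior_var + df_residual * sample_var) / (prior_df + df_residual)


# Toy values chosen by hand for illustration (not estimated from data).
PRIOR_VAR = 1.0
PRIOR_DF = 4
DF_RESIDUAL = 4

toy_sample_vars = {"low_variance_gene": 0.04, "typical_gene": 1.0, "high_variance_gene": 9.0}
shrunk = {
    gene: moderated_variance(s2, DF_RESIDUAL, PRIOR_VAR, PRIOR_DF)
    for gene, s2 in toy_sample_vars.items()
}
for gene, s2 in shrunk.items():
    print(f"{gene}: {toy_sample_vars[gene]} -> {s2}")
```

Unusually small variances are pulled up and unusually large ones are pulled down, so the test statistic for a gene depends less on a variance estimate that happens to be extreme by chance.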

Statistical Significance and Effect Size Are Different Concepts

Distinguish statistical significance from practical relevance.

Genes were ranked independently by:

Adjusted p-value,
Absolute log-fold change.

The overlap between the top-ranked genes was evaluated.

The strongest statistical evidence is not necessarily associated with the largest biological effect.
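The overlap comparison can be sketched as follows. This is a minimal illustration under assumptions: the function name, the choice of k, and the toy values are hypothetical, and the source does not state how many top-ranked genes were compared.

```python
def top_k_overlap(adjusted_p, log_fold_change, k):
    """Compare the top-k genes by adjusted p-value with the top-k by |log-fold change|.

    Returns the set of shared genes and the shared fraction of k.
    """
    by_significance = sorted(adjusted_p, key=adjusted_p.get)[:k]
    by_effect = sorted(
        log_fold_change, key=lambda gene: abs(log_fold_change[gene]), reverse=True
    )[:k]
    shared = set(by_significance) & set(by_effect)
    return shared, len(shared) / k


# Toy values for five placeholder genes; not results from the analysis.
toy_adjusted_p = {"a": 1e-8, "b": 1e-5, "c": 0.001, "d": 0.01, "e": 0.04}
toy_lfc = {"a": 0.4, "b": -2.5, "c": 0.3, "d": 3.1, "e": -1.8}

shared, fraction = top_k_overlap(toy_adjusted_p, toy_lfc, k=2)
print(shared, fraction)
```

In the toy input, gene "a" has the smallest adjusted p-value but a modest fold change, while gene "d" has the largest fold change but weaker statistical support, so only one gene appears in both top-2 lists.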

Not Every Discovery Is Equally Stable

Demonstrate that discovery lists contain signals with different levels of robustness.

Empirical Bayes methods increase the number of retained discoveries by stabilizing variance estimates across genes.

The transition from 35 discoveries under independent gene-wise testing to 610 under moderated testing illustrates how uncertainty estimation influences which signals are retained.

Evidence should be evaluated not only by statistical significance, but also by how robustly a signal survives increasingly demanding inference procedures.

From Discovery to Evidence

Show how evidence emerges through successive refinement.

The complete analysis pathway was retraced, starting from all tested transcripts and passing through increasingly stringent evidence filters.

The final evidence set represents a tiny fraction of the original search space. Statistical analysis is fundamentally a filtering process that separates robust conclusions from apparent findings.
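To put "tiny fraction" in numbers, the counts reported in this overview can be expressed as shares of the 60,660 measured features. The snippet below is only arithmetic on those reported counts; the variable names are illustrative, and the order of the entries is not meant to describe the exact sequence of filters used in the analysis.

```python
# Counts as reported in this overview.
TESTED_FEATURES = 60_660
reported_counts = {
    "Raw significant findings": 8_538,
    "Empirical Bayes discoveries": 610,
    "FDR discoveries": 35,
    "Bonferroni discoveries": 5,
}

# Share of all measured features retained at each level.
shares = {label: count / TESTED_FEATURES for label, count in reported_counts.items()}
for label, share in shares.items():
    print(f"{label:>28}: {share:.4%}")
```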