Confidence Without Certainty: Quantifying Prediabetes Risk in NHANES Data

Why this analysis matters
Healthcare reporting often presents results as if they were fixed facts. A prevalence estimate is reported as a single number, a risk factor is assigned a single coefficient, and a conclusion is delivered as if uncertainty had already been resolved.
In reality, uncertainty is not a flaw in the data—it is an inherent characteristic of every measurement, survey, and study. The challenge is not to eliminate uncertainty but to quantify it and incorporate it into decision making.
To demonstrate this principle, we analyzed publicly available data from the National Health and Nutrition Examination Survey (NHANES), focusing on the prevalence of prediabetes among adults aged 40 years and older.
Traditional reporting versus probabilistic reporting
A conventional analysis produces a simple point estimate:
Prediabetes prevalence among adults aged 40+ is approximately X%.
While easy to communicate, this statement creates an illusion of precision. It suggests that X% is the answer, when in fact it is only the center of a range of plausible values supported by the data.
To explore this uncertainty, we applied two complementary approaches:
Bootstrap resampling
Bayesian posterior estimation
Both methods seek to answer the same question:
How much confidence should we place in the reported prevalence estimate?
What we learned
The bootstrap analysis generated thousands of alternative versions of the original sample by repeatedly resampling participants with replacement. Instead of a single estimate, this process produced a distribution of plausible prevalence values.
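The resampling idea can be sketched in a few lines of Python. This is a minimal illustration, not the analysis code: the function names are hypothetical, the outcomes are synthetic stand-ins rather than NHANES records, and the sketch ignores the survey weights and complex sampling design that a real NHANES analysis would need to account for.

```python
import random

def bootstrap_prevalence(outcomes, n_resamples=2000, seed=0):
    """Return sorted prevalence estimates, one per bootstrap resample."""
    rng = random.Random(seed)
    n = len(outcomes)
    estimates = []
    for _ in range(n_resamples):
        # Draw n participants with replacement and recompute the prevalence.
        resample = rng.choices(outcomes, k=n)
        estimates.append(sum(resample) / n)
    estimates.sort()
    return estimates

def percentile_interval(sorted_values, level=0.95):
    """Percentile interval read off already-sorted bootstrap estimates."""
    alpha = (1.0 - level) / 2.0
    last = len(sorted_values) - 1
    return sorted_values[int(alpha * last)], sorted_values[int((1.0 - alpha) * last)]

# Synthetic stand-in data (NOT NHANES): 1 = prediabetes, 0 = no prediabetes.
synthetic_outcomes = [1] * 300 + [0] * 700

boot = bootstrap_prevalence(synthetic_outcomes)
low, high = percentile_interval(boot)
print(f"point estimate: {sum(synthetic_outcomes) / len(synthetic_outcomes):.3f}")
print(f"95% percentile interval: ({low:.3f}, {high:.3f})")
```

The sorted list `boot` is the distribution described above; the percentile interval is only one of several ways to summarize it.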
The Bayesian analysis reached a similar conclusion from a different perspective. Rather than asking whether the estimate is statistically significant, it quantified how probable each prevalence level is, given the observed data and a prior.
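One common way to obtain such a posterior for a proportion is the conjugate Beta-Binomial model. The sketch below assumes that model with a flat Beta(1, 1) prior; the source does not state which model or prior was used, so both are assumptions, as are the function names and the synthetic counts.

```python
import random

def beta_posterior_draws(cases, n, prior_a=1.0, prior_b=1.0, n_draws=20000, seed=0):
    """Sample prevalence values from the Beta posterior of a binomial proportion."""
    rng = random.Random(seed)
    # Conjugate update: Beta(a, b) prior + binomial data
    # -> Beta(a + cases, b + non-cases) posterior.
    post_a = prior_a + cases
    post_b = prior_b + (n - cases)
    return sorted(rng.betavariate(post_a, post_b) for _ in range(n_draws))

def credible_interval(sorted_draws, level=0.95):
    """Equal-tailed credible interval from sorted posterior draws."""
    alpha = (1.0 - level) / 2.0
    last = len(sorted_draws) - 1
    return sorted_draws[int(alpha * last)], sorted_draws[int((1.0 - alpha) * last)]

# Synthetic counts (NOT NHANES): 300 cases among 1000 participants.
draws = beta_posterior_draws(cases=300, n=1000)
low, high = credible_interval(draws)
posterior_mean = sum(draws) / len(draws)
print(f"posterior mean: {posterior_mean:.3f}")
print(f"95% credible interval: ({low:.3f}, {high:.3f})")
```

As with the bootstrap, this unweighted sketch treats participants as a simple random sample, which NHANES is not.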
The most important insight was not the estimated prevalence itself but the shape and width of the uncertainty surrounding it.
For example, decision makers can move beyond statements such as:
Prediabetes prevalence is X%.
and instead ask:
What is the probability that prevalence exceeds 25%?
What is the probability that prevalence exceeds 30%?
How much uncertainty remains around the estimate?
Is the evidence strong enough to justify intervention?
These questions are much closer to the way real decisions are made.
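Threshold questions like these reduce to counting: given draws from a posterior (or bootstrap) distribution, the probability of exceeding a threshold is the share of draws above it. The sketch below assumes a Beta posterior built from synthetic counts and a flat prior; `prob_exceeds` is a hypothetical helper name.

```python
import random

def prob_exceeds(draws, threshold):
    """Share of draws strictly above the threshold (Monte Carlo estimate)."""
    return sum(d > threshold for d in draws) / len(draws)

# Synthetic posterior (NOT NHANES): flat Beta(1, 1) prior updated with
# 300 cases and 700 non-cases, sampled 20,000 times.
rng = random.Random(0)
draws = [rng.betavariate(1 + 300, 1 + 700) for _ in range(20000)]

for threshold in (0.25, 0.30):
    p = prob_exceeds(draws, threshold)
    print(f"P(prevalence > {threshold:.0%}) is about {p:.3f}")
```

The same helper works on bootstrap estimates, although the result is then a resampling frequency rather than a posterior probability.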
The value of probabilistic thinking
The analysis demonstrates a broader principle applicable far beyond healthcare and epidemiology.
Most reports communicate conclusions.
Decision makers, however, must act under uncertainty.
Visualizing bootstrap distributions, posterior distributions, and credible intervals makes uncertainty visible and measurable. Instead of hiding variability behind a single number, we can quantify the range of plausible realities supported by the data.
This shift changes the conversation from:
"What is the estimate?"
to
"What outcomes are plausible, and how confident are we in each of them?"
The NHANES case study did not reveal a previously unknown disease mechanism or a new clinical risk factor. Its contribution is methodological rather than biological.
The analysis shows how modern statistical approaches can transform a conventional prevalence estimate into a decision-support framework that explicitly quantifies uncertainty.
The result is not less confidence.
It is confidence without certainty.
And in complex decision environments, that distinction matters.
Visualizing Uncertainty
The challenge of uncertainty communication is not statistical but cognitive.
Most professionals understand that estimates are uncertain. However, uncertainty is often communicated through confidence intervals, p-values, and technical statistical terminology that remain inaccessible to non-specialist decision makers.
To address this challenge, the NHANES analysis was extended with a series of visualizations designed to transform abstract statistical concepts into intuitive visual narratives.
Rather than presenting uncertainty as a mathematical correction to a point estimate, uncertainty itself becomes the primary object of observation.
The resulting visual framework allows decision makers to see not only the estimate, but also the range of plausible realities supported by the data.
A Cloud of Plausible Realities
The posterior cloud visualization replaces traditional density plots with a more intuitive representation.
Each point represents a plausible value supported by the observed evidence.
The density of points shows where the posterior concentrates the most probability.
Instead of asking where the estimate is located, the viewer begins to ask:
Which outcomes are plausible, and which are not?
This subtle shift reflects the essence of Bayesian thinking.
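A cloud of this kind can be built by pairing each posterior draw with a random vertical offset, so that crowded regions of the horizontal axis look dense. The sketch below only generates the point coordinates and a coarse bin count. It assumes a Beta posterior from synthetic counts; the function names are hypothetical and the actual plotting is left to whatever charting library is in use.

```python
import random

def posterior_cloud(cases, n, n_points=500, prior_a=1.0, prior_b=1.0, seed=0):
    """Return (x, y) points: x is a posterior draw, y is vertical jitter."""
    rng = random.Random(seed)
    post_a = prior_a + cases
    post_b = prior_b + (n - cases)
    points = []
    for _ in range(n_points):
        x = rng.betavariate(post_a, post_b)  # a plausible prevalence value
        y = rng.uniform(-1.0, 1.0)           # jitter only, carries no meaning
        points.append((x, y))
    return points

def bin_counts(points, lo, hi, n_bins=10):
    """Count points per equal-width bin on the x axis between lo and hi."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for x, _ in points:
        if lo <= x < hi:
            counts[min(n_bins - 1, int((x - lo) / width))] += 1
    return counts

# Synthetic counts (NOT NHANES): 300 cases among 1000 participants.
points = posterior_cloud(cases=300, n=1000)
counts = bin_counts(points, lo=0.25, hi=0.35)
for i, c in enumerate(counts):
    left = 0.25 + i * 0.01
    print(f"{left:.2f}-{left + 0.01:.2f} {'*' * (c // 4)}")
```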
From Estimates to Decisions
Business and healthcare decisions rarely depend on exact parameter values.
Instead, decisions depend on whether certain thresholds are likely to be exceeded.
The probability gauge translates statistical evidence into decision-oriented language.
For example:
What is the probability that prevalence exceeds 30%?
This question is often more useful than asking for the prevalence estimate itself.
The visualization therefore connects statistical analysis directly to decision making.
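A gauge can be as simple as a filled bar. The text-only sketch below is a hypothetical stand-in for the graphical gauge; the function name is invented here and the probability passed in is a placeholder input, not a result from the analysis.

```python
def probability_gauge(probability, width=20, label=""):
    """Render a probability in [0, 1] as a filled text bar."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    filled = round(probability * width)
    bar = "#" * filled + "-" * (width - filled)
    return f"{label} [{bar}] {probability:.0%}"

# Placeholder input for illustration only, not an estimate from NHANES.
illustrative_probability = 0.5
print(probability_gauge(illustrative_probability, label="P(prevalence > 30%)"))
```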
The Range of Credible Realities
Traditional reporting often treats confidence intervals as technical appendices.
The credible interval strip elevates uncertainty to the primary visual element.
The emphasis moves from the center of the estimate to the full range of plausible values supported by the evidence.
Thresholds that may trigger interventions become immediately visible.
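The strip can be approximated in plain text to show the idea: the interval is drawn as a band along an axis, the central estimate as a marker, and each intervention threshold as a vertical bar. Everything in this sketch is an assumption made for illustration, including the function name, the axis range, and the input values, which are placeholders rather than results.

```python
def interval_strip(low, high, center, thresholds,
                   axis_min=0.0, axis_max=1.0, width=41):
    """Draw an interval ('='), its center ('o') and thresholds ('|') on a text axis."""
    def col(value):
        # Map a value on the axis to a character column, clamped to the strip.
        frac = (value - axis_min) / (axis_max - axis_min)
        return min(width - 1, max(0, round(frac * (width - 1))))

    cells = ["."] * width
    for i in range(col(low), col(high) + 1):
        cells[i] = "="
    cells[col(center)] = "o"
    for t in thresholds:
        cells[col(t)] = "|"  # thresholds are drawn last so they stay visible
    return "".join(cells)

# Placeholder inputs for illustration only, not estimates from NHANES.
strip = interval_strip(low=0.27, high=0.33, center=0.31,
                       thresholds=[0.25, 0.35],
                       axis_min=0.20, axis_max=0.40)
print(strip)
```

A threshold that falls inside the band is immediately distinguishable from one that falls outside it, which is the point of the visualization.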
Risk Is Not a Single Curve
Predictive models are frequently communicated as deterministic relationships.
The uncertainty ribbon demonstrates that even model predictions contain uncertainty.
Rather than displaying a single risk curve, the visualization reveals a family of plausible curves generated through repeated resampling.
This representation better reflects the reality of evidence-based prediction.
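The ribbon idea can be sketched by recomputing a risk curve on each bootstrap resample and then taking percentiles at each point along the curve. The sketch below deliberately swaps the predictive model for something simpler, prevalence by age band, so that it stays self-contained. The cohort is synthetic with an arbitrary risk formula, and the band edges and function names are assumptions.

```python
import random

AGE_BANDS = [(40, 50), (50, 60), (60, 70), (70, 80)]  # hypothetical bands

def make_synthetic_cohort(n=2000, seed=0):
    """Synthetic (age, outcome) pairs; the risk formula is arbitrary, NOT NHANES."""
    rng = random.Random(seed)
    cohort = []
    for _ in range(n):
        age = rng.randint(40, 79)
        risk = 0.15 + 0.005 * (age - 40)
        cohort.append((age, 1 if rng.random() < risk else 0))
    return cohort

def risk_curve(cohort):
    """Prevalence per age band; assumes every band has at least one participant."""
    curve = []
    for lo, hi in AGE_BANDS:
        outcomes = [y for age, y in cohort if lo <= age < hi]
        curve.append(sum(outcomes) / len(outcomes))
    return curve

def bootstrap_curves(cohort, n_resamples=300, seed=1):
    """One risk curve per resample of the cohort, drawn with replacement."""
    rng = random.Random(seed)
    return [risk_curve(rng.choices(cohort, k=len(cohort)))
            for _ in range(n_resamples)]

def ribbon(curves, level=0.95):
    """Per-band (lower, upper) percentile limits across the bootstrap curves."""
    alpha = (1.0 - level) / 2.0
    limits = []
    for j in range(len(AGE_BANDS)):
        values = sorted(curve[j] for curve in curves)
        last = len(values) - 1
        limits.append((values[int(alpha * last)], values[int((1.0 - alpha) * last)]))
    return limits

cohort = make_synthetic_cohort()
point_curve = risk_curve(cohort)
curves = bootstrap_curves(cohort)
limits = ribbon(curves)
for (lo, hi), estimate, (a, b) in zip(limits, point_curve, AGE_BANDS):
    print(f"ages {a}-{b - 1}: {estimate:.3f} (ribbon {lo:.3f} to {hi:.3f})")
```

Plotting every curve in `curves` gives the family of plausible curves; plotting `limits` gives the ribbon that summarizes them.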








