From Trial Eligibility to Real-World Evidence

Exploring External Validity in a Synthetic Healthcare Population

1 Introduction

Randomized clinical trials remain the foundation of evidence generation in medicine. Their strength lies in internal validity: carefully selected patient populations, controlled treatment exposure, and rigorous follow-up allow investigators to estimate causal treatment effects with high confidence. The price of this control is selectivity. Patients enrolled in clinical trials often represent only a fraction of those encountered in routine clinical practice. Older individuals and patients with multiple comorbidities, heavy healthcare utilization, or complex medication histories are frequently excluded. This analysis explores whether findings observed in trial-like populations remain consistent in broader real-world populations, using a synthetic healthcare population generated with Synthea.

Applying realistic eligibility criteria to a synthetic population of 20,000 patients progressively reduced the cohort to 1,924 trial-eligible individuals, about 9.6% of the starting population.

3 TRIAL ELIGIBILITY DRAMATICALLY RESHAPES POPULATIONS

Trial populations are not merely smaller populations; they are systematically different populations created through selection.
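
The progressive reduction described above can be pictured as a chain of filters, each applied to whatever the previous one left behind. The following is a minimal sketch of that idea, not the analysis code: the function name `apply_criteria`, the field names, and every threshold are hypothetical, and the four patients are toy records.

```python
def apply_criteria(patients, criteria):
    """Apply inclusion rules one after another and record the attrition."""
    remaining = list(patients)
    attrition = [("Source population", len(remaining))]
    for label, keep in criteria:
        # Each rule only sees patients who passed all earlier rules.
        remaining = [p for p in remaining if keep(p)]
        attrition.append((label, len(remaining)))
    return remaining, attrition


# Hypothetical criteria and thresholds, chosen only for illustration.
CRITERIA = [
    ("Age 40 to 75", lambda p: 40 <= p["age"] <= 75),
    ("At most 2 comorbidities", lambda p: p["n_comorbidities"] <= 2),
    ("Fewer than 20 encounters", lambda p: p["n_encounters"] < 20),
    ("At most 5 medications", lambda p: p["n_medications"] <= 5),
]

# Toy patients, not drawn from the synthetic population in the text.
toy_patients = [
    {"age": 52, "n_comorbidities": 1, "n_encounters": 8, "n_medications": 2},
    {"age": 81, "n_comorbidities": 1, "n_encounters": 9, "n_medications": 3},
    {"age": 60, "n_comorbidities": 4, "n_encounters": 10, "n_medications": 3},
    {"age": 47, "n_comorbidities": 0, "n_encounters": 35, "n_medications": 1},
]

eligible, attrition = apply_criteria(toy_patients, CRITERIA)
for label, n in attrition:
    print(f"{label:28s}{n:>6d}")
```

Recording the count after every step, rather than only the final number, shows which criterion removes the most patients and therefore which one shapes the trial-like population most strongly.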

5 PREDIABETES PREDICTS FUTURE DIABETES

Hypertension only: 12.5%
Hypertension + Prediabetes: 15.7%

This corresponds to an absolute difference of 3.2 percentage points and an approximately 26% relative increase in risk.

Baseline prediabetes identifies a subgroup at elevated future diabetes risk.
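
The absolute and relative figures quoted in this section follow directly from the two reported risks. The short sketch below reproduces that arithmetic and, as an added illustration that is not part of the original results, also converts the same two risks into a crude odds ratio.

```python
risk_htn = 0.125       # hypertension only (reported in the text)
risk_htn_pre = 0.157   # hypertension + prediabetes (reported in the text)

# Absolute difference, expressed in percentage points.
abs_diff_pp = (risk_htn_pre - risk_htn) * 100

# Relative measures.
risk_ratio = risk_htn_pre / risk_htn
rel_increase_pct = (risk_ratio - 1) * 100


def odds(risk):
    """Convert a probability into odds."""
    return risk / (1 - risk)


# Crude (unadjusted) odds ratio implied by the two risks; illustrative only.
crude_or = odds(risk_htn_pre) / odds(risk_htn)

print(f"Absolute difference: {abs_diff_pp:.1f} percentage points")
print(f"Risk ratio: {risk_ratio:.2f} ({rel_increase_pct:.0f}% relative increase)")
print(f"Crude odds ratio: {crude_or:.2f}")
```

The odds ratio comes out slightly larger than the risk ratio, which is expected whenever the outcome is not rare. It need not match a model-based odds ratio, which may come from a different cohort definition or model specification.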

7 TESTING TRANSPORTABILITY DIRECTLY

Interaction p-value = 0.622

No evidence was found that the relationship differed between populations.

The association appeared transportable across populations despite substantial differences in baseline risk.
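
A rough plausibility check of the interaction result is possible from the subgroup odds ratios reported in section 6 alone. The sketch below is not the method used in the analysis. It assumes that each reported p-value comes from a two-sided Wald test on the log odds ratio and that the trial-like and RWE-only cohorts are disjoint, so their estimates can be treated as independent. The function names are hypothetical.

```python
from math import log, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def se_from_or_and_p(odds_ratio, p_value):
    """Back out the standard error of log(OR) from a two-sided Wald p-value."""
    z = STD_NORMAL.inv_cdf(1 - p_value / 2)
    return abs(log(odds_ratio)) / z


def interaction_p(or_a, p_a, or_b, p_b):
    """Two-sided p-value for the difference of two independent log odds ratios."""
    diff = log(or_a) - log(or_b)
    se_diff = sqrt(se_from_or_and_p(or_a, p_a) ** 2
                   + se_from_or_and_p(or_b, p_b) ** 2)
    z = diff / se_diff
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


# Trial-like cohort versus RWE-only cohort, values as reported in section 6.
p_int = interaction_p(1.69, 0.406, 1.23, 0.036)
print(f"Approximate interaction p-value: {p_int:.2f}")
```

Under these assumptions the approximation gives a p-value of about 0.62, in line with the reported 0.622. The trial-like estimate is imprecise enough that its difference from the RWE-only estimate cannot be distinguished from chance.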

2 REAL-WORLD DATA BEGINS AS A CONNECTED ECOSYSTEM

Modern healthcare data rarely exist in a single table. Patients, diagnoses, medications and healthcare encounters must be integrated into a patient-level analytical dataset before meaningful scientific questions can be addressed.

Real-world evidence begins with data integration rather than statistical modeling.
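
As an illustration of what that integration step involves, the sketch below collapses event-level tables into one analytical row per patient. It is a minimal example on toy records, not the pipeline used here. The field names (`Id`, `PATIENT`, `DESCRIPTION`) are assumptions loosely modeled on a Synthea-style CSV export, and the diagnosis labels are simplified.

```python
from collections import defaultdict

# Toy source tables. Field names and values are assumptions for illustration.
patients = [{"Id": "p1", "BIRTHDATE": "1960-03-14"},
            {"Id": "p2", "BIRTHDATE": "1985-11-02"}]
conditions = [{"PATIENT": "p1", "DESCRIPTION": "Hypertension"},
              {"PATIENT": "p1", "DESCRIPTION": "Prediabetes"},
              {"PATIENT": "p2", "DESCRIPTION": "Hypertension"}]
medications = [{"PATIENT": "p1", "DESCRIPTION": "Antihypertensive (toy entry)"}]
encounters = [{"PATIENT": "p1"}, {"PATIENT": "p1"}, {"PATIENT": "p2"}]


def group_by_patient(rows):
    """Index event-level rows by the patient they belong to."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["PATIENT"]].append(row)
    return grouped


def build_patient_level(patients, conditions, medications, encounters):
    """Collapse the event tables into one analytical row per patient."""
    cond = group_by_patient(conditions)
    meds = group_by_patient(medications)
    enc = group_by_patient(encounters)
    table = []
    for p in patients:
        pid = p["Id"]
        diagnoses = {c["DESCRIPTION"] for c in cond.get(pid, [])}
        table.append({
            "patient_id": pid,
            "hypertension": "Hypertension" in diagnoses,
            "prediabetes": "Prediabetes" in diagnoses,
            "n_conditions": len(diagnoses),
            "n_medications": len(meds.get(pid, [])),
            "n_encounters": len(enc.get(pid, [])),
        })
    return table


patient_level = build_patient_level(patients, conditions, medications, encounters)
for row in patient_level:
    print(row)
```

Iterating over the patient table, rather than over the event tables, keeps patients with no recorded medications or encounters in the dataset with a count of zero instead of silently dropping them.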

4 TRIAL-LIKE PATIENTS ARE DIFFERENT PATIENTS

Compared with the broader population, trial-like patients had lower healthcare utilization, lower comorbidity burden and fewer exclusionary conditions.

Eligibility criteria change the clinical profile of the population under study.
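
One common way to quantify how different two groups are on a baseline characteristic is the standardized mean difference (SMD). The sketch below shows the calculation on toy encounter counts. The numbers are invented for illustration and are not results from this analysis, and nothing in the text states that SMDs were the measure actually used.

```python
from math import sqrt
from statistics import mean, variance


def standardized_mean_difference(group_a, group_b):
    """Difference in means divided by the pooled standard deviation."""
    pooled_sd = sqrt((variance(group_a) + variance(group_b)) / 2)
    return (mean(group_a) - mean(group_b)) / pooled_sd


# Toy encounter counts per patient, for illustration only.
trial_like_encounters = [4, 6, 5, 7, 3]
rwe_only_encounters = [12, 9, 15, 8, 20]

smd = standardized_mean_difference(trial_like_encounters, rwe_only_encounters)
print(f"SMD for encounter count: {smd:.2f}")
```

A negative value means the first group has the lower mean. A commonly used rule of thumb treats absolute values above roughly 0.1 as a meaningful imbalance, independent of sample size.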

6 RELATIVE EFFECTS CAN BE STABLE ACROSS POPULATIONS

Full cohort: OR = 1.32, p = 0.004
Trial-like cohort: OR = 1.69, p = 0.406
RWE-only cohort: OR = 1.23, p = 0.036

Although statistical significance was lost in the smaller trial-like cohort, the direction of association remained consistent.

Differences between trial and RWE populations may affect precision more than effect direction.
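
The point about precision can be made concrete by converting each reported odds ratio and p-value into an approximate confidence interval. This is a sketch under a stated assumption, namely that each p-value comes from a two-sided Wald test on the log odds ratio. The intervals it prints are therefore approximations derived from the reported figures, not results reported by the analysis, and `wald_ci` is a hypothetical helper.

```python
from math import exp, log
from statistics import NormalDist

STD_NORMAL = NormalDist()


def wald_ci(odds_ratio, p_value, level=0.95):
    """Approximate CI for an OR, backing out the SE from a Wald p-value."""
    z_observed = STD_NORMAL.inv_cdf(1 - p_value / 2)
    se = abs(log(odds_ratio)) / z_observed
    z_critical = STD_NORMAL.inv_cdf(1 - (1 - level) / 2)
    centre = log(odds_ratio)
    return exp(centre - z_critical * se), exp(centre + z_critical * se)


# Odds ratios and p-values as reported in this section.
reported = {
    "Full cohort": (1.32, 0.004),
    "Trial-like cohort": (1.69, 0.406),
    "RWE-only cohort": (1.23, 0.036),
}

intervals = {name: wald_ci(or_, p) for name, (or_, p) in reported.items()}
for name, (lower, upper) in intervals.items():
    print(f"{name:18s} OR = {reported[name][0]:.2f}, "
          f"approx. 95% CI {lower:.2f} to {upper:.2f}")
```

Under this assumption the trial-like interval is several times wider than the other two and spans 1, while the full-cohort and RWE-only intervals lie entirely above 1. The point estimates agree in direction; what differs is how tightly each cohort pins the estimate down.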

8 Conclusions

Real-world evidence does not replace randomized clinical trials. Instead, it provides a framework for understanding how far trial-derived findings can travel once they leave the controlled environment in which they were generated. In this synthetic healthcare population, trial eligibility substantially altered patient selection, but the association between prediabetes and future diabetes pointed in the same direction in trial-like and real-world settings, with no evidence that its strength differed between them.