This article describes the rationales for, and limitations of, making inferences based on data from randomized controlled trials (RCTs). We argue that obtaining a representative random sample from a patient population is impossible for a clinical trial because patients are accrued sequentially over time and thus constitute a convenience sample, subject only to protocol entry criteria. Consequently, the trial’s sample is unlikely to represent a definable patient population. We use causal diagrams to illustrate the difference between random allocation of interventions within a clinical trial sample and true simple or stratified random sampling, as executed in surveys. We argue that group-specific statistics, such as a median survival time estimate for a treatment arm in an RCT, have limited meaning as estimates of larger patient population parameters. In contrast, random allocation between interventions facilitates comparative causal inferences about between-treatment effects, such as hazard ratios or differences between probabilities of response. Comparative inferences also require the assumption of transportability from a clinical trial’s convenience sample to a targeted patient population. We focus on the consequences and limitations of randomization procedures in order to clarify the distinctions between pairs of complementary concepts of fundamental importance to data science and RCT interpretation. These include internal and external validity, generalizability and transportability, uncertainty and variability, representativeness and inclusiveness, blocking and stratification, relevance and robustness, forward and reverse causal inference, intention-to-treat and per-protocol analyses, and potential outcomes and counterfactuals.
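The contrast between arm-specific statistics and between-treatment comparisons can be illustrated with a small simulation. The following is a minimal sketch under assumptions that are ours, not the article's: a single hypothetical prognostic covariate `x`, enrollment that favors patients with `x > 0` (a stand-in for convenience sampling), simple 1:1 random allocation, and a constant additive treatment effect. The constant effect builds the transportability assumption into the simulation by construction. The function name `simulate_trial` and all numeric values are illustrative only.

```python
import random
from statistics import fmean


def simulate_trial(seed=0, n_population=100_000, true_effect=2.0):
    """Simulate a convenience-sample RCT drawn from a larger population.

    All modeling choices here are illustrative assumptions:
    - x is a hypothetical prognostic covariate, standard normal.
    - Enrollment depends on x, so the trial sample is not representative.
    - Treatment adds a constant `true_effect` to every patient's outcome,
      so the effect is transportable by construction.
    """
    rng = random.Random(seed)
    population_y0 = []          # control potential outcomes, whole population
    treated, control = [], []   # observed outcomes in the two trial arms

    for _ in range(n_population):
        x = rng.gauss(0.0, 1.0)
        y0 = 10.0 + 3.0 * x + rng.gauss(0.0, 1.0)  # outcome under control
        y1 = y0 + true_effect                      # outcome under treatment
        population_y0.append(y0)

        # Convenience sampling: patients with x > 0 are far more likely to enroll.
        enroll_prob = 0.9 if x > 0 else 0.1
        if rng.random() < enroll_prob:
            # Simple random allocation within the enrolled sample.
            if rng.random() < 0.5:
                treated.append(y1)
            else:
                control.append(y0)

    return {
        "population_control_mean": fmean(population_y0),
        "control_arm_mean": fmean(control),
        "effect_estimate": fmean(treated) - fmean(control),
    }


if __name__ == "__main__":
    for name, value in simulate_trial().items():
        print(f"{name}: {value:.3f}")
```

In this sketch, the control-arm mean departs noticeably from the population mean under control because enrollment depends on `x`, while the difference in arm means stays close to the assumed effect. If the treatment effect were made to vary with `x`, the between-arm difference would estimate the effect in the enrolled sample only, which is why the comparative inference also depends on transportability.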