Background: Inter-rater reliability is critical in oncology to ensure consistent and reliable measurements across raters and methods, such as when evaluating biomarker levels in different laboratories or comparing tumor size assessments by radiation oncologists during therapy planning. This consistency is essential for informed decision-making in both clinical and research contexts, and the intraclass correlation coefficient (ICC) is a widely recommended statistic for assessing agreement. This work focuses on hypothesis testing of the ICC(2,1) with two raters. Methods: We evaluated the performance of a naive permutation test for testing the hypothesis H0: ICC = 0 and found that it fails to reliably control the type I error rate. To address this, we developed a robust permutation test based on a studentized statistic, which we prove to be asymptotically valid even when paired variables are uncorrelated but dependent. Results: Simulation studies demonstrate that the proposed test consistently maintains type I error control, even with small sample sizes, outperforming the naive approach across various data-generating scenarios. Conclusions: The proposed studentized permutation test for ICC(2,1) offers a statistically valid and robust method for assessing inter-rater reliability and demonstrates practical utility when applied to two real-world oncology datasets.
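To make the setting concrete, the following is a minimal sketch of the ICC(2,1) point estimate for two raters and of a naive permutation test of H0: ICC = 0. It is an illustration under stated assumptions, not the authors' implementation: the function names `icc_2_1` and `naive_permutation_pvalue`, the one-sided alternative (ICC > 0), and the default of 999 permutations are all hypothetical choices. The ICC(2,1) formula is the standard Shrout-Fleiss two-way random-effects, absolute-agreement, single-rating estimator. The studentized statistic proposed in the paper is not specified in the abstract and is therefore not reproduced here.

```python
import numpy as np


def icc_2_1(x, y):
    """Shrout-Fleiss ICC(2,1) estimate for two raters scoring the same n subjects."""
    data = np.column_stack([x, y]).astype(float)  # shape (n subjects, k = 2 raters)
    n, k = data.shape
    grand = data.mean()
    row_means = data.mean(axis=1)  # per-subject means
    col_means = data.mean(axis=0)  # per-rater means

    # Mean squares from the two-way ANOVA decomposition.
    msr = k * np.sum((row_means - grand) ** 2) / (n - 1)  # between subjects
    msc = n * np.sum((col_means - grand) ** 2) / (k - 1)  # between raters
    resid = data - row_means[:, None] - col_means[None, :] + grand
    mse = np.sum(resid ** 2) / ((n - 1) * (k - 1))        # residual

    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def naive_permutation_pvalue(x, y, n_perm=999, seed=0):
    """Naive permutation p-value for H0: ICC = 0 against ICC > 0.

    Assumption for illustration: the pairing is broken by shuffling the
    second rater's scores and the raw ICC is used as the test statistic.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    observed = icc_2_1(x, y)

    exceed = 0
    for _ in range(n_perm):
        if icc_2_1(x, rng.permutation(y)) >= observed:
            exceed += 1

    # The +1 terms count the observed arrangement among the permutations.
    return (exceed + 1) / (n_perm + 1)
```

Shuffling one rater's scores makes the two ratings independent, which is a stronger condition than ICC = 0. This is presumably why the raw-ICC version can misbehave when the paired variables are uncorrelated but dependent, the case the abstract singles out.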
Robust Permutation Test of Intraclass Correlation Coefficient for Assessing Agreement