Background/Objectives: Neuroendocrine neoplasms (NENs) are uncommon neoplasms. Grading informs prognosis and treatment decisions for NENs and is determined by cell proliferation, which is measured by mitotic count and Ki-67 index. These measurements are challenging for pathologists because they suffer from inter- and intra-observer variability and are cumbersome to quantify. To address these challenges, we developed a machine learning pipeline for identifying tumor areas and proliferating cells and for grading NENs. Methods: Our study includes 385 samples of gastroenteropancreatic NENs from across British Columbia with two stains (247 H&E and 138 Ki-67 images). Labels are at the patient level, covering 186 patients. We systematically investigated three settings: H&E alone, H&E with Ki-67, and pathologist-reviewed and corrected cases. Results: Our H&E framework achieved a three-fold balanced accuracy of 77.5% in NEN grading. The H&E with Ki-67 framework improved grading performance to 83.0%. We provide survival and multivariate analysis with a c-index of 0.65. Grade 1 NENs misclassified by the model were reviewed by a pathologist to assess the reasons for misclassification. Among pathologist-assessed G1 cases, samples that the algorithm assigned to a higher grade (n = 20; median survival 4.22 years) showed a significant survival difference (p = 0.007) compared to concordant G1 samples (n = 60; median survival 10.13 years). Conclusions: Our model identifies NEN grades with high accuracy and flags some grade 1 tumors as prognostically unique, suggesting potential improvements to standard grading. Further studies are needed to determine whether this discordant group is a different clinical entity.
A Deep Learning Framework for Classification of Neuroendocrine Neoplasm Whole Slide Images
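The abstract reports two evaluation metrics, balanced accuracy for grading and a c-index for the survival analysis. The sketch below shows how these quantities are commonly computed, as a minimal illustration only; it is not the authors' evaluation code. The function names `balanced_accuracy` and `concordance_index`, the use of Harrell's formulation of the c-index, the handling of ties, and the toy data are all assumptions introduced here.

```python
from collections import defaultdict
from itertools import combinations


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each grade counts equally regardless of size."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


def concordance_index(times, events, risks):
    """Harrell's c-index: share of comparable pairs that the risk score orders correctly.

    A pair is comparable when the subject with the shorter time had an
    observed event (events[i] == 1). Ties in risk count as 0.5.
    Assumes at least one comparable pair exists.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: pairs with tied times are skipped
        first, second = (i, j) if times[i] < times[j] else (j, i)
        if not events[first]:
            continue  # earlier subject was censored, so the pair is not comparable
        comparable += 1
        if risks[first] > risks[second]:
            concordant += 1.0
        elif risks[first] == risks[second]:
            concordant += 0.5
    return concordant / comparable


# Toy data (hypothetical, not from the study).
y_true = ["G1", "G1", "G1", "G1", "G2", "G2", "G3"]
y_pred = ["G1", "G1", "G1", "G2", "G2", "G2", "G2"]
print(round(balanced_accuracy(y_true, y_pred), 3))  # 0.583 = (0.75 + 1.0 + 0.0) / 3

times = [2.0, 4.0, 6.0, 8.0]   # follow-up in years
events = [1, 0, 1, 1]          # 1 = death observed, 0 = censored
risks = [0.9, 0.7, 0.3, 0.5]   # higher score = higher predicted risk
print(concordance_index(times, events, risks))  # 0.75 (3 of 4 comparable pairs)
```

Balanced accuracy is a natural choice when grades are unevenly represented, since plain accuracy could be dominated by the most common grade. A c-index of 0.5 corresponds to random ordering and 1.0 to perfect ordering of survival times by predicted risk.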