Background: Human papillomavirus (HPV) plays a crucial role in the pathogenesis of oropharyngeal squamous cell carcinomas (OPSCC). Accurate HPV status classification is essential for therapeutic stratification. While p16 immunohistochemistry (IHC) is the clinical surrogate marker, it has limited specificity. Methods: In this study, we implemented a weakly supervised deep learning approach using the Clustering-constrained Attention Multiple-Instance Learning (CLAM) framework to directly predict HPV status from hematoxylin and eosin (H&E)-stained whole-slide images (WSIs) of OPSCC. A total of 123 WSIs from two cohorts (The Cancer Genome Atlas (TCGA) cohort and OPSCC cohort from the University of Naples Federico II (OPSCC-UNINA)) were used. Results: Attention heatmaps revealed that the model predominantly focused on tumor-rich regions. Errors were primarily observed in slides with conflicting p16/in situ hybridization (ISH) status or suboptimal quality. Morphological analysis of high-attention patches confirmed that cellular features extracted from correctly classified slides align with HPV status, with a Random Forest classifier achieving 83% accuracy at the cell level. Conclusions: This work supports the feasibility of deep learning-based HPV prediction from routine H&E slides, with potential clinical implications for streamlined, cost-effective diagnostics.