Head and neck squamous cell carcinoma (HNSCC) is a prevalent and aggressive cancer, and accurate staging under the AJCC system is essential for treatment planning. This study aims to improve AJCC stage classification by integrating clinical and imaging data in a multimodal deep learning pipeline. We propose a framework that employs a VGG16-based masked autoencoder (MAE) for self-supervised visual feature learning, enhanced by attention mechanisms (CBAM and BAM), and fuses image and clinical features through an attention-weighted fusion network. Benchmarked on the HNSCC and HN1 datasets, the models achieved approximately 80% accuracy on the four-class task and approximately 66% on the five-class task, with notable AUC improvements, especially under BAM. Integrating clinical features significantly enhances stage-classification performance, setting a precedent for robust multimodal pipelines in radiomics-based oncology applications.
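The attention-weighted fusion step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the feature vectors, the common dimension `d`, and the scoring vector `w` are all hypothetical stand-ins for the learned MAE image embedding, the encoded clinical features, and a learned attention layer.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

# Hypothetical inputs: an image embedding (stand-in for the VGG16-MAE
# encoder output) and a clinical feature vector, both projected to a
# shared dimension d before fusion.
d = 8
img_feat = rng.normal(size=d)
clin_feat = rng.normal(size=d)

# Modality-level attention: score each modality with a (here random,
# in practice learned) scoring vector, normalize the scores with
# softmax, and form the fused representation as the weighted sum.
w = rng.normal(size=d)
scores = np.array([w @ img_feat, w @ clin_feat])
alpha = softmax(scores)           # attention weights, sum to 1
fused = alpha[0] * img_feat + alpha[1] * clin_feat
```

The softmax keeps the two modality weights on a common scale, so a downstream stage classifier receives a single vector in which the more informative modality dominates per sample.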