This paper investigates the adaptability of four state-of-the-art artificial intelligence (AI) models to the Australian mammographic context through transfer learning, explores the impact of image enhancement on model performance and analyses the relationship between AI outputs and histopathological features to assess clinical relevance and accuracy. A total of 1712 screening mammograms (n = 856 cancer cases and n = 856 matched normal cases) were used in this study. The 856 cases with cancer lesions were annotated by two expert radiologists, and the level of concordance between their annotations was used to establish two sets: a ‘high-concordance subset’ with 99% agreement on cancer location and an ‘entire dataset’ with all cases included. The area under the receiver operating characteristic curve (AUC) was used to evaluate the performance of the Globally aware Multiple Instance Classifier (GMIC), Global-Local Activation Maps (GLAM), I&H and End2End AI models, both in the pretrained and transfer learning modes, with and without applying the Contrast Limited Adaptive Histogram Equalization (CLAHE) algorithm. The four AI models, with and without transfer learning, performed better on the high-concordance subset than on the entire dataset. Applying the CLAHE algorithm to mammograms improved the performance of the AI models. On the high-concordance subset with transfer learning and the CLAHE algorithm applied, the AUC of the GMIC model was highest (0.912), followed by the GLAM model (0.909), I&H (0.893) and End2End (0.875). There were significant differences (p < 0.05) in the performances of the four AI models between the high-concordance subset and the entire dataset. The AI models also showed significant differences in predicted malignancy probability across tumour size categories in mammograms. The performance of the AI models was affected by several factors, including concordance classification, image enhancement and transfer learning.
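The AUC reported above can be computed directly from case-level malignancy scores. A minimal, stdlib-only sketch using the Mann-Whitney U formulation (the function name and the example scores are illustrative, not from the study):

```python
def auc(y_true, y_score):
    """AUC via the Mann-Whitney U statistic: the probability that a
    randomly chosen positive (cancer) case scores higher than a
    randomly chosen negative (normal) case."""
    pos = [s for s, y in zip(y_score, y_true) if y == 1]  # cancer cases
    neg = [s for s, y in zip(y_score, y_true) if y == 0]  # normal cases
    # Count pairwise wins for the positive class; ties count half.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Illustrative scores for two normal and two cancer cases:
print(auc([0, 0, 1, 1], [0.10, 0.40, 0.35, 0.80]))  # → 0.75
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of cancer from normal cases, which is the scale on which the 0.875–0.912 values above should be read.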
Using mammograms with strong concordance between radiologists’ annotations, together with image enhancement and transfer learning, could enhance the accuracy of AI models.
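For context, the contrast-limiting step at the heart of the CLAHE enhancement used here can be illustrated with a simplified, global (non-tiled) histogram equalization sketch; full CLAHE additionally operates on local tiles and bilinearly interpolates between their mappings. The function name and parameter values are illustrative, stdlib-only:

```python
def clipped_hist_eq(pixels, clip_limit=40, levels=256):
    """Simplified contrast-limited histogram equalization on a flat
    list of 8-bit grey values (full CLAHE also tiles the image and
    interpolates between per-tile mappings)."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    # Clip each histogram bin at clip_limit, collecting the excess.
    excess = 0
    for i in range(levels):
        if hist[i] > clip_limit:
            excess += hist[i] - clip_limit
            hist[i] = clip_limit
    # Redistribute the clipped excess uniformly across all bins.
    bonus = excess // levels
    hist = [h + bonus for h in hist]
    # Build the cumulative distribution and map it to the grey range.
    cdf, total = [], 0
    for h in hist:
        total += h
        cdf.append(total)
    lut = [round(c / cdf[-1] * (levels - 1)) for c in cdf]
    return [lut[p] for p in pixels]
```

Clipping the histogram before equalization caps how aggressively contrast is stretched in near-uniform regions, limiting noise amplification in flat areas of the image while still enhancing local contrast elsewhere.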