Background/Objectives: Developing high-performance artificial intelligence (AI) models for rare diseases like malignant bone tumors is limited by scarce annotated data. This study evaluates same-modality cross-domain transfer learning by comparing an AI model pretrained on chest radiographs with a model trained from scratch for detecting malignant bone tumors on knee radiographs. Methods: Two YOLOv5-based detectors differed only in initialization (transfer vs. scratch). Both were trained/validated on institutional data and tested on an independent external set of 743 radiographs (268 malignant, 475 normal). The primary outcome was AUC; prespecified operating points were high-sensitivity (≥0.90), high-specificity (≥0.90), and Youden-optimal. Secondary analyses included PR/F1, calibration (Brier, slope), and decision curve analysis (DCA). Results: AUC was similar (YOLO-TL 0.954 [95% CI 0.937–0.970] vs. YOLO-SC 0.961 [0.948–0.973]; DeLong p = 0.53). At the high-sensitivity point (both sensitivity = 0.903), YOLO-TL achieved higher specificity (0.903 vs. 0.867; McNemar p = 0.037) and PPV (0.840 vs. 0.793; bootstrap p = 0.030), reducing ~17 false positives among 475 negatives. At the high-specificity point (~0.902–0.903 for both), YOLO-TL showed higher sensitivity (0.798 vs. 0.764; p = 0.0077). At the Youden-optimal point, sensitivity favored YOLO-TL (0.914 vs. 0.892; p = 0.041) with a non-significant specificity difference. Conclusions: Transfer learning may not improve overall AUC but can enhance practical performance at clinically crucial thresholds. By maintaining high detection rates while reducing false positives, the transfer learning model offers superior clinical utility. Same-modality cross-domain transfer learning is an efficient strategy for developing robust AI systems for rare diseases, supporting tools more readily acceptable in real-world screening workflows.
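The false-positive and PPV figures at the high-sensitivity operating point follow arithmetically from the reported sensitivity, specificity, and test-set composition (268 malignant, 475 normal). The following is a minimal Python sketch of that arithmetic, not the authors' analysis code. The helper name `counts_from_rates` is hypothetical, and the integer counts are reconstructed here by rounding the reported rates; the abstract does not report them directly.

```python
# Sketch: reconstruct approximate confusion counts from reported rates.
# Assumption: counts are recovered by rounding rate * group size; the
# abstract reports only rates, so these integers are illustrative.

def counts_from_rates(sensitivity, specificity, n_pos, n_neg):
    """Return approximate TP/FN/FP/TN counts and PPV for one operating point."""
    tp = round(sensitivity * n_pos)          # malignant cases flagged
    fn = n_pos - tp                          # malignant cases missed
    fp = round((1.0 - specificity) * n_neg)  # normal cases flagged
    tn = n_neg - fp                          # normal cases cleared
    ppv = tp / (tp + fp)                     # positive predictive value
    return {"tp": tp, "fn": fn, "fp": fp, "tn": tn, "ppv": ppv}

N_POS, N_NEG = 268, 475  # external test set composition from the abstract

# High-sensitivity operating point: both models at sensitivity 0.903.
tl = counts_from_rates(0.903, 0.903, N_POS, N_NEG)  # transfer learning
sc = counts_from_rates(0.903, 0.867, N_POS, N_NEG)  # trained from scratch

fp_reduction = sc["fp"] - tl["fp"]

print(f"YOLO-TL: FP={tl['fp']}, PPV={tl['ppv']:.3f}")
print(f"YOLO-SC: FP={sc['fp']}, PPV={sc['ppv']:.3f}")
print(f"False positives avoided: {fp_reduction}")
```

Under these rounding assumptions the sketch yields 46 vs. 63 false positives (a difference of 17) and PPVs of 0.840 vs. 0.793, matching the reported values. Note that PPV depends on the proportion of malignant cases in the test set (268 of 743), so it would differ in a screening population with lower prevalence.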