Background: Pancreatic cystic lesions (PCLs), including intraductal papillary mucinous neoplasms (IPMNs) and mucinous cystic neoplasms (MCNs), pose a diagnostic challenge due to their variable malignant potential. Current guidelines, such as the Fukuoka and American Gastroenterological Association (AGA) guidelines, have only moderate predictive accuracy and may lead to overtreatment or missed malignancies. Artificial intelligence (AI), incorporating machine learning (ML) and deep learning (DL), offers the potential to improve risk stratification, diagnosis, and management of PCLs by integrating clinical, radiological, and molecular data. This is the first systematic review to evaluate the application, performance, and clinical utility of AI models in the diagnosis, classification, prognosis, and management of pancreatic cysts.

Methods: A systematic review was conducted in accordance with PRISMA guidelines and registered on PROSPERO (CRD420251008593). PubMed, EMBASE, Scopus, and the Cochrane Library were searched up to March 2025. The inclusion criteria encompassed original studies employing AI, ML, or DL in human subjects with pancreatic cysts and evaluating diagnostic, classification, or prognostic outcomes. Data were extracted on study design, imaging modality, model type, sample size, performance metrics (accuracy, sensitivity, specificity, and area under the curve (AUC)), and validation methods. Risk of bias was assessed using PROBAST, and reporting quality was assessed by adherence to the TRIPOD reporting guidelines.

Results: From 847 records, 31 studies met the inclusion criteria. Most were retrospective observational studies (n = 27, 87%) and focused on preoperative diagnostic applications (n = 30, 97%), with only one addressing prognosis. Imaging modalities included computed tomography (CT) (48%), endoscopic ultrasound (EUS) (26%), and magnetic resonance imaging (MRI) (9.7%). Neural networks, particularly convolutional neural networks (CNNs), were the most common AI models (n = 16), followed by logistic regression (n = 4) and support vector machines (n = 3). The median reported AUC across studies was 0.912, with 55% of models achieving AUC ≥ 0.80. The models outperformed clinicians or existing guidelines in 11 studies. IPMN stratification and subtype classification were common focuses, with CNN-based EUS models achieving accuracies of up to 99.6%. Only 10 studies (32%) performed external validation. The risk of bias was high in 93.5% of studies, and TRIPOD adherence averaged 48%.

Conclusions: AI demonstrates strong potential in improving the diagnosis and risk stratification of pancreatic cysts, with several models outperforming current clinical guidelines and human readers. However, widespread clinical adoption is hindered by high risk of bias, lack of external validation, and limited interpretability of complex models. Future work should prioritise multicentre prospective studies, standardised model reporting, and development of interpretable, externally validated tools to support clinical integration.
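For readers less familiar with the performance metrics extracted in this review (accuracy, sensitivity, specificity, and AUC), the following is a minimal Python sketch of how they are conventionally computed for a binary classifier. It is illustrative only and is not taken from any included study: the function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions, and the AUC is computed with the standard pairwise-ranking definition rather than any particular library.

```python
from typing import Dict, Sequence


def confusion_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, and specificity from binary labels (1 = positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (NOT from the review): 1 = malignant, 0 = benign.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
y_pred = [int(s >= 0.5) for s in scores]  # assumed threshold of 0.5

metrics = confusion_metrics(y_true, y_pred)
model_auc = auc(y_true, scores)
```

Accuracy, sensitivity, and specificity depend on the chosen threshold, whereas AUC summarises ranking performance across all thresholds, which is one reason the two kinds of metric can diverge between studies.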
Application of Artificial Intelligence in Pancreatic Cyst Management: A Systematic Review