Background/Objectives: Medical image segmentation is a crucial task for the diagnosis, treatment planning, and monitoring of cancer; however, it remains one of the most challenging problems for Artificial Intelligence (AI)-based clinical applications. Deep-learning models have shown near-perfect results on narrow tasks such as single-organ Computed Tomography (CT) segmentation, yet they fail in practical settings, where cross-modality robustness and multi-organ delineation are essential (e.g., liver Dice dropping to 0.88 ± 0.15 in combined CT-MR scenarios). This fragility exposes two structural gaps: (i) rigid, task-specific architectures that cannot adapt to varied clinical instructions, and (ii) the assumption that a single universal loss function is optimal across all cancer imaging applications. Methods: A novel multimodal segmentation framework is proposed that combines natural language prompts with high-fidelity imaging features through Feature-wise Linear Modulation (FiLM) and Conditional Batch Normalization, enabling a single model to adapt dynamically across modalities, organs, and pathologies. Unlike preceding systems, the proposed approach is prompt-driven, context-aware, and end-to-end trainable, ensuring alignment between computational adaptability and clinical decision-making. Results: Extensive evaluation on the Brain Tumor Dataset (cancer-relevant neuroimaging) and the CHAOS multi-organ challenge yields two key insights: (1) Dice loss remains optimal for single-organ tasks, while (2) Jaccard (IoU) loss outperforms it in multi-organ, cross-modality cancer segmentation. Empirical evidence thus shows that the optimality of a loss function is task- and context-dependent rather than universal.
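The FiLM conditioning named in the Methods can be sketched in a minimal NumPy form: a prompt embedding is linearly projected to per-channel scale (gamma) and shift (beta) parameters that modulate the image feature maps. The function name `film_modulate` and the projection weights are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def film_modulate(feats, prompt_emb, W_gamma, b_gamma, W_beta, b_beta):
    """Feature-wise Linear Modulation (FiLM) sketch.

    feats:      (C, H, W) image feature maps
    prompt_emb: (D,) embedding of the natural language prompt
    W_*, b_*:   linear projections from the prompt to per-channel params
    """
    gamma = W_gamma @ prompt_emb + b_gamma  # (C,) per-channel scale
    beta = W_beta @ prompt_emb + b_beta     # (C,) per-channel shift
    # Broadcast the per-channel parameters over the spatial dimensions
    return gamma[:, None, None] * feats + beta[:, None, None]
```

With gamma projected to 1 and beta to 0, the modulation is the identity; different prompts yield different (gamma, beta) pairs, which is how a single backbone adapts across organs and modalities.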
Conclusions: The framework’s design principles directly address documented clinical workflow requirements and show the potential to connect algorithmic innovation with clinical utility once validated through prospective clinical trials.
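The Dice-versus-Jaccard comparison in the Results rests on two closely related overlap losses; a minimal sketch of both (soft formulations over probability maps, with a small epsilon for numerical stability, both choices being illustrative assumptions) is:

```python
import numpy as np

def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss: 1 - 2|A∩B| / (|A| + |B|)."""
    inter = np.sum(pred * target)
    return 1.0 - (2.0 * inter + eps) / (np.sum(pred) + np.sum(target) + eps)

def jaccard_loss(pred, target, eps=1e-6):
    """Soft Jaccard (IoU) loss: 1 - |A∩B| / |A∪B|."""
    inter = np.sum(pred * target)
    union = np.sum(pred) + np.sum(target) - inter
    return 1.0 - (inter + eps) / (union + eps)
```

Both losses are zero for a perfect overlap and approach one for disjoint masks; because the Jaccard denominator subtracts the intersection, IoU penalizes partial overlap more heavily than Dice, which is one plausible reading of why it fares better on harder multi-organ, cross-modality boundaries.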