Background/Objectives: Medical image segmentation is a crucial task for the diagnosis, treatment planning, and monitoring of cancer; however, it remains one of the most challenging problems for Artificial Intelligence (AI)-based clinical applications. Deep-learning models have shown near-perfect results on narrow tasks such as single-organ Computed Tomography (CT) segmentation, yet they fail in practical settings, where cross-modality robustness and multi-organ delineation are essential (e.g., liver Dice dropping to 0.88 ± 0.15 in combined CT-MR scenarios). This fragility exposes two structural gaps: (i) rigid, task-specific architectures that cannot adapt to varied clinical instructions, and (ii) the assumption that a single universal loss function is optimal across all cancer imaging applications. Methods: A novel multimodal segmentation framework is proposed that combines natural language prompts with high-fidelity imaging features through Feature-wise Linear Modulation (FiLM) and Conditional Batch Normalization, enabling a single model to adapt dynamically across modalities, organs, and pathologies. Unlike preceding systems, the proposed approach is prompt-driven, context-aware, and end-to-end trainable, ensuring alignment between computational adaptability and clinical decision-making. Results: Extensive evaluation on the Brain Tumor Dataset (cancer-relevant neuroimaging) and the CHAOS multi-organ challenge yields two key insights: (1) Dice loss remains optimal for single-organ tasks, while (2) Jaccard (IoU) loss outperforms it in multi-organ, cross-modality cancer segmentation. Empirical evidence thus shows that the optimality of a loss function is task- and context-dependent rather than universal.
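The FiLM conditioning named in the Methods can be sketched in a minimal NumPy form: a prompt embedding is linearly projected to per-channel scale (gamma) and shift (beta) parameters that modulate the image feature maps. The function name `film_modulate` and the projection weights are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def film_modulate(feats, prompt_emb, W_gamma, b_gamma, W_beta, b_beta):
    """Feature-wise Linear Modulation (FiLM) sketch.

    feats:      (C, H, W) image feature maps
    prompt_emb: (D,) embedding of the natural language prompt
    W_*, b_*:   linear projections from the prompt to per-channel params
    """
    gamma = W_gamma @ prompt_emb + b_gamma  # (C,) per-channel scale
    beta = W_beta @ prompt_emb + b_beta     # (C,) per-channel shift
    # Broadcast the per-channel parameters over the spatial dimensions
    return gamma[:, None, None] * feats + beta[:, None, None]
```

With gamma projected to 1 and beta to 0, the modulation is the identity; different prompts yield different (gamma, beta) pairs, which is how a single backbone adapts across organs and modalities.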
Conclusions: The framework’s design principles directly address documented clinical workflow requirements and show the potential to connect algorithmic innovation with clinical utility once validated through prospective clinical trials.
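The Dice-versus-Jaccard comparison in the Results rests on two closely related overlap losses; a minimal sketch of both (soft formulations over probability maps, with a small epsilon for numerical stability, both choices being illustrative assumptions) is:

```python
import numpy as np

def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss: 1 - 2|A∩B| / (|A| + |B|)."""
    inter = np.sum(pred * target)
    return 1.0 - (2.0 * inter + eps) / (np.sum(pred) + np.sum(target) + eps)

def jaccard_loss(pred, target, eps=1e-6):
    """Soft Jaccard (IoU) loss: 1 - |A∩B| / |A∪B|."""
    inter = np.sum(pred * target)
    union = np.sum(pred) + np.sum(target) - inter
    return 1.0 - (inter + eps) / (union + eps)
```

Both losses are zero for a perfect overlap and approach one for disjoint masks; because the Jaccard denominator subtracts the intersection, IoU penalizes partial overlap more heavily than Dice, which is one plausible reading of why it fares better on harder multi-organ, cross-modality boundaries.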