U-Net, a deep convolutional neural network (CNN), has been used clinically to auto-segment normal organs, but its use for planning target volume (PTV) segmentation remains limited. This work addresses the problem in two ways: 1) applying one of the newest network architectures, the vision transformer, instead of CNN-based networks, and 2) finding an appropriate combination of network hyper-parameters with reference to the recently proposed nnU-Net ("no-new-Net"). VT U-Net, a fully transformer-based architecture, was adopted to auto-segment the whole-pelvis prostate PTV. Its upgraded version (v.2) applied nnU-Net-like hyper-parameter optimizations, which did not fully cover transformer-specific hyper-parameters. We therefore searched for a suitable combination of two key hyper-parameters (patch size and embedding dimension) on 140 CT scans using 4-fold cross-validation. VT U-Net v.2 with hyper-parameter tuning yielded the highest Dice similarity coefficient (DSC) of 82.5 and the lowest 95% Hausdorff distance (HD95) of 3.5 on average among seven recently proposed deep learning networks. Notably, nnU-Net with hyper-parameter optimization achieved competitive performance despite being based on convolution layers. These results demonstrate that network hyper-parameter tuning is necessary even for newly developed vision transformer architectures.
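The search procedure described above, a grid over the two transformer hyper-parameters evaluated with 4-fold cross-validation over 140 scans, can be sketched as follows. This is a minimal illustration, not the authors' code: the candidate patch sizes and embedding dimensions are assumptions (the abstract only names the two parameters), and `train_and_score` is a hypothetical placeholder standing in for actually training VT U-Net v.2 on one fold and returning its mean validation DSC.

```python
from itertools import product

import numpy as np

# Assumed candidate grids; the abstract does not state the values searched.
PATCH_SIZES = [(64, 64, 64), (96, 96, 96), (128, 128, 128)]
EMBED_DIMS = [48, 96, 192]


def train_and_score(train_idx, val_idx, patch_size, embed_dim):
    """Hypothetical placeholder: a real implementation would train
    VT U-Net v.2 on `train_idx` with the given patch size and
    embedding dimension, then return the mean DSC on `val_idx`.
    Here we return a deterministic dummy score instead."""
    rng = np.random.default_rng(hash((patch_size, embed_dim)) % 2**32)
    return 0.75 + 0.1 * rng.random()


def grid_search_cv(n_scans=140, n_folds=4):
    """Exhaustive search over (patch size, embedding dimension)
    combinations, scoring each by k-fold cross-validated mean DSC."""
    rng = np.random.default_rng(0)
    folds = np.array_split(rng.permutation(n_scans), n_folds)
    best = None
    for patch_size, embed_dim in product(PATCH_SIZES, EMBED_DIMS):
        fold_scores = []
        for i in range(n_folds):
            val_idx = folds[i]
            train_idx = np.concatenate(
                [folds[j] for j in range(n_folds) if j != i]
            )
            fold_scores.append(
                train_and_score(train_idx, val_idx, patch_size, embed_dim)
            )
        mean_dsc = float(np.mean(fold_scores))
        if best is None or mean_dsc > best[0]:
            best = (mean_dsc, patch_size, embed_dim)
    return best  # (best mean DSC, best patch size, best embedding dim)
```

Selecting the combination with the highest cross-validated mean DSC, rather than the score on any single fold, is what makes the choice robust to the particular 35-scan validation split.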