Skin lesion segmentation plays a key role in the diagnosis of skin cancer; it can be a component of both traditional algorithms and end-to-end approaches. The quality of segmentation directly affects the accuracy of classification; however, achieving high-quality segmentation requires a substantial amount of labeled data. Semi-supervised learning makes it possible to use unlabeled data to improve the results of a machine learning model. In medical image segmentation, acquiring detailed annotations is time-consuming, costly, and requires skilled annotators, so using unlabeled data can significantly reduce the manual segmentation effort. This study proposes a novel approach to semi-supervised skin lesion segmentation using self-training with a Noisy Student, which makes use of large amounts of available unlabeled images. It consists of four steps: first, training the teacher model on labeled data only; second, generating pseudo-labels with the teacher model; third, training the student model on both labeled and pseudo-labeled data; and lastly, training the student* model on pseudo-labels generated with the student model. In this work, we implemented the DeepLabV3 architecture as both the teacher and student models. As a final result, we achieved a mean intersection over union (mIoU) of 88.0% on the ISIC 2018 dataset and an mIoU of 87.54% on the PH2 dataset. The evaluation of the proposed approach shows that Noisy Student training improves the segmentation performance of neural networks in a skin lesion segmentation task while using only small amounts of labeled data.
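The four-step procedure can be illustrated with a minimal, framework-agnostic sketch. It is not the implementation used in this study: the function names (`noisy_student`, `fit_threshold`, `predict_threshold`, `add_gaussian_noise`) are hypothetical, a single intensity threshold stands in for DeepLabV3, and Gaussian input noise stands in for whatever noise the student actually receives. The sketch also assumes that the student* model is trained on labeled data together with the student's pseudo-labels, which the text above does not state.

```python
import numpy as np

def fit_threshold(images, masks):
    """Toy stand-in for a segmentation network (assumption, not DeepLabV3):
    one intensity threshold halfway between the two class means."""
    fg = images[masks == 1].mean()
    bg = images[masks == 0].mean()
    return (fg + bg) / 2.0

def predict_threshold(threshold, images):
    """Binary mask: 1 where the pixel intensity exceeds the threshold."""
    return (images > threshold).astype(np.uint8)

def add_gaussian_noise(images, rng, sigma=0.05):
    """Illustrative input noise applied only to the student's training images."""
    return images + rng.normal(0.0, sigma, images.shape)

def noisy_student(fit, predict, labeled, unlabeled, noise, rounds=2):
    """Return [teacher, student, student*] for rounds=2.

    fit(images, masks) -> model, predict(model, images) -> masks,
    noise(images) -> perturbed images.
    """
    x_l, y_l = labeled
    models = [fit(x_l, y_l)]                       # step 1: teacher on labeled data only
    for _ in range(rounds):
        pseudo = predict(models[-1], unlabeled)    # steps 2 and 4: pseudo-labels from clean inputs
        x = np.concatenate([x_l, noise(unlabeled)])
        y = np.concatenate([y_l, pseudo])
        models.append(fit(x, y))                   # steps 3 and 4: train the next (noised) student
    return models
```

With `rounds=2`, the last element of the returned list plays the role of the student* model; the previous student acts as the teacher for the next round.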
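For reference, the reported metric can be sketched as follows. This assumes mIoU is the intersection over union averaged over the two classes (background and lesion); the text does not specify whether the average is taken per class or per image, so the helper `mean_iou` is an illustration and not the evaluation code behind the reported numbers.

```python
import numpy as np

def mean_iou(pred, target, num_classes=2):
    """Mean IoU over classes for integer label masks of equal shape."""
    ious = []
    for c in range(num_classes):
        p = pred == c
        t = target == c
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue  # class absent from both masks: leave it out of the mean
        ious.append(np.logical_and(p, t).sum() / union)
    return float(np.mean(ious))
```

For a 2x2 target `[[1, 1], [0, 0]]` and prediction `[[1, 0], [0, 0]]`, the background IoU is 2/3 and the lesion IoU is 1/2, so the function returns 7/12.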