This study modifies the U-Net architecture for pixel-wise segmentation to automatically classify lesions in laryngeal endoscopic images. The enhanced U-Net incorporates five-level encoders and decoders, with an autoencoder layer that derives latent vectors representing image characteristics. To improve performance, a Wasserstein GAN (WGAN) was employed to mitigate mode collapse and gradient explosion, issues common in traditional GANs. The dataset consisted of 8171 images labeled with polygons in seven colors. Evaluation with the F1 score and intersection over union (IoU) showed that benign tumors were detected with lower accuracy than other lesions, whereas cancers were detected with notably high accuracy. The model achieved an overall accuracy of 99%. This enhanced U-Net model shows strong potential for improving cancer detection, reducing diagnostic errors, and supporting early diagnosis in clinical applications.
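For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how pixel-wise IoU and F1 (Dice) can be computed for a single binary lesion mask. The helper names `iou_score` and `f1_score` are illustrative, not taken from the study; a multi-class evaluation would apply them per class and average.

```python
import numpy as np

def iou_score(pred: np.ndarray, target: np.ndarray) -> float:
    """Intersection over union for binary segmentation masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    # Convention: empty prediction and empty target count as a perfect match.
    return intersection / union if union else 1.0

def f1_score(pred: np.ndarray, target: np.ndarray) -> float:
    """Pixel-wise F1 (equivalent to the Dice coefficient) for binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    tp = np.logical_and(pred, target).sum()      # true-positive pixels
    denom = pred.sum() + target.sum()            # predicted + actual positives
    return 2 * tp / denom if denom else 1.0

# Toy example: a 2x2 mask where the prediction covers one extra pixel.
pred = np.array([[1, 1], [0, 0]])
target = np.array([[1, 0], [0, 0]])
print(iou_score(pred, target))  # intersection 1, union 2 -> 0.5
print(f1_score(pred, target))   # 2*1 / (2+1) -> 0.666...
```

Note that F1/Dice weights the overlap more generously than IoU (Dice = 2·IoU / (1 + IoU)), which is why both are commonly reported together in segmentation studies.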