Objective: Cervical cancer is among the leading causes of death among women in developing countries. Early identification and treatment under expert medical guidance are the most important steps for minimizing its aftereffects. Examining Pap smear images is one of the best methods for detecting this malignancy. For automated detection of cervical cancer, however, the available datasets often contain missing values, which can significantly degrade the performance of machine learning models.

Methods: To address these challenges, this study proposes an automated system for predicting cervical cancer that handles missing values and class imbalance to achieve high accuracy. The proposed system employs a stacked ensemble voting classifier that combines three machine learning models, along with a KNN imputer for filling in missing values and SMOTE up-sampling for balancing the classes.

Results: The proposed model achieves 99.99% accuracy, 99.99% precision, 99.99% recall, and a 99.99% F1 score when using KNN-imputed, SMOTE up-sampled features. The study compares the performance of the proposed model with multiple other machine learning algorithms under four scenarios: with missing values removed, with KNN imputation, with SMOTE features, and with KNN-imputed SMOTE features. The study also validates the efficacy of the proposed model against existing state-of-the-art approaches.

Conclusions: This study investigates the issues of missing values and class imbalance in data collected for cervical cancer detection. The proposed approach might aid medical practitioners in detecting the disease in a timely manner and in providing cervical cancer patients with better care.
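To make the preprocessing described in the Methods concrete, the following is a minimal pure-Python sketch of KNN imputation, SMOTE-style up-sampling, and hard majority voting. It is not the authors' implementation. The function names (`knn_impute`, `smote`, `majority_vote`), the toy data, the choice of `k`, and the hard-voting rule are all assumptions made for illustration, and the abstract does not name the three base models. In practice one would more likely use library implementations such as scikit-learn's `KNNImputer` and imbalanced-learn's `SMOTE`.

```python
import math
import random
from collections import Counter


def knn_impute(rows, k=2):
    """Fill None entries with the mean of that feature over the k nearest rows.

    Distance is computed only over features observed in both rows
    (a simplification in the spirit of NaN-aware Euclidean distance).
    """
    def dist(a, b):
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    filled = []
    for i, row in enumerate(rows):
        new_row = list(row)
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Donors are the other rows in which feature j is observed.
            donors = [r for idx, r in enumerate(rows) if idx != i and r[j] is not None]
            if not donors:
                raise ValueError(f"feature {j} is missing in every other row")
            donors.sort(key=lambda r: dist(row, r))
            nearest = donors[:k]
            new_row[j] = sum(r[j] for r in nearest) / len(nearest)
        filled.append(new_row)
    return filled


def smote(X, y, k=2, seed=0):
    """Up-sample each minority class to the majority count by interpolating
    between a sample and one of its k nearest same-class neighbours."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = [list(r) for r in X], list(y)
    for label, count in counts.items():
        members = [r for r, lab in zip(X, y) if lab == label]
        if count == target or len(members) < 2:
            continue
        for _ in range(target - count):
            base = rng.choice(members)
            neighbours = sorted(
                (m for m in members if m is not base),
                key=lambda m: math.dist(base, m),
            )[:k]
            mate = rng.choice(neighbours)
            gap = rng.random()  # position along the segment base -> mate
            X_out.append([b + gap * (m - b) for b, m in zip(base, mate)])
            y_out.append(label)
    return X_out, y_out


def majority_vote(predictions):
    """Hard voting: each inner list holds one model's predicted labels."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


# Toy data (hypothetical): two features, one missing value, imbalanced labels.
rows = [[1.0, 2.0], [1.2, None], [0.9, 2.2], [5.0, 8.0], [5.2, 8.1]]
labels = [0, 0, 0, 1, 1]

# The missing entry is filled with the mean of its two nearest rows' values (2.0 and 2.2).
filled = knn_impute(rows, k=2)

# Class 1 gains one synthetic sample, so both classes end up with three rows.
X_bal, y_bal = smote(filled, labels, k=2, seed=0)

# Combine the label predictions of three hypothetical models for three samples.
combined = majority_vote([[0, 1, 1], [0, 0, 1], [1, 1, 1]])
```

The order of operations matters in this sketch: imputation runs first because the interpolation step in SMOTE needs complete feature vectors. In a real evaluation, up-sampling would normally be applied to the training split only, so that synthetic samples do not leak into the test data.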