Background/Objectives: This study examines the effectiveness of different resampling methods and classifier models for handling imbalanced datasets, with a specific focus on critical healthcare applications such as cancer diagnosis and prognosis.

Methods: To address class imbalance, traditional oversampling methods such as SMOTE and ADASYN were replaced by Generative Adversarial Networks (GANs), which use deep neural network architectures to generate high-quality synthetic data. The study highlights the advantage of GANs in creating realistic, diverse, and homogeneous samples for the minority class, which helps mitigate the diagnostic challenges posed by imbalanced data. Four families of classifiers (Boosting, Bagging, Linear, and Non-linear) were evaluated using accuracy, precision, recall, F1 score, and ROC AUC.

Results: Baseline performance without resampling showed significant limitations, demonstrating the need for resampling strategies. Using GAN-generated minority-class samples notably improved the detection of minority instances and overall classification performance. The average ROC AUC increased from approximately 0.8276 at baseline to over 0.9734, indicating that GAN-based resampling enhances model performance and yields more balanced detection across classes. With GAN-based resampling, the GradientBoosting classifier achieved a ROC AUC of 0.9890, the highest among all models.

Conclusions: The findings indicate that advanced models such as Boosting and Bagging, when paired with effective resampling strategies such as GANs, are better suited to handling imbalanced datasets and improving predictive accuracy in healthcare applications.
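The abstract does not specify the GAN architecture, so what follows is only a minimal sketch of the general idea under stated assumptions: a generator is trained adversarially against a discriminator on minority-class rows, then sampled until both classes are the same size. It uses plain NumPy with hand-derived gradients so it runs without a deep-learning framework. The affine generator, the one-hidden-layer discriminator, the learning rate, the step count, and the names `train_toy_gan` and `gan_balance` are hypothetical choices for illustration, not the authors' deep architecture.

```python
import numpy as np


def sigmoid(s):
    # Clip logits so np.exp cannot overflow.
    return 1.0 / (1.0 + np.exp(-np.clip(s, -30.0, 30.0)))


def train_toy_gan(X_real, latent_dim=4, hidden=16, steps=2000, lr=0.05, batch=32, seed=0):
    """Adversarially train a toy GAN on minority-class rows X_real (n, d); return a sampler."""
    rng = np.random.default_rng(seed)
    n, d = X_real.shape
    # Generator (hypothetical, toy-sized): affine map from latent noise to feature space.
    Wg = rng.normal(0.0, 0.1, (latent_dim, d))
    bg = np.zeros(d)
    # Discriminator: one tanh hidden layer followed by a single logit.
    W1 = rng.normal(0.0, 0.1, (d, hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(0.0, 0.1, hidden)
    b2 = 0.0

    def disc_forward(x):
        h = np.tanh(x @ W1 + b1)
        return h, h @ w2 + b2

    def disc_backward(x, h, ds):
        # Given dLoss/dlogit (ds), return grads for W1, b1, w2, b2 and for the input x.
        da = np.outer(ds, w2) * (1.0 - h ** 2)
        return x.T @ da, da.sum(axis=0), h.T @ ds, ds.sum(), da @ W1.T

    for _ in range(steps):
        x_real = X_real[rng.integers(0, n, batch)]
        z = rng.normal(size=(batch, latent_dim))
        x_fake = z @ Wg + bg

        # Discriminator step: minimise -mean log D(real) - mean log(1 - D(fake)).
        h_r, s_r = disc_forward(x_real)
        h_f, s_f = disc_forward(x_fake)
        g_r = disc_backward(x_real, h_r, (sigmoid(s_r) - 1.0) / batch)
        g_f = disc_backward(x_fake, h_f, sigmoid(s_f) / batch)
        W1 -= lr * (g_r[0] + g_f[0])
        b1 -= lr * (g_r[1] + g_f[1])
        w2 -= lr * (g_r[2] + g_f[2])
        b2 -= lr * (g_r[3] + g_f[3])

        # Generator step: non-saturating loss -mean log D(G(z)), backpropagated through D.
        h_f, s_f = disc_forward(x_fake)
        dx = disc_backward(x_fake, h_f, (sigmoid(s_f) - 1.0) / batch)[4]
        Wg -= lr * (z.T @ dx)
        bg -= lr * dx.sum(axis=0)

    return lambda m: rng.normal(size=(m, latent_dim)) @ Wg + bg


def gan_balance(X, y, minority=1, **gan_kwargs):
    """Append GAN samples of the minority class until both classes have equal size."""
    X_min = X[y == minority]
    deficit = max(int((y != minority).sum()) - len(X_min), 0)
    sample = train_toy_gan(X_min, **gan_kwargs)
    X_new = sample(deficit)
    y_new = np.full(deficit, minority, dtype=y.dtype)
    return np.vstack([X, X_new]), np.concatenate([y, y_new])


# Toy imbalanced data: 200 majority rows, 20 minority rows, 3 features.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (200, 3)), rng.normal(2.0, 1.0, (20, 3))])
y = np.array([0] * 200 + [1] * 20)
X_bal, y_bal = gan_balance(X, y, steps=500)
print(np.bincount(y_bal))  # [200 200]
```

In practice, synthetic rows would be added to the training split only, so that metrics are computed on real records; a framework such as PyTorch or TensorFlow would normally replace the hand-written gradients and allow deeper networks.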

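The evaluation metrics named in the abstract can be made concrete with a short, dependency-free Python sketch (the function names are hypothetical; `sklearn.metrics` offers standard equivalents). It treats the minority class as label 1 and computes ROC AUC as the probability that a randomly chosen positive is scored above a randomly chosen negative, with ties counting one half. The usage lines show why accuracy alone is uninformative on imbalanced data: a model that never predicts the minority class still reaches 95% accuracy.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary problem; `positive` is the minority label."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive=1):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0  # 0.0 when nothing is predicted positive
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true, scores, positive=1):
    """P(score of random positive > score of random negative), ties counted as 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative example")
    # Pairwise comparison is O(len(pos) * len(neg)); fine for illustration.
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


y_true = [0] * 95 + [1] * 5      # 5% minority class
always_negative = [0] * 100      # never predicts the minority class
print(classification_metrics(y_true, always_negative))
# {'accuracy': 0.95, 'precision': 0.0, 'recall': 0.0, 'f1': 0.0}
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```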

Original article:

Synthetic Boosted Resampling Using Deep Generative Adversarial Networks: A Novel Approach to Improve Cancer Prediction from Imbalanced Datasets

……


Original publication date: 2 December 2024

DOI: 10.3390/cancers16234046

Type: Article

Open access: Yes
