Background: Small cell lung cancer (SCLC) is an extremely aggressive form of lung cancer, characterized by rapid progression and poor survival. Although early diagnosis is important, current diagnostic techniques are invasive and limited in applicability.
Methods: This study presents a novel stacking-based ensemble machine learning approach for classifying SCLC and non-small cell lung cancer (NSCLC) from metabolomics data. The analysis included 191 SCLC cases, 173 NSCLC cases, and 97 healthy controls. Feature selection techniques identified significant metabolites, with positive-ion metabolites proving more relevant.
Results: For multi-class classification (control, SCLC, NSCLC), the stacking ensemble achieved 85.03% accuracy and an AUC of 92.47 using a Support Vector Machine (SVM). Binary classification (SCLC vs. NSCLC) performed better still, with ExtraTreesClassifier reaching 88.19% accuracy and an AUC of 92.65. SHapley Additive exPlanations (SHAP) analysis identified metabolites such as benzoic acid, DL-lactate, and L-arginine as significant predictors.
Conclusions: The stacking ensemble captures the complementary strengths of multiple classifiers, improving overall predictive performance and the detection of SCLC and NSCLC. This work highlights the potential of combining metabolomics with advanced machine learning for non-invasive early detection of lung cancer subtypes, offering an alternative to conventional biopsy methods.
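As a rough illustration of the stacking idea described in the Methods, the sketch below trains a stacked classifier on the binary SCLC vs. NSCLC task with scikit-learn. It is not the study's pipeline: the data are synthetic (generated with `make_classification`, sized to match the 191 + 173 case counts), and the choice of base learners, the SVM meta-learner, the feature count, and all hyperparameters are assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    ExtraTreesClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for a metabolite intensity matrix (NOT the study's data).
# 364 samples = 191 + 173; labels: 1 = SCLC, 0 = NSCLC (hypothetical encoding).
X, y = make_classification(
    n_samples=364,
    n_features=40,
    n_informative=12,
    n_redundant=4,
    weights=[173 / 364, 191 / 364],
    random_state=0,
)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Level-0 (base) learners: an assumed set, chosen only for illustration.
base_learners = [
    ("extra_trees", ExtraTreesClassifier(n_estimators=200, random_state=0)),
    ("random_forest", RandomForestClassifier(n_estimators=200, random_state=0)),
    ("logreg", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
]

# Level-1 (meta) learner: an SVM fitted on the base learners' out-of-fold
# predictions, which StackingClassifier produces via internal cross-validation.
meta_learner = make_pipeline(StandardScaler(), SVC())

stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=meta_learner,
    cv=5,
)
stack.fit(X_train, y_train)

pred = stack.predict(X_test)
scores = stack.decision_function(X_test)  # signed distances from the SVM margin

accuracy = accuracy_score(y_test, pred)
auc = roc_auc_score(y_test, scores)
print(f"accuracy={accuracy:.3f}  AUC={auc:.3f}")
```

The numbers this prints reflect only the synthetic data and say nothing about the reported results. The point is the structure: base learners are fitted per fold, their held-out predictions become the meta-learner's input features, and the meta-learner makes the final call.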