Background: When specimens are obtained from pulmonary nodules by transbronchial lung biopsy (TBLB), distinguishing a truly benign sample from a sample that missed the tumor is challenging. Our objective was to develop a machine-learning-based classifier for TBLB specimens. Methods: Three pathologists assessed six pathological findings as potential histologic markers for distinguishing benign from malignant conditions: interface bronchitis/bronchiolitis (IB/B), plasma cell infiltration (PLC), eosinophil infiltration (Eo), lymphoid aggregation (Ly), fibroelastosis (FE), and organizing pneumonia (OP). A total of 251 TBLB cases with benign or malignant outcomes defined by clinical follow-up were collected, and a gradient-boosted decision-tree model (XGBoost) was trained and tested on randomly split training and test sets. Results: Five pathological changes showed independent, mild-to-moderate associations with benign conditions (AUC ranging from 0.58 to 0.75), with IB/B being the strongest predictor. In contrast, FE emerged as the sole indicator of malignant conditions, with a mild association (AUC = 0.66). The model was trained on 200 cases and tested on 51 cases, achieving an AUC of 0.78 for the binary classification of benign vs. malignant on the test set. Conclusion: The machine-learning model has the potential to distinguish between benign and malignant conditions in TBLB samples regardless of whether tumor cells are present in the specimen, which could improve diagnostic accuracy and reduce the burden of repeated sampling procedures for patients.
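The evaluation described in the abstract rests on two simple operations: a random split of the 251 cases into 200 training and 51 test cases, and an AUC computed for each finding. The following is a minimal sketch of both steps, not the authors' code. It uses only the Python standard library and does not train XGBoost. The data are synthetic, and the 0 to 3 grading scale, the class balance, the seed, and all function names (`auc`, `split_cases`, `make_synthetic_cases`) are assumptions made for illustration.

```python
import random

# The six findings named in the abstract.
FINDINGS = ["IB/B", "PLC", "Eo", "Ly", "FE", "OP"]


def auc(scores, labels):
    """Rank-based AUC: the probability that a randomly chosen positive case
    scores higher than a randomly chosen negative case (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def split_cases(cases, n_train, seed=0):
    """Shuffle a copy of the cases and cut it into training and test sets."""
    rng = random.Random(seed)
    shuffled = list(cases)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


def make_synthetic_cases(n=251, seed=0):
    """Synthetic stand-in data (NOT the study data).

    Assumption: each finding is graded 0-3, and label 1 means benign.
    FE is made to lean malignant and the other findings benign, only to
    mirror the direction of association reported in the abstract."""
    rng = random.Random(seed)
    cases = []
    for _ in range(n):
        benign = 1 if rng.random() < 0.5 else 0
        grades = {}
        for name in FINDINGS:
            leans_benign = name != "FE"
            mean = 2.0 if (benign == 1) == leans_benign else 1.0
            grades[name] = min(3, max(0, round(rng.gauss(mean, 1.0))))
        cases.append((grades, benign))
    return cases


if __name__ == "__main__":
    cases = make_synthetic_cases()
    train, test = split_cases(cases, n_train=200)
    labels = [y for _, y in train]
    for name in FINDINGS:
        scores = [g[name] for g, _ in train]
        # AUC toward "benign"; a value below 0.5 means the finding leans malignant.
        print(name, round(auc(scores, labels), 2))
```

An AUC measured toward one class is the complement of the AUC toward the other, so a finding with an AUC below 0.5 for benign can equally be reported as an indicator of malignancy with an AUC above 0.5, which is how the abstract reports FE.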