An Early Screening Method of Chronic Kidney Disease Based on Ensemble Learning Algorithm

Yu-ping JIANG; Cheng YU; Yan-rong LIN; Hai-yan SI; Di LIU; Jiang ZHU; Hao WANG; Hao CHEN

doi:10.13718/j.cnki.xdzk.2020.10.003

2020 Volume 42 Issue 10


Yu-ping JIANG, Cheng YU, Yan-rong LIN, et al. An Early Screening Method of Chronic Kidney Disease Based on Ensemble Learning Algorithm[J]. Journal of Southwest University Natural Science Edition, 2020, 42(10): 17-24. doi: 10.13718/j.cnki.xdzk.2020.10.003


1. ShenTaiWang Healthcare Technology Limited Company, Nanjing 210023, China
2. National Key Laboratory for Novel Software Technology at Nanjing University, Nanjing University, Nanjing 210023, China


Received Date: 30/09/2020
Available Online: 20/10/2020
MSC: TP391

Abstract

Chronic kidney disease (CKD) is a common disease that seriously endangers human health; its incidence is high, but awareness of it is low. An early screening method for CKD based on ensemble learning algorithms can raise the awareness rate of kidney disease and supports early detection and early treatment. In this study, physical examination data collected from multiple hospitals between 2016 and 2019 were used. Examinees who progressed to CKD within three years were selected as the study subjects, and examinees who did not progress to CKD within three years served as the control group. Random forest and XGBoost models were trained and tested in Python 3.7 with 5-fold cross-validation, and their predictive performance for the CKD outcome was compared in terms of the F1-score, the true positive rate and the true negative rate. The random forest model achieved a true positive rate of 0.950, a true negative rate of 0.969 and an F1-score of 0.957; the XGBoost model achieved a true positive rate of 0.966, a true negative rate of 0.955 and an F1-score of 0.958.
Keywords:
- chronic kidney disease
- early screening
- ensemble learning
- random forest
- XGBoost
- cross-validation
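The evaluation protocol summarized in the abstract (5-fold cross-validation, then true positive rate, true negative rate and F1-score) can be sketched in plain Python as below. This is a minimal illustration, not the authors' code: the stratified fold assignment, the function names and the toy labels are assumptions, and the model-fitting step is left to the caller (in the study's setting, a random forest or XGBoost classifier, e.g. scikit-learn's `RandomForestClassifier` or `xgboost.XGBClassifier`).

```python
import random


def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Indices of each class are shuffled and dealt round-robin into k folds,
    so every test fold keeps roughly the case/control ratio of the full data.
    Stratification is an assumption; the paper only states 5-fold CV.
    """
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for j, i in enumerate(indices):
            folds[j % k].append(i)
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


def _ratio(num, den):
    # Guard against empty denominators (e.g. a model that never predicts positive).
    return num / den if den else 0.0


def screening_metrics(y_true, y_pred):
    """Screening metrics from binary labels (1 = progressed to CKD, 0 = control)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tpr = _ratio(tp, tp + fn)  # true positive rate (sensitivity, recall)
    tnr = _ratio(tn, tn + fp)  # true negative rate (specificity)
    precision = _ratio(tp, tp + fp)
    f1 = _ratio(2 * precision * tpr, precision + tpr)
    return {"tpr": tpr, "tnr": tnr, "precision": precision, "f1": f1}
```

In each of the five rounds, a model would be fit on the `train` indices, used to predict the `test` indices, and `screening_metrics` computed on that fold; the fold results are then averaged.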

References

[1] ENE-IORDACHE B, PERICO N, BIKBOV B, et al. Chronic Kidney Disease and Cardiovascular Risk in Six Regions of the World (ISN-KDDC): a Cross-Sectional Study [J]. Lancet Glob Health, 2016, 4(5): e307-e319. doi: 10.1016/S2214-109X(16)00071-1
[2] ZHANG L, WANG F, WANG L, et al. Prevalence of Chronic Kidney Disease in China: a Cross-Sectional Survey [J]. Lancet, 2012, 379(9818): 815-822. doi: 10.1016/S0140-6736(12)60033-6
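[3] Expert Group of the Shanghai Demonstration Project for Early Detection and Standardized Diagnosis and Treatment of Chronic Kidney Disease, GAO X, MEI C L. Guideline for the Screening, Diagnosis, Prevention and Treatment of Chronic Kidney Disease [J]. Chinese Journal of Practical Internal Medicine, 2017, 37(1): 28-34.
[4] HONG Y. Research on Diabetes Prediction Models Based on Machine Learning Algorithms [D]. Harbin: Harbin Institute of Technology, 2016.
[5] ZHOU Y L, JIANG G R. Current Research on Clinical Prediction of the Progression of IgA Nephropathy to End-Stage Renal Disease [J]. Journal of Shanghai Jiao Tong University (Medical Science), 2016, 36(2): 296-301.
[6] LIU M M, CAI Y M. Research on Prediction of Diabetic Complications Based on a Multilayer Perceptron Neural Network [J]. Software, 2018, 39(10): 30-35.
[7] ZHENG X Y. Research on a Cardiovascular Disease Prediction System Based on Machine Learning [D]. Beijing: Beijing Jiaotong University, 2018.
[8] LIU L. Research on Prediction Models for Small-for-Gestational-Age Infants Based on Machine Learning [D]. Beijing: Beijing University of Technology, 2017.
[9] ZHOU C. Research on Classification and Prediction Methods for Sensing Signals Based on Machine Learning [D]. Chengdu: University of Electronic Science and Technology of China, 2018.
[10] FANG Y K, FU Y, ZHOU J L. Personalized Recommendation Algorithm Based on Ensemble Learning [J]. Computer Engineering and Applications, 2011, 47(10): 1-4.
[11] TAN Y D, ZHAO Y Y, ZHAO G C. Diagnosis of Parkinson's Disease Based on AdaBoost Feature Selection and XGBoost [J]. Information Technology, 2020, 44(9): 124-128.
[12] HOU Y, ZHENG X F. Research and Application of Ensemble Learning Algorithms [J]. Computer Engineering and Applications, 2012, 48(34): 17-22.
[13] LI Y, LIU Z D, ZHANG H J. Survey of Ensemble Classification Algorithms for Imbalanced Data [J]. Application Research of Computers, 2014, 31(5): 1287-1291.
[14] LI M F, JIA X Y. Chinese Irony Recognition Based on Multi-Classifier Ensemble Learning [J]. Computer & Digital Engineering, 2018, 46(9): 1790-1795.
[15] LIU Y. Research on an Early Screening Method for Coronary Heart Disease Based on Ensemble Learning Algorithms [D]. Jinan: Shandong University, 2018.
[16] HUANG Y K, JIN W D, YU Z B, WU Y P. Emitter Signal Recognition Based on Deep Learning and Ensemble Learning [J]. Systems Engineering and Electronics, 2018, 40(11): 2420-2425.
[17] LI J L. Research and Application of Multiple Combined Classifiers in Local-Area Temperature Prediction [D]. Guangzhou: Guangdong University of Technology, 2014.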
[18] FAWCETT T. An Introduction to ROC Analysis [J]. Pattern Recognition Letters, 2006, 27(8): 861-874. doi: 10.1016/j.patrec.2005.10.010
[19] BREIMAN L. Bagging Predictors [J]. Machine Learning, 1996, 24(2): 123-140.
[20] BREIMAN L. Random Forests [J]. Machine Learning, 2001, 45(1): 5-32.
[21] CHEN T, GUESTRIN C. XGBoost: A Scalable Tree Boosting System [C]//Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, California, USA. New York, NY, USA: ACM, 2016: 785-794.
[22] FAN Y D. A Survey of Cross-Validation Methods in Model Selection [D]. Taiyuan: Shanxi University, 2013.
[23] NICULESCU-MIZIL A, CARUANA R. Predicting Good Probabilities with Supervised Learning [C]//International Conference on Machine Learning (ICML '05), August 7-11, 2005, Bonn, Germany. New York, USA: ACM Press, 2005: 625-632.
[24] DEGROOT M H, FIENBERG S E. The Comparison and Evaluation of Forecasters [J]. Journal of the Royal Statistical Society: Series D (The Statistician), 1983, 32(1-2): 12-22.


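Corresponding author: CHEN Bin, bchen63@163.com

1. School of Materials Science and Engineering, Shenyang University of Chemical Technology, Shenyang 110142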

Figures(3) / Tables(4)


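Chronic kidney disease (CKD) is characterized by high prevalence, low awareness, poor prognosis and high medical costs; after cardiovascular and cerebrovascular diseases, diabetes and malignant tumors, it is another disease that seriously endangers human health. In recent years the prevalence of CKD has risen year by year, reaching 14.3% in the general population worldwide [1]. A cross-sectional epidemiological study in China [2] found a CKD prevalence of 10.8% among people aged 18 and over, from which China was estimated to have about 150 million adult CKD patients, yet the awareness rate was only 12.5%; the survey also found that residents of rapidly developing rural areas had become a high-incidence group. As China's population ages and the incidence of diabetes, hypertension and related diseases keeps increasing, the incidence of CKD is also rising [3]. Early screening for CKD is therefore important. With the development of artificial intelligence, more and more researchers are applying it to medicine and health care [4-9]. Machine learning methods such as artificial neural networks, support vector machines and decision trees can perform classification and have been applied to disease risk prediction, and classifiers built with ensemble learning methods generally outperform those built with a single machine learning method [10-13]. Ensemble learning has been used in many fields for tasks such as image recognition, semantic recognition, disease screening, emitter signal recognition and weather prediction [14-17]. An early screening method for CKD based on ensemble learning algorithms is therefore of considerable value in medicine.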

3. Conclusion

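In this study, an early screening method for CKD was built on the random forest and XGBoost ensemble learning algorithms. The screening model trained with random forest achieved a precision, true positive rate, true negative rate and F1-score of 0.964, 0.950, 0.969 and 0.957, respectively; the corresponding values for XGBoost were 0.950, 0.966, 0.955 and 0.958. Random forest thus had the higher precision and true negative rate, whereas XGBoost had the higher true positive rate and F1-score. Overall, the two ensemble screening models perform comparably, and either can be chosen according to the screening requirements. When the method is applied, an examinee is judged positive when both models return a positive result.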
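As a quick arithmetic check, the F1-score is the harmonic mean of precision P and true positive rate R, F1 = 2PR/(P + R), so the reported F1 values follow from the reported precision and true positive rates. The sketch below recomputes them and also illustrates the joint decision rule for applying the two models (an examinee is flagged only when both are positive); the function names are hypothetical.

```python
def f1_from(precision, recall):
    """F1-score as the harmonic mean of precision and recall (true positive rate)."""
    return 2 * precision * recall / (precision + recall)


def joint_positive(rf_pred, xgb_pred):
    """Flag an examinee as positive (1) only if both models predict positive."""
    return [int(a == 1 and b == 1) for a, b in zip(rf_pred, xgb_pred)]


# Reported values: random forest P = 0.964, R = 0.950; XGBoost P = 0.950, R = 0.966.
print(round(f1_from(0.964, 0.950), 3))  # 0.957
print(round(f1_from(0.950, 0.966), 3))  # 0.958
print(joint_positive([1, 1, 0, 0], [1, 0, 1, 0]))  # [1, 0, 0, 0]
```

Because a joint positive requires both models to agree, the combined rule's true positive rate cannot exceed either model's, and its false positive rate cannot exceed either model's either; it trades some sensitivity for specificity.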

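The final output of CKD screening is the probability that an examinee will develop CKD. The scores output directly by a classification model cannot be taken as risk probabilities as they stand: one must assess whether the deviation between the model's outputs and the true outcomes lies within an acceptable range and, where necessary, calibrate the outputs. Probability calibration was therefore used to address this issue. After calibration with Platt scaling, the models' performance decreased to some extent, but all metrics remained above 0.94.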
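Platt scaling fits a one-dimensional logistic (sigmoid) model that maps a classifier's raw score s to a probability, P(y = 1 | s) = 1 / (1 + exp(-(a·s + b))), using held-out data. The pure-Python sketch below fits a and b by Newton's method with a backtracking line search and uses Platt's smoothed targets; it is an illustration under those assumptions, not the implementation used in the study. In practice, scikit-learn's `CalibratedClassifierCV(method="sigmoid")` offers a sigmoid (Platt) calibration.

```python
import math


def _sigmoid(z):
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_platt(scores, labels, max_iter=100, tol=1e-12):
    """Fit P(y=1 | s) = sigmoid(a * s + b) to raw scores s (Platt scaling).

    Platt's smoothed targets replace the hard 0/1 labels, which keeps
    a and b finite even when the scores separate the classes perfectly.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    t_pos = (n_pos + 1.0) / (n_pos + 2.0)
    t_neg = 1.0 / (n_neg + 2.0)
    targets = [t_pos if y == 1 else t_neg for y in labels]

    def nll(a, b):
        # Cross-entropy between the smoothed targets and the sigmoid outputs.
        total = 0.0
        for s, t in zip(scores, targets):
            p = min(max(_sigmoid(a * s + b), 1e-15), 1.0 - 1e-15)
            total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
        return total

    a, b = 0.0, math.log((n_pos + 1.0) / (n_neg + 1.0))
    current = nll(a, b)
    for _ in range(max_iter):
        # Gradient (ga, gb) and Hessian [[haa, hab], [hab, hbb]] w.r.t. (a, b).
        ga = gb = haa = hab = hbb = 0.0
        for s, t in zip(scores, targets):
            p = _sigmoid(a * s + b)
            w = p * (1.0 - p)
            ga += (p - t) * s
            gb += p - t
            haa += w * s * s
            hab += w * s
            hbb += w
        haa += 1e-12  # tiny ridge keeps the 2x2 system invertible
        hbb += 1e-12
        det = haa * hbb - hab * hab
        da = (hbb * ga - hab * gb) / det
        db = (haa * gb - hab * ga) / det
        # Backtracking line search: halve the Newton step until the loss drops.
        step = 1.0
        while step > 1e-10:
            candidate = nll(a - step * da, b - step * db)
            if candidate < current:
                break
            step /= 2.0
        else:
            break  # no step improves the loss: treat as converged
        a, b = a - step * da, b - step * db
        improvement = current - candidate
        current = candidate
        if improvement < tol:
            break
    return a, b


def calibrate(score, a, b):
    """Map a raw model score to a calibrated probability."""
    return _sigmoid(a * score + b)
```

When the fitted slope a is positive, the map is monotonic, so it preserves the ranking of examinees; classification metrics can still shift slightly because a fixed 0.5 threshold on the calibrated probability corresponds to a different threshold on the raw score.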

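Because the true probability that each examinee will develop CKD is unknown, it cannot be judged directly whether the original model's outputs are valid probability estimates. A simple and general approach is to plot a reliability diagram: the closer the curve lies to the diagonal, the more valid the model's probability estimates. If the deviation exceeds the expected range, Platt scaling can be applied to reduce the bias of the original classifier so that its final outputs are closer to the true probabilities; after calibration, the model's outputs can be treated as valid probability estimates.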
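The points of a reliability diagram can be computed without plotting: predicted probabilities are grouped into equal-width bins, and for each bin the mean predicted probability (x-axis) is compared with the observed fraction of examinees who developed CKD (y-axis). A minimal pure-Python sketch follows; the bin count, the function names and the expected calibration error summary (a count-weighted mean distance from the diagonal) are assumptions added for illustration. scikit-learn's `sklearn.calibration.calibration_curve` returns the same kind of binned points.

```python
def reliability_curve(probs, labels, n_bins=10):
    """Return (mean predicted prob, observed positive rate, count) per non-empty bin."""
    sums = [0.0] * n_bins
    positives = [0] * n_bins
    counts = [0] * n_bins
    for p, y in zip(probs, labels):
        i = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls into the last bin
        sums[i] += p
        positives[i] += y
        counts[i] += 1
    return [
        (sums[i] / counts[i], positives[i] / counts[i], counts[i])
        for i in range(n_bins)
        if counts[i]
    ]


def expected_calibration_error(probs, labels, n_bins=10):
    """Count-weighted mean gap between the reliability curve and the diagonal."""
    points = reliability_curve(probs, labels, n_bins)
    total = sum(c for _, _, c in points)
    return sum(c * abs(mean_p - frac) for mean_p, frac, c in points) / total
```

Points close to the line y = x (and a small expected calibration error) indicate that the predicted risks can be read as probabilities; systematic departures suggest that calibration such as Platt scaling is needed.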

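In summary, the early screening methods for CKD based on the random forest and XGBoost ensemble learning algorithms both showed good and stable predictive performance. Calibrating the models with Platt scaling did not substantially change their classification performance; it improved the reliability of the models' CKD risk probability estimates, so the calibrated probabilities have greater clinical reference value. The ensemble-learning-based early screening method for CKD can be applied in hospitals, physical examination centers, communities, insurance companies and mobile platforms to assist in the early CKD screening of examinees.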
