Performance Evaluation of an Ensemble Method for Diagnosis of Chronic Kidney Disease with Feature Selection Technique
Date
2020
Publisher
IEEE Xplore: 2020 International Conference on Decision Aid Sciences and Application (DASA)
Abstract
Chronic Kidney Disease (CKD) is a public health issue and a significant threat to human life. It arises from abnormal kidney function over a period of months or years and, if left untreated, may damage vital organs, increasing cardiovascular mortality and the risk of sudden death when it is not detected early. Data mining techniques are employed in several areas of clinical diagnosis to support intelligent diagnostic decisions and disease prediction. These techniques have shown promising performance in the management of different ailments and may help reduce the high number of people who die each year because of inaccurate diagnosis. This study evaluates the performance of a bagging ensemble technique on a CKD dataset, combined with an effective feature selection technique, to yield a reliable and accurate predictive model capable of correctly distinguishing diseased from non-diseased patients. The study used a real patient dataset obtained from the UCI Machine Learning Repository, consisting of 400 instances with 24 conditional attributes and one decision class. The random forest algorithm was used to select the best subset of features for the predictive models. Naïve Bayes, k-Nearest Neighbor, and Decision Tree algorithms serve as the base classifiers, whose predictions were aggregated using the bagging ensemble approach to improve on the base learners' performance. The results show the effect of feature selection and the ensemble technique in improving the accuracy of data mining classification algorithms. The model's best result is achieved using the 7 best-selected features on the ensemble classifier, with 100% accuracy of CKD diagnosis compared to 98.3% accuracy without feature selection, making the model suitable for efficient diagnosis of CKD.
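The bagging idea named in the abstract (train each ensemble member on a bootstrap resample, then combine members by majority vote) can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation. It assumes a single k-Nearest Neighbor base learner rather than the three base classifiers in the study, it uses synthetic two-feature data rather than the UCI CKD dataset, and it leaves out the random forest feature selection step. All function names, the ensemble size of 15, and k = 3 are hypothetical choices made for illustration.

```python
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Classify x by majority vote among its k nearest training points."""
    order = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def bagging_fit(X, y, n_estimators=15, seed=0):
    """Draw one bootstrap sample (sampling with replacement) per member."""
    rng = random.Random(seed)
    n = len(X)
    samples = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        samples.append(([X[i] for i in idx], [y[i] for i in idx]))
    return samples


def bagging_predict(samples, x, k=3):
    """Aggregate the members' predictions for x by majority vote."""
    votes = Counter(knn_predict(bx, by, x, k) for bx, by in samples)
    return votes.most_common(1)[0][0]


def make_toy_data(n_per_class=40, seed=1):
    """Synthetic stand-in data (an assumption, NOT the CKD dataset)."""
    rng = random.Random(seed)
    X, y = [], []
    for label, centre in ((0, 0.0), (1, 4.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)])
            y.append(label)
    return X, y


X, y = make_toy_data()
ensemble = bagging_fit(X, y)
accuracy = sum(bagging_predict(ensemble, x) == t for x, t in zip(X, y)) / len(y)
print(f"training accuracy of bagged k-NN on toy data: {accuracy:.3f}")
```

The printed figure is a training accuracy on toy data and is not comparable to the 100% and 98.3% accuracies reported in the abstract. In practice an existing library implementation, evaluated on held-out data, would likely be preferable to this hand-rolled version.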
Keywords
Bagging, Chronic kidney disease, Ensemble learning, Random forest, Feature selection