Development of a Clinical Predictive Model for Stratification of Cancerous Diseases: A Case Study of Chronic Myeloid Leukemia

Date
2020
Publisher
International Journal of Advanced Science and Technology
Abstract
Scoring systems are typically used to stratify Chronic Myeloid Leukemia (CML) patients into risk groups, with the goal of cure and prolonged survival. These systems, however, do not cope computationally with very large datasets because of noise and overfitting. In the literature, Machine Learning (ML) algorithms have been used to extract meaningful information from datasets, and their performance has been measured with metrics such as accuracy and time to learn, among others. Nevertheless, the loss function (empirical risk) of these ML algorithms has largely not been considered when determining the risk incurred in adopting them for stratification. The aim of this study was to develop an Empirical Risk Minimization Data Stratification (ERMDS) algorithm to aid the stratification of a Chronic Myeloid Leukemia dataset. The algorithm in turn supports a clinical predictive model delivered through an application called the ChroMyL app. A secondary dataset of 1640 CML patients, covering 2003 to 2017, was collected from Obafemi Awolowo University Teaching Hospitals Complex, Ile-Ife, Osun State, Nigeria, and mined in WEKA 3.8.0 using basophil count and spleen size values with four ML algorithms: BayesNet, Multilayer Perceptron, Projective Adaptive Resonance Theory (PART) and Logistic Regression. The four classification algorithms were compared on five performance metrics: correctly classified instances, time to learn, kappa statistic, sensitivity and specificity. Logistic Regression had the highest accuracy, at 99.82%, and was therefore used as the basis of the ERMDS algorithm, which was developed using the L1-regularized logistic regression solver in LibLINEAR 2.20. A Clinical Predictive Model (deployed as the ChroMyL app) was implemented in Macromedia Dreamweaver 16.0 using JavaScript and jQuery to enhance page interactivity.
The findings provide better insight into the process of adopting empirical risk minimization techniques in machine learning algorithms to solve disease risk group stratification problems, and show how machine learning algorithms can be applied to real-world problems. The outcome of this study offers more insight into the theoretical foundations of ML and into the important factors that must be considered in any predictive or stratification model. Future research can focus on determining the loss function of other machine learning algorithms used in stratifying chronic myeloid leukemia. The approach taken to design the clinical predictive model application, the ChroMyL app, could also be used for related cases.
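
To make the notion of empirical risk concrete, the following is a minimal Python sketch of L1-regularized logistic regression, where the quantity being minimized is the average logistic loss plus an L1 penalty on the weights. It is not the ERMDS algorithm and not the LibLINEAR solver used in the study: the optimizer here is a simple proximal gradient method chosen for readability, and the function names, the regularization strength `lam`, the step size and the six standardized (basophil count, spleen size) rows are all hypothetical values made up for illustration.

```python
import math

def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def logistic_loss(y, z):
    # log(1 + exp(-y * z)) for a label y in {-1, +1}, computed stably.
    m = -y * z
    return max(m, 0.0) + math.log1p(math.exp(-abs(m)))

def empirical_risk(w, b, X, y, lam):
    # Average logistic loss over the sample plus the L1 penalty on w.
    n = len(X)
    data_term = sum(logistic_loss(yi, dot(w, xi) + b) for xi, yi in zip(X, y)) / n
    return data_term + lam * sum(abs(wj) for wj in w)

def soft_threshold(v, t):
    # Proximal operator of t * |v|; shrinks v toward zero.
    return math.copysign(max(abs(v) - t, 0.0), v)

def fit(X, y, lam=0.01, lr=0.5, steps=500):
    # Proximal gradient descent: a gradient step on the logistic loss,
    # then soft-thresholding of the weights (the bias is not penalized).
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = dot(w, xi) + b
            g = -yi * sigmoid(-yi * z)  # derivative of the loss w.r.t. z
            for j in range(d):
                grad_w[j] += g * xi[j] / n
            grad_b += g / n
        w = [soft_threshold(w[j] - lr * grad_w[j], lr * lam) for j in range(d)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    # Returns +1 (higher-risk group) or -1 (lower-risk group).
    return 1 if dot(w, x) + b >= 0 else -1

# Hypothetical standardized features: [basophil count, spleen size].
X = [[-1.2, -0.8], [-0.9, -1.1], [-0.5, -0.7],
     [0.6, 0.9], [1.0, 0.7], [1.3, 1.2]]
y = [-1, -1, -1, 1, 1, 1]

w, b = fit(X, y)
print("empirical risk before training:", empirical_risk([0.0, 0.0], 0.0, X, y, 0.01))
print("empirical risk after training: ", empirical_risk(w, b, X, y, 0.01))
print("prediction for [0.8, 1.0]:", predict(w, b, [0.8, 1.0]))
```

Comparing the empirical risk before and after training, rather than accuracy alone, reflects the point the abstract makes: the loss incurred by a model is itself a quantity worth reporting when deciding whether to adopt it for stratification.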
Keywords
Classification, Clinical predictive model, Empirical risk minimization, Logistic regression, Machine learning, Stratification