Minimal Loss Function Determination on Four Machine Learning Algorithms using Chronic Myeloid Leukemia Cancer Dataset

Abstract
In artificial intelligence research, machine learning (ML) algorithms are used to extract meaningful information from datasets to aid prediction and, in some cases, diagnosis. However, determining the loss function of the chosen ML algorithm(s) has been found to be lacking in the grouping or stratification of datasets. This paper used a dataset of 1,640 chronic myeloid leukemia patients from the Obafemi Awolowo University Teaching Hospitals Complex, Ile-Ife, Osun State, Nigeria. An experimental analysis was performed in the Waikato Environment for Knowledge Analysis (WEKA) 3.8.0, using basophil count and spleen size values with four ML algorithms (BayesNet, Multilayer Perceptron, Projective Adaptive Resonance Theory (PART) and Logistic Regression) to classify patients as low or high risk. Two validation techniques (holdout and 10-fold cross-validation) were used to evaluate the algorithms on correctly classified instances, time taken to learn, kappa statistic, sensitivity and specificity. Two algorithms, Logistic Regression and PART, performed best in stratifying the dataset; for these, the loss was computed as the difference between the true output 𝓇 and the predicted output 𝓇̂ in order to identify the minimal loss. The loss values of the Logistic Regression algorithm for low and high risk under holdout and 10-fold cross-validation were 0.22%, 1.40% and -0.22%, -0.02%, respectively. Similarly, the PART algorithm yielded -1.58%, 1.40% and -0.22%, -0.26%. From these findings, the Logistic Regression algorithm had the loss function with the minimum value under the holdout technique. Determining the minimal loss function is therefore important, as it can guide the choice of algorithm to be used in grouping a dataset.
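The abstract defines the loss only as the difference between the true output 𝓇 and the predicted output 𝓇̂ and does not state how the per-class percentages were derived, so the following Python sketch is illustrative and does not attempt to reproduce the reported figures. It assumes binary labels coded 1 = high risk and 0 = low risk, takes the mean signed difference 𝓇 - 𝓇̂ as a percentage, reports the 0-1 empirical risk (misclassification rate) alongside it, and computes sensitivity, specificity and Cohen's kappa statistic from the confusion matrix. The function names and toy labels are hypothetical.

```python
# Hypothetical sketch: labels coded 1 = high risk (positive class), 0 = low risk.
# This is an assumed reading of "r - r_hat", not the authors' exact procedure.
from typing import Dict, List


def signed_loss_percent(y_true: List[int], y_pred: List[int]) -> float:
    """Mean signed difference (r - r_hat) over all instances, in percent.
    Positive: high risk is under-predicted; negative: it is over-predicted."""
    n = len(y_true)
    return 100.0 * sum(r - r_hat for r, r_hat in zip(y_true, y_pred)) / n


def empirical_risk_01(y_true: List[int], y_pred: List[int]) -> float:
    """Empirical risk under 0-1 loss: the fraction of misclassified instances."""
    return sum(r != r_hat for r, r_hat in zip(y_true, y_pred)) / len(y_true)


def confusion_metrics(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity and Cohen's kappa for binary labels.
    Assumes both classes occur in y_true (otherwise a rate divides by zero)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for r, p in pairs if r == 1 and p == 1)
    tn = sum(1 for r, p in pairs if r == 0 and p == 0)
    fp = sum(1 for r, p in pairs if r == 0 and p == 1)
    fn = sum(1 for r, p in pairs if r == 1 and p == 0)
    n = tp + tn + fp + fn
    p_o = (tp + tn) / n  # observed agreement (accuracy)
    # Agreement expected by chance, from actual and predicted class totals.
    p_e = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / (n * n)
    return {
        "accuracy": p_o,
        "sensitivity": tp / (tp + fn),  # true positive rate (high risk)
        "specificity": tn / (tn + fp),  # true negative rate (low risk)
        "kappa": (p_o - p_e) / (1 - p_e),
    }


if __name__ == "__main__":
    # Toy labels for illustration only (not the CML dataset).
    y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
    print(signed_loss_percent(y_true, y_pred))  # 0.0
    print(empirical_risk_01(y_true, y_pred))  # 0.2
    print(confusion_metrics(y_true, y_pred))
```

In the toy run, one missed high-risk case and one false alarm cancel in the signed difference (0.0%) while the 0-1 empirical risk is 0.2, which is why the sketch reports both quantities side by side.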
Keywords
Artificial intelligence, Loss function, Data grouping, Empirical risk minimization, Machine learning
Citation