A Performance Study of Selected Machine Learning Techniques for Predicting Heart Diseases

Blessing Oluwatobi Olorunfemi, Adewale Opeoluwa Ogunde, Abidemi Emmanuel Adeniyi, Joseph Bamidele Awotunde, Agbotiname Lucky Imoize, and Chun-Ta Li

Abstract Heart disease remains a leading cause of mortality worldwide, and its incidence is rising rapidly, making early prediction crucial for effective prevention and timely intervention. Diagnosing heart disease is a difficult process that requires technical skill and accuracy. With improvements in technology, computing has helped to simplify the diagnosis of various health problems. Machine learning uses past or existing data to predict future results. Various machine learning techniques have been developed over the years and used to predict heart disease with varying levels of performance, so identifying the best-suited technique for prediction can be a challenging task. This research work analyses the performance of seven (7) machine learning techniques: AdaBoost, KNN, Logistic Regression, Naïve Bayes, Random Forest, SVM, and XGBoost. The heart disease dataset was downloaded from the UCI repository and analysed using the Python programming language in the Jupyter Notebook environment. A comparative analysis of the seven (7) techniques was performed based on Accuracy, Precision, and Recall. From the results obtained, KNN, Random Forest, and XGBoost showed superior performance with an accuracy of 100%, followed by AdaBoost with 92.2%, SVM with 91.71%, and Naïve Bayes with 88.29%, while Logistic Regression had the lowest accuracy at 86.34%. KNN, RF, and XGBoost outperformed AdaBoost, SVM, and LR.

Keywords Machine learning · Predictive model · Classification algorithm · Heart disease · Performance evaluation

B. O. Olorunfemi · A. O. Ogunde
Faculty of Natural Sciences, Department of Computer Science, Redeemer’s University, Osun, Nigeria
e-mail: olorunfemi@run.edu.ng
A. O. Ogunde
e-mail: ogundea@run.edu.ng

A. E. Adeniyi
Department of Computer Science, Bowen University, Iwo, Nigeria
e-mail: abidemi.adeniyi@bowen.edu.ng

J. B. Awotunde
Faculty of Information and Communication Sciences, Department of Computer Science, University of Ilorin, Ilorin 240003, Nigeria
e-mail: awotunde.jb@unilorin.edu.ng

A. L. Imoize
Faculty of Engineering, Department of Electrical and Electronics Engineering, University of Lagos, Akoka, Lagos 100213, Nigeria
e-mail: aimoize@unilag.edu.ng

C.-T. Li (corresponding author)
Bachelor’s Program of Artificial Intelligence and Information Security, Fu Jen Catholic University, New Taipei City 24206, Taiwan
e-mail: 157278@mail.fju.edu.tw

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025
G. A. Tsihrintzis et al. (eds.), Security and Information Technologies with AI, Internet Computing and Big-data Applications, Smart Innovation, Systems and Technologies 410, https://doi.org/10.1007/978-981-97-7786-0_27

1 Introduction

Heart disease can occur in someone who does not feel ill. Some people with cardiac disease show symptoms, that is, changes or pain in the body that indicate a disease is present.
Signs of cardiac disease include sternal pain, difficulty breathing, palpitations (a sensation that the heart is beating excessively quickly), swelling in the legs or feet, and tiredness because the body and brain are not receiving sufficient oxygen. The author of [1] describes cyanosis as “blue skin.” Risk factors for heart disease include age, sex, cigarette smoking, physical inactivity, excessive alcohol consumption, overweight or obesity, hereditary vulnerability and family history of heart disease, raised blood pressure (hypertension), raised glucose levels (diabetes mellitus), elevated blood cholesterol (hyperlipidemia), undiagnosed celiac disease, psychosocial factors, and air pollution [2, 3]. Some of these risk factors, such as age, sex, and family history, cannot be changed; nevertheless, many key cardiovascular risk factors can be modified through lifestyle changes, social change, and medication (for example, for hypertension, hyperlipidemia, and diabetes) [4–6]. Obese people are more prone to develop atherosclerosis (i.e. fatty tissue, lipids, and other substances accumulate in and on the arterial walls). Ageing is a major risk factor for cardiovascular disease, with the risk typically growing sharply with each passing decade of life. According to estimates, 82% of people who die from heart disease are 65 years of age or older [4, 7–9]. The risk of stroke also doubles every ten years beyond the age of 55. Several explanations have been put forward for why ageing increases the risk of heart and circulatory diseases. One of them concerns serum cholesterol: total blood cholesterol rises with age in many populations. In men, this increase stops between the ages of 45 and 50. In women, the increase continues sharply until they are 60–65 years old.
Detecting a heart problem may be one of the most difficult tasks in the medical services industry. To save lives, the condition must be examined quickly, effectively, and precisely. Patients must undergo numerous tests, and the results must be carefully examined by medical professionals. Because of this, researchers have been interested in predicting heart disease and have developed a variety of heart disease prediction frameworks using various machine learning algorithms. Several of them achieved the desired results; some used data obtained from the clinics available to them, while others trained and tested their classifiers on the well-known UCI heart disease dataset. Cardiovascular disease prevention has therefore grown more crucial than ever [10, 11]. Good data-driven frameworks for predicting heart disease can improve the whole process of examination and prevention, helping more people to continue living healthy lives. Machine learning assists in predicting heart disease, and the predictions are quite accurate. It involves examining a dataset of patients with heart disease and gathering the necessary data [12, 13]. In this study, several models were then created, and predictions were generated using KNN, SVM, XGBoost, Naïve Bayes, Random Forest, Logistic Regression, and AdaBoost [14]. This study contributes a complete and up-to-date assessment of state-of-the-art machine learning strategies for heart disease prediction, as well as a comparative performance assessment of seven distinct machine learning approaches on a real-world heart disease dataset [15]. The study identifies the best-performing machine learning techniques for heart disease prediction, which healthcare professionals can use to develop more accurate and reliable heart disease prediction systems [16, 17].

The remainder of this study is divided into three sections. Section 2 gives the materials and the detailed methodology used to achieve the aim of this study. Section 3 gives the results and discussion using tables and graphs. Section 4 concludes the study.

2 Materials and Methods

Cardiovascular disease is regarded as a silent killer that causes the untimely deaths of numerous people without evident signs. The nature of the disease is the source of growing concern about the illness and its effects, which is why it is important to forecast the occurrence of this lethal illness.

2.1 Data Preprocessing

In addition to the algorithms used, the nature of the dataset and the way it is handled also affect the performance and accuracy of the predictive model. The preprocessing step is crucial because it prepares and organizes the dataset so that the algorithm can use it. The size of the dataset is another crucial factor. Some datasets include many attributes, which makes it more difficult to analyse them, discover patterns in them, or predict accurately. These problems may be resolved by examining the dataset and applying the proper data preparation techniques. Data cleaning, data transformation, imputation of missing values, data standardization, feature selection, and other steps that depend on the nature of the dataset are all examples of data preparation stages. Datasets may contain errors, omissions, redundancies, noise, and a variety of other problems that make the material unsuitable for direct use by machine learning algorithms.

2.2 Model Architecture

The model architecture shown in Fig. 1 includes all the components and their interactions needed to implement a predictive model for the detection of cardiovascular diseases (see Table 1).
The UCI Cleveland dataset, which is publicly accessible, was used in this work as the heart disease data source. It was obtained from the UCI machine learning repository website (https://archive.ics.uci.edu/ml/datasets/heart+Disease). There are 303 instances in the data collection, and there are 14 attributes. The UCI heart disease collection combines four alternative datasets and is used to diagnose heart disease; however, contrary to some sources, only the UCI Cleveland dataset was used in this work. The original data contain 75 attributes; however, only 14 attributes have been processed in the published studies [18, 19].

Fig. 1 The model architecture for the predictive model

Table 1 Tabular representation of the methodology used
Objective | Methodology | Output
Acquire dataset for cardiovascular disease | The cardiovascular disease dataset was acquired from the machine learning pool at UCI | Use the dataset acquired to build a predictive model
Develop a machine learning model to aid cardiovascular disease prediction | The model would be developed using Python: data preprocessing is done to check for missing values in the dataset; feature extraction is also done to reduce the initial dataset to manageable groups for easy processing; the classification method for the model was done using the selected classification algorithms | Identify the relevant features in the dataset, and develop a model using machine learning algorithms
Evaluate the performance of the developed model and build the proposed system | Evaluating the model based on its performance: accuracy, precision, recall, F1 score, and MCC | Results of the performance of the model
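The preprocessing checks named in Table 1 (missing values) and in Sect. 2.1 (redundancies) can be illustrated with a minimal sketch. It is not the authors' notebook code: the function names are hypothetical, only the Python standard library is used, and the small in-memory sample is adapted from the rows shown in Table 3, with one record repeated and one cell blanked on purpose so that both checks have something to find.

```python
import csv
import io

# Illustrative sample in the 14-column layout described in this chapter.
# Row 4 deliberately repeats row 1, and row 5 has its "chol" cell blanked.
SAMPLE_CSV = """age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target
52,1,0,125,212,0,1,168,0,1.0,2,2,3,0
53,1,0,140,203,1,0,155,1,3.1,0,0,3,0
70,1,0,145,174,0,1,125,1,2.6,0,0,3,0
52,1,0,125,212,0,1,168,0,1.0,2,2,3,0
61,1,0,148,,0,1,161,0,0.0,2,1,3,0
"""


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def count_missing(rows):
    """Return {column: number of empty cells} for columns that have gaps."""
    missing = {}
    for row in rows:
        for column, value in row.items():
            if value is None or value.strip() == "":
                missing[column] = missing.get(column, 0) + 1
    return missing


def drop_duplicates(rows):
    """Keep the first occurrence of each fully identical record."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row.items())
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def split_features_target(rows, target="target"):
    """Separate the independent variables from the dependent variable."""
    features = [{k: v for k, v in row.items() if k != target} for row in rows]
    labels = [int(row[target]) for row in rows]
    return features, labels


rows = load_rows(SAMPLE_CSV)
missing = count_missing(rows)
unique_rows = drop_duplicates(rows)
features, labels = split_features_target(unique_rows)
print("missing cells per column:", missing)
print("rows before/after de-duplication:", len(rows), len(unique_rows))
```

Reporting the gaps and repeated records before any model is trained keeps the later 80/20 split from being affected by rows that would otherwise be counted more than once; how missing cells are then imputed or dropped is left open here because the chapter does not specify it.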
The UCI Cleveland dataset, which is part of the dataset repository, includes the ‘Diagnosis’ attribute, which is the dependent variable to be predicted. The remaining attributes are the independent variables used as input. Table 2 lists the attribute descriptions.

Table 2 Dataset attributes
No. | Feature/attribute | Type | Description | Values
1 | Sex | Discrete | Male or female | 1: male; 0: female
2 | Age | Continuous | Patient’s age | Values between 28 and 77 (years of age)
3 | Chest pain type (CP) | Discrete | Type of chest pain: typical angina, atypical angina, non-anginal pain, or asymptomatic | 0: typical angina; 1: atypical angina; 2: non-anginal pain; 3: asymptomatic
4 | Resting blood pressure (Trestbps) | Continuous | Resting blood pressure (in mmHg on admission to the hospital) | Continuous values in mmHg
5 | Serum cholesterol (Chol) | Continuous | Serum cholesterol in mg/dl | Continuous values in mg/dl
6 | Fasting blood sugar (FBS) | Discrete | Whether the patient’s fasting blood sugar exceeds 120 mg/dl | 1: true (FBS > 120 mg/dl); 0: false
7 | Maximum heart rate (Thalach) | Continuous | Patient’s maximum heart rate | Values from 71 to 202; low: under 50 beats/min; normal: 51–119 beats/min; high: 120–180 beats/min
8 | Resting electrocardiographic results (Restecg) | Discrete | Results of the resting ECG | 0: normal; 1: ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV); 2: probable or definite left ventricular hypertrophy by Estes’ criteria
9 | Exercise-induced angina (Exang) | Discrete | Whether exercise induced angina | 1: yes; 0: no
10 | Oldpeak | Continuous | ST depression induced by exercise relative to rest | Decimal values between 0 and 6.2
11 | Slope | Discrete | Slope of the ST segment at peak exercise, in three categories | 0: upsloping; 1: flat; 2: downsloping
12 | Major vessels (Ca) | Discrete | Number of major vessels coloured by fluoroscopy | 0, 1, 2, or 3
13 | Thal | Discrete | Thallium test result, checked for patients experiencing chest discomfort or respiratory difficulty, in three categories | 0: normal; 1: fixed defect; 2: reversible defect
14 | Target/condition | Discrete | The dataset’s final column, also known as the class or label column. It holds the prediction result as one of two classes, based on the preceding 13 attributes: class 0 indicates no heart disease and class 1 indicates heart disease | 0: no disease; 1: disease

2.3 Applying Various Machine Learning Algorithms

Logistic Regression: This is a supervised classification algorithm used to predict the probability of a target variable. The dependent variable is dichotomous, meaning that it can only be either 1 or 0. A logistic regression model predicts P(Y = 1) as a function of X. It is one of the simplest ML algorithms and may be applied to a variety of classification problems, such as spam detection, diabetes prediction, disease detection, and so on.

KNN Algorithm: This is a simple supervised machine learning algorithm that may be used for both classification and regression problems. Although it is simple to use and understand, it has the important drawback of becoming significantly slower as the amount of data grows. It uses labelled data points from several classes to predict how a new sample point should be classified.

SVM: Support Vector Machines (SVMs) are powerful yet flexible supervised machine learning algorithms that are used for both classification and regression, although they are mostly used in classification problems. Compared with other machine learning algorithms, SVMs have a unique way of implementation.

Naïve Bayes: This method relies on Bayes’ theorem. It is not a single algorithm but a family of algorithms that share a common principle, namely that every pair of features being classified is independent of each other. Naïve Bayes is a probabilistic machine learning algorithm that may be applied to a variety of classification tasks. Common applications include spam filtering, document classification, sentiment prediction, and others.
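To make the Naïve Bayes description concrete, the following minimal sketch shows how Bayes' theorem and the feature-independence assumption combine into a class posterior. It is an illustration only: the chapter does not state which Naïve Bayes variant was used, so a categorical variant with Laplace smoothing is assumed here, the function names and the `alpha` parameter are hypothetical, and the six training samples are invented toy values (two binary features standing in for attributes such as Exang and FBS), not patient data.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(samples, labels, alpha=1.0):
    """Fit a categorical naive Bayes model with Laplace smoothing (alpha)."""
    class_counts = Counter(labels)
    # value_counts[(feature_index, class)][value] = occurrences in training data
    value_counts = defaultdict(Counter)
    feature_values = defaultdict(set)
    for sample, label in zip(samples, labels):
        for index, value in enumerate(sample):
            value_counts[(index, label)][value] += 1
            feature_values[index].add(value)
    return class_counts, value_counts, feature_values, alpha


def predict_proba(model, sample):
    """Return {class: posterior probability} for one sample."""
    class_counts, value_counts, feature_values, alpha = model
    total = sum(class_counts.values())
    log_scores = {}
    for label, count in class_counts.items():
        # log prior plus the sum of per-feature log likelihoods; adding them
        # feature by feature is exactly the independence assumption
        score = math.log(count / total)
        for index, value in enumerate(sample):
            seen = value_counts[(index, label)][value]
            n_values = len(feature_values[index])
            score += math.log((seen + alpha) / (count + alpha * n_values))
        log_scores[label] = score
    # Normalise in log space so that the posteriors sum to 1.
    top = max(log_scores.values())
    exp_scores = {k: math.exp(v - top) for k, v in log_scores.items()}
    norm = sum(exp_scores.values())
    return {k: v / norm for k, v in exp_scores.items()}


# Invented toy data: (feature_a, feature_b) -> class (1 = disease, 0 = none).
toy_samples = [(1, 0), (1, 1), (1, 0), (0, 0), (0, 1), (0, 0)]
toy_labels = [1, 1, 1, 0, 0, 0]

model = train_naive_bayes(toy_samples, toy_labels)
posterior = predict_proba(model, (1, 0))
predicted_class = max(posterior, key=posterior.get)
print({label: round(p, 3) for label, p in posterior.items()}, predicted_class)
```

Continuous attributes such as age or cholesterol would need either binning or a Gaussian likelihood in place of the category counts; that choice is outside what the chapter specifies.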
Random Forest: Random forest is an ensemble learning approach for classification, regression, and other tasks. It builds many decision trees during training and outputs the mode of the classes (classification) or the mean prediction (regression) of the individual trees. Its simplicity and versatility make it one of the most widely used algorithms.

XGBoost: This ensemble method builds on decision trees and applies shrinkage to the values of the leaf nodes. It is comparable to gradient boosting in that it reduces the correlation between trees, but it is more scalable and faster.

AdaBoost: Yoav Freund and Robert Schapire created this technique in 1996. Its main goal is to obtain maximum accuracy by transforming weak learners into strong learners.

3 Results and Discussion

The dataset used in this study was acquired from the UCI machine learning repository and contains a total of 1025 patients and 14 attributes. The dataset was split into two parts, 80% for training and 20% for testing. The attributes of the dataset, as shown in Table 3, are age, sex, chest pain type, resting blood pressure, serum cholesterol, fasting blood sugar, resting ECG results, maximum heart rate, exercise-induced angina, ST depression, ST slope, number of major vessels, thal, and target.

3.1 Performance Evaluation Matrix

A confusion matrix was used to evaluate the model. It tallies the number of correct and incorrect predictions. Since this is a binary classification problem, the matrix is 2 × 2. The confusion matrix used in the evaluation is shown in Table 4. The two classes are 0 and 1, which indicate, respectively, negative (no heart disease) and positive (heart disease). The diagonal values represent correct predictions, while the off-diagonal values represent incorrect predictions.
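The evaluation workflow described in this section (an 80/20 split, a classifier, and a 2 × 2 confusion matrix) can be sketched as follows. This is a minimal standard-library illustration rather than the authors' implementation: KNN stands in for the seven classifiers, the function names, the seed, and k = 3 are assumptions, and the data are two synthetic, well-separated clusters, so the perfect score it produces says nothing about real patient data. Accuracy, precision, and recall are computed with their usual definitions, taking class 1 as the positive class.

```python
import math
import random
from collections import Counter


def train_test_split(rows, labels, test_fraction=0.2, seed=0):
    """Shuffle indices and hold out test_fraction of the data for testing."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = int(round(len(rows) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    x_train = [rows[i] for i in train_idx]
    x_test = [rows[i] for i in test_idx]
    y_train = [labels[i] for i in train_idx]
    y_test = [labels[i] for i in test_idx]
    return x_train, x_test, y_train, y_test


def knn_predict(train_rows, train_labels, query, k=3):
    """Majority vote among the k nearest training points (Euclidean)."""
    nearest = sorted(
        zip(train_rows, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def confusion_counts(actual, predicted):
    """Return (TP, TN, FP, FN) with class 1 as the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == 1 and p == 1 for a, p in pairs)
    tn = sum(a == 0 and p == 0 for a, p in pairs)
    fp = sum(a == 0 and p == 1 for a, p in pairs)
    fn = sum(a == 1 and p == 0 for a, p in pairs)
    return tp, tn, fp, fn


def metrics(tp, tn, fp, fn):
    """Accuracy, precision and recall from the four confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Synthetic toy data: class 0 lies in [0, 1]^2 and class 1 in [4, 5]^2.
rng = random.Random(1)
rows = [(rng.random(), rng.random()) for _ in range(25)]
rows += [(4 + rng.random(), 4 + rng.random()) for _ in range(25)]
labels = [0] * 25 + [1] * 25

x_train, x_test, y_train, y_test = train_test_split(rows, labels)
predicted = [knn_predict(x_train, y_train, query) for query in x_test]
tp, tn, fp, fn = confusion_counts(y_test, predicted)
accuracy, precision, recall = metrics(tp, tn, fp, fn)
print(f"TP={tp} TN={tn} FP={fp} FN={fn}")
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

Keeping the four counts separate from the derived ratios makes it easy to check any reported metric by hand against the confusion matrix it came from.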
Table 3 A sample of the dataset
Age | Sex | CP | Trestbps | Chol | FBS | Restecg | Thalach | Exang | Oldpeak | Slope | Ca | Thal | Target
52 | 1 | 0 | 125 | 212 | 0 | 1 | 168 | 0 | 1.000000 | 2 | 2 | 3 | 0
53 | 1 | 0 | 140 | 203 | 1 | 0 | 155 | 1 | 3.100000 | 0 | 0 | 3 | 0
70 | 1 | 0 | 145 | 174 | 0 | 1 | 125 | 1 | 2.600000 | 0 | 0 | 3 | 0
61 | 1 | 0 | 148 | 203 | 0 | 1 | 161 | 0 | 0.000000 | 2 | 1 | 3 | 0
62 | 0 | 0 | 138 | 294 | 1 | 1 | 105 | 0 | 1.900000 | 1 | 3 | 2 | 0

Table 4 Confusion matrix
Actual \ Predicted | YES | NO | Total
YES | TP | FN | P
NO | FP | TN | N
Total | | | P + N

3.1.1 Terminologies of the Confusion Matrix

1. True Positives (TP): instances where the classifier correctly categorized a positive case.
2. True Negatives (TN): instances where the classifier correctly identified a negative case.
3. False Positives (FP): instances where a negative case was mistakenly labelled as positive.
4. False Negatives (FN): instances where a positive case was mistakenly categorized as negative.

The model’s performance was assessed using the metrics listed below.

Accuracy: The proportion of all predictions that are correct, computed from the confusion matrix using the formula below:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

Precision: This is also referred to as the positive predictive value. It is calculated using this formula:

$$\text{Precision} = \frac{TP}{TP + FP}$$

Recall: This is also referred to as sensitivity. It is calculated using this formula:

$$\text{Recall} = \frac{TP}{TP + FN}$$

Fig. 2 Model performance comparison (accuracy of the selected ML algorithms)

Table 5 Results of accuracy
Algorithm | Accuracy (%)
Logistic regression | 86.34
KNN algorithm | 100.00
SVM algorithm | 91.71
Naïve Bayes | 88.29
XGBoost | 100.00
AdaBoost | 92.20
Random forest | 100.00

3.2 Discussion of the Model Performance Comparison

The accuracy of the selected models is displayed in Fig. 2. The figure shows that RF, KNN, and XGBoost outperformed the other selected models. Most of the models perform well, but KNN, XGBoost, and Random Forest are the best, each with 100% accuracy. The accuracy of each algorithm is listed in Table 5. The confusion matrices for the selected machine learning algorithms are given in Fig. 4a–f.

Fig. 4 a–f Confusion matrices for the selected algorithms

4 Conclusion

This study describes several machine learning classification techniques for a cardiovascular disease predictive model. They were selected after reviewing several published articles on the use of machine learning techniques. The accuracy of the suggested models varies depending on the tools used, the dataset used, the number of attributes captured in the dataset, etc. Understanding of the dataset was improved through exploratory data analysis. Several machine learning algorithms were built and tested on the dataset. KNN, XGBoost, and random forest all achieved 100% accuracy, which was the best of all the algorithms. The study concludes that a dataset with sufficient instances and accurate data should be used to build an accurate heart disease prediction model, and that an appropriate algorithm should be used to build it. Finally, using machine learning to classify cardiac disease is an important topic that may benefit both patients and medical professionals.
Although patient information is abundantly available at medical clinics and centres, very little of it is shared because of the rigorous ethical issues involved.

Acknowledgements This work was partially supported by the National Science and Technology Council, Taiwan, R.O.C., under contract no.: NSTC 110-2410-H-165-001-MY2.

References

1. Smith FC (2014) Chest pain, syncope, and palpitations in the pediatric patient. In: Functional symptoms in pediatric disease: a clinical guide, pp 27–45
2. Cosaro E, Bonafini S, Montagnana M, Danese E, Trettene MS, Minuz P, Fava C et al (2014) Effects of magnesium supplements on blood pressure, endothelial function and metabolic parameters in healthy young men with a family history of metabolic syndrome. Nutr Metab Cardiovasc Dis 24(11):1213–1220
3. Münzel T, Miller MR, Sørensen M, Lelieveld J, Daiber A, Rajagopalan S (2020) Reduction of environmental pollutants for prevention of cardiovascular disease: it’s time to act. Eur Heart J 41(41):3989
4. Groenewegen A, Rutten FH, Mosterd A, Hoes AW (2020) Epidemiology of heart failure. Eur J Heart Fail 22(8):1342–1356
5. Biswas N, Ali MM, Rahaman MA, Islam M, Mia MR, Azam S, Moni MA et al (2023) Machine learning-based model to predict heart disease in early stage employing different feature selection techniques. BioMed Res Int
6. Folorunso SO, Awotunde JB, Adigun AA, Prasad LN, Lalitha VL (2023) A hybrid model for post-treatment mortality rate classification of patients with breast cancer. Healthc Anal 4:100254
7. Chandrasekhar N, Peddakrishna S (2023) Enhancing heart disease prediction accuracy through machine learning techniques and optimization. Processes 11(4):1210
8. Shukur BS, Mijwil MM (2023) Involving machine learning techniques in heart disease diagnosis: a performance analysis. Int J Electr Comput Eng 13(2):2177
9. Bakar WAWA, Josdi NLNB, Man MB, Zuhairi MAB (Mar 2023) A review: heart disease prediction in machine learning & deep learning. In: 2023 19th IEEE international colloquium on signal processing & its applications (CSPA). IEEE, pp 150–155
10. Awotunde JB, Ajagbe SA, Florez H (Oct 2023) A bio-inspired-based salp swarm algorithm enabled with deep learning for Alzheimer’s classification. In: International conference on applied informatics. Springer Nature Switzerland, Cham, pp 157–170
11. Srivenkatesh M (2020) Prediction of cardiovascular disease using machine learning algorithms. Int J Eng Adv Technol (IJEAT) 9(3)
12. Shah D, Patel S, Bharti SK (2020) Heart disease prediction using machine learning techniques. SN Comput Sci 1:345. https://doi.org/10.1007/s42979-020-00365-y
13. Folorunso SO, Awotunde JB, Adigun AA, Panigrahi R, Garg A, Bhoi AK (2023) Multi-label learning model for diabetes disease comorbidity. J Inst Eng (India): Ser B, 1–13
14. Shouman M, Turner T, Stocker R (2012) Applying K-nearest neighbour in diagnosing heart disease patients. Int J Inf Educ Technol 2(3):220
15. Awotunde JB, Folorunsho O, Mustapha IO, Olusanya OO, Akanbi MB, Abiodun KM (2023) An enhanced internet of things enabled type-2 fuzzy logic for healthcare system applications. Recent trends on type-2 fuzzy logic systems: theory, methodology and applications. Springer International Publishing, Cham, pp 133–151
16. Pouriyeh SE (Jul 2017) A comprehensive investigation and comparison of machine learning techniques in the domain of heart disease. In: Proceedings of the IEEE symposium on computers and communications (ISCC). Heraklion, Greece. IEEE, pp 204–207
17. Medhekar D, Bote M, Deshmukh S (2013) Heart disease prediction system using naive Bayes algorithm. Int J Innov Sci Eng Technol 2(3):1–5
18. Rjeily CB, Badr G, Hajjam A, Andre’s E, Hajjarm A, Hassani E, Andres E (2019) Medical data mining for heart diseases and the future of sequential mining in the medical field, vol 149. Springer, pp 71–99. https://doi.org/10.1007/978-3-319-94030-44
19. Vembandasamy K, Sasipriya R, Deepa E (Jul 1988 [online]) Heart disease data set. http// www.archive.ics.uci.edu/ml/dataset/heart disease