Machine learning-based forecasting of coronary artery disease risk factors and diagnostic insights
Abstract
Aim: As people's quality of life and habits have changed, Coronary Artery Disease (CAD) has become the leading cause of death globally. It is a complex cardiac disease with various risk factors and a wide range of symptoms. Early and accurate diagnosis of CAD allows appropriate treatment to be administered quickly, which contributes to a lower mortality rate. Machine learning (ML) algorithms for CAD prediction and treatment decisions are rapidly being developed and implemented in clinical practice. Predictive models based on ML algorithms may aid health personnel in the early diagnosis of CAD, lowering mortality. Thus, the goal of this study is to predict the factors that may be associated with CAD using tree-based methods, a family of machine learning approaches, and to determine which factors are most strongly associated with CAD.
Materials and Methods: An open-access heart disease dataset was used to investigate the risk factors associated with CAD. The data set contains records of 333 patients, with 20 input variables and 1 target variable. The data set was split 80%:20% into training and test sets, and 10-fold cross-validation was employed in the modeling. For model assessment, accuracy (ACC), balanced accuracy (b-ACC), sensitivity (SE), specificity (SP), positive predictive value (ppv), negative predictive value (npv), and F1-score were used.
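The assessment measures listed above all derive from the four cells of a binary confusion matrix. The following is a minimal Python sketch of their standard definitions, not the authors' code. The names `ConfusionCounts` and `classification_metrics` are hypothetical, and the sketch assumes every denominator is nonzero.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Cells of a binary confusion matrix (hypothetical helper type)."""
    tp: int  # CAD patients predicted as CAD
    fn: int  # CAD patients predicted as non-CAD
    tn: int  # non-CAD patients predicted as non-CAD
    fp: int  # non-CAD patients predicted as CAD


def classification_metrics(c: ConfusionCounts) -> dict:
    """Compute the seven measures named in the text.

    Assumes each denominator is nonzero; no zero-division handling.
    """
    se = c.tp / (c.tp + c.fn)            # sensitivity (recall)
    sp = c.tn / (c.tn + c.fp)            # specificity
    ppv = c.tp / (c.tp + c.fp)           # positive predictive value (precision)
    npv = c.tn / (c.tn + c.fn)           # negative predictive value
    acc = (c.tp + c.tn) / (c.tp + c.fn + c.tn + c.fp)
    b_acc = (se + sp) / 2                # balanced accuracy
    f1 = 2 * ppv * se / (ppv + se)       # harmonic mean of ppv and SE
    return {"ACC": acc, "b-ACC": b_acc, "SE": se, "SP": sp,
            "PPV": ppv, "NPV": npv, "F1": f1}
```

Calling `classification_metrics(ConfusionCounts(tp=43, fn=1, tn=23, fp=0))` returns each measure as a fraction between 0 and 1. These counts are illustrative only and are not taken from the study. They were chosen to total 67, roughly 20% of 333 patients.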
Results: For the XGBoost model, which performed best among the tree-based models, the ACC, b-ACC, SE, SP, ppv, npv, and F1-score values were 98.5%, 98.8%, 97.7%, 100%, 100%, 95.8%, and 98.8%, respectively. When the groups with and without CAD were compared, a statistically significant difference was found for the age variable. There was also a significant relationship between the active, lifestyle, ihd, dm, ecgpatt, and qwave variables and the presence or absence of CAD. The variable importance values from the best-performing XGBoost model show that the variables most associated with CAD are ekgpatt: normal, ekgpatt: ST-depression, ekgpatt: T-inversion, qwave: yes, age, bpdias, height, LDL, HR, IVSD: with LVH, bpsyDM.
Conclusion: Judged by the performance criteria, the forecasting models used predicted CAD with high success. By identifying risk factors associated with CAD, the proposed machine learning models can give clinicians practical, cost-effective, and beneficial assistance in making accurate predictive decisions.
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.