A comparison of multivariate statistical methods to detect risk factors for type 2 diabetes mellitus
Aim: This study compares the performance of Logistic Regression (LR), Artificial Neural Network (ANN), and Decision Tree models, all of which are machine learning classification methods, in the diagnosis of type 2 Diabetes Mellitus (DM) in order to determine the most successful method. It also uses these models to examine the risk factors affecting type 2 DM.
Materials and Methods: The data were collected from patients who visited the Diabetes and Thyroid polyclinic of the Department of Internal Medicine at the Inonu University Faculty of Medicine, Turgut Ozal Medical Center. Missing values were imputed with the k-Nearest Neighbor algorithm. Accuracy, sensitivity, specificity, precision, F1-score, AUC, and classification error were used as performance evaluation criteria. The parameters of the ANN model were tuned with an evolutionary algorithm parameter optimization method. Missing value imputation, modeling, and parameter optimization were carried out in RapidMiner Studio Free version 8.1.
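As an illustration of the imputation step, the following is a minimal pure-Python sketch of k-Nearest Neighbor imputation. It is not the RapidMiner operator used in the study: the function name `knn_impute`, the distance definition, the choice of `k`, and the toy records are all assumptions made for the example.

```python
import math


def knn_impute(rows, k=3):
    """Fill None entries with the mean of that feature over the k nearest rows.

    Assumption: distance is the root mean squared difference over the
    features that both rows have observed. Features are not rescaled here,
    which a real pipeline would normally do first.
    """
    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    imputed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Donors: other rows where feature j is observed and that share
            # at least one observed feature with the incomplete row.
            donors = [
                (distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            ]
            donors = [d for d in donors if d[0] != math.inf]
            donors.sort(key=lambda d: d[0])
            nearest = [v for _, v in donors[:k]]
            if nearest:
                imputed[i][j] = sum(nearest) / len(nearest)
    return imputed


# Hypothetical toy records: [age, weight_kg, fasting_blood_sugar]
records = [
    [50, 80.0, 130.0],
    [52, 82.0, None],   # fasting blood sugar is missing
    [30, 60.0, 85.0],
    [51, 79.0, 128.0],
]
filled = knn_impute(records, k=2)
print(filled[1])
```

With `k=2`, the missing value is filled from the two records closest in age and weight, so the distant third record does not influence the estimate.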
Results: Among the three methods applied to the diagnosis of type 2 DM, the ANN gave the best classification performance. The accuracy, sensitivity, specificity, precision, F1-score, AUC, and classification error obtained with this method were 98.94%, 100%, 97.73%, 98.04%, 99.01%, 0.978, and 1.06%, respectively. For the ANN model, the importance values of the independent variables were: gender (0.017), long-term drug use (0.009), family history (0.013), concomitant disease (0.017), cortisone use (0.008), stress factor (0.016), high blood pressure (0.008), smoking (0.006), high cholesterol (0.053), heart disease (0.024), exercise status (0.023), carbohydrate use (0.040), alcohol consumption (0.007), vegetable use (0.020), meat use (0.007), age (0.046), weight (0.083), height (0.049), starting age (0.024), daily bread consumption (0.066), LDL (0.084), HDL (0.083), total cholesterol (0.020), triglyceride (0.031), and fasting blood sugar (0.244).
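The threshold-based criteria above all derive from a single confusion matrix, which the sketch below makes explicit. The abstract does not report the confusion matrix or the sample size, so the counts used here (TP = 50, FP = 1, TN = 43, FN = 0) are an assumption, chosen only because they reproduce the reported percentages. AUC is omitted because it cannot be computed from these counts alone.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary classification metrics from confusion counts.

    No guard against empty classes is included; a zero denominator raises
    ZeroDivisionError.
    """
    total = tp + fp + tn + fn
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)     # positive predictive value
    accuracy = (tp + tn) / total
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "classification_error": 1 - accuracy,
    }


# Assumed counts (not stated in the abstract), see the note above.
metrics = classification_metrics(tp=50, fp=1, tn=43, fn=0)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.2f}%")
```

Under these assumed counts the classification error is the complement of the accuracy, which matches the reported pair of 98.94% and 1.06%.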
Conclusion: Based on the performance criteria obtained from the three classification models used to predict type 2 DM, the ANN model gave the best classification performance. According to the ANN model, the three most important risk factors that may cause type 2 DM were fasting blood glucose, LDL, and HDL, in that order.
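The ranking of risk factors can be checked by sorting the importance values reported in the Results. The sketch below uses those values as given; the dictionary layout and variable names are assumptions for illustration only.

```python
# ANN importance values as reported in the Results, in the reported order.
importance = {
    "gender": 0.017, "long-term drug use": 0.009, "family history": 0.013,
    "concomitant disease": 0.017, "cortisone use": 0.008, "stress factor": 0.016,
    "high blood pressure": 0.008, "smoking": 0.006, "high cholesterol": 0.053,
    "heart disease": 0.024, "exercise status": 0.023, "carbohydrate use": 0.040,
    "alcohol consumption": 0.007, "vegetable use": 0.020, "meat use": 0.007,
    "age": 0.046, "weight": 0.083, "height": 0.049, "starting age": 0.024,
    "daily bread consumption": 0.066, "LDL": 0.084, "HDL": 0.083,
    "total cholesterol": 0.020, "triglyceride": 0.031,
    "fasting blood sugar": 0.244,
}

# Sort from most to least important; ties keep their reported order.
ranked = sorted(importance.items(), key=lambda kv: kv[1], reverse=True)

for name, value in ranked[:4]:
    print(f"{name}: {value:.3f}")

total = sum(importance.values())
print(f"sum of importances: {total:.3f}")
```

At the three-decimal precision reported, weight and HDL both show 0.083, so the order of the third and fourth positions cannot be recovered from these rounded values alone.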
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.