This project uses the Heart Disease dataset from Kaggle for model training. The data was first analyzed by examining the correlations between features, both individually and through a correlation matrix. Unnecessary columns were then discarded, and only the most essential features were used to train the models.
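The correlation-based filtering described above might look roughly like the sketch below. It is only an illustration: the data is synthetic, the column names (`age`, `chol`, `noise`, `target`) are hypothetical stand-ins for the Kaggle columns, and the `0.1` cutoff is an arbitrary assumption rather than the threshold used in this project.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the Kaggle data; all column names are hypothetical.
rng = np.random.default_rng(0)
n = 500
age = rng.normal(54, 9, n)
chol = rng.normal(245, 50, n)
noise = rng.normal(0, 1, n)  # a feature unrelated to the target

# The label depends on age and chol only.
score = 0.08 * (age - 54) + 0.01 * (chol - 245) + rng.normal(0, 1, n)
target = (score > 0).astype(int)

df = pd.DataFrame({"age": age, "chol": chol, "noise": noise, "target": target})

# Full correlation matrix (pairwise Pearson correlations).
corr = df.corr()

# Absolute correlation of every feature with the target, strongest first.
target_corr = corr["target"].drop("target").abs().sort_values(ascending=False)

# Keep features whose correlation with the target reaches the assumed cutoff.
threshold = 0.1  # assumption for illustration only
selected = target_corr[target_corr >= threshold].index.tolist()

X = df[selected]
y = df["target"]
print(target_corr)
print("Selected features:", selected)
```

The matrix in `corr` can be drawn with `seaborn.heatmap(corr, annot=True)` to inspect feature pairs visually.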
The models evaluated were Logistic Regression, Decision Tree, SVM, Random Forest, and the ensemble methods Gradient Boosting Classifier and Stacking. According to their classification reports, Logistic Regression and the Gradient Boosting Classifier gave the two best performances. Gradient Boosting was chosen for the implementation because it is an ensemble method and is more complex than the Logistic Regression model.
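A comparison of this kind could be sketched as follows. This is a minimal example under stated assumptions, not the project's actual training script: the data comes from `make_classification` instead of the Kaggle file, all hyperparameters are scikit-learn defaults, and the base estimators chosen for the stacking model are an illustrative guess.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data standing in for the heart disease data.
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "SVM": SVC(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    # Base estimators below are an assumption made for this sketch.
    "Stacking": StackingClassifier(
        estimators=[
            ("rf", RandomForestClassifier(random_state=0)),
            ("svm", SVC(random_state=0)),
        ],
        final_estimator=LogisticRegression(max_iter=1000),
    ),
}

# Fit each model and collect its classification report as a dictionary.
reports = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    reports[name] = classification_report(y_test, y_pred, output_dict=True)
    print(f"{name}: accuracy = {reports[name]['accuracy']:.3f}")
```

Calling `classification_report` without `output_dict=True` returns the familiar text table with per-class precision, recall, and F1 score.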
- numpy
- pandas
- matplotlib.pyplot and seaborn (for graph plotting)
- sklearn (for Mutual Information and all the ML models)
- tensorflow (for ANN model)
- genetic_selection (for Genetic Algorithm)
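
As an illustration of the Mutual Information step mentioned in the list above, the sketch below ranks features with scikit-learn's `mutual_info_classif`. The data is synthetic, the feature names `f0` to `f7` are hypothetical, and keeping the top three features is an arbitrary choice made for this example.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif

# Synthetic data with hypothetical feature names.
X_arr, y = make_classification(
    n_samples=500, n_features=8, n_informative=3, n_redundant=0, random_state=0
)
X = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(8)])

# Estimate the mutual information between each feature and the label.
mi_scores = pd.Series(
    mutual_info_classif(X, y, random_state=0), index=X.columns
).sort_values(ascending=False)

# Keep the highest-scoring features (three is an assumption for this sketch).
top_features = mi_scores.head(3).index.tolist()
print(mi_scores)
print("Top features:", top_features)
```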