EazyML Modeling: Heart Attack Classification¶
Define Imports¶
In [ ]:
!pip install --upgrade eazyml-automl
!pip install gdown python-dotenv
In [1]:
import os
import pandas as pd
import eazyml as ez
import gdown
from dotenv import load_dotenv
load_dotenv()
Out[1]:
True
1. Initialize EazyML¶
The ez_init function takes the access key as its argument; this notebook reads the key from the EAZYML_ACCESS_KEY environment variable. If the variable is not set, ez_init defaults to a trial license.
In [2]:
ez.ez_init(os.getenv('EAZYML_ACCESS_KEY'))
Out[2]:
{'success': True,
'message': 'Initialized successfully. You may revoke your consent to sharing usage stats anytime. You have exclusive paid access.'}
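Because an unset variable silently falls back to a trial license, it can be useful to check whether the key was actually loaded before initializing. The sketch below is only an illustration: `describe_access_key` is a hypothetical helper, not part of EazyML, and it reports the key's length rather than its value so that the key never appears in notebook output.

```python
import os


def describe_access_key(env=os.environ, name="EAZYML_ACCESS_KEY"):
    """Report whether the access key is set, without revealing the key itself.

    Hypothetical helper for illustration; not part of the EazyML API.
    """
    key = env.get(name)
    if key:
        return f"{name} is set ({len(key)} characters)"
    return f"{name} is not set; ez_init would fall back to a trial license"


# Check the real environment (after load_dotenv() has run in the notebook).
print(describe_access_key())
```

Passing a plain dictionary as `env` makes the helper easy to try without touching the real environment.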
2. Define Dataset Files, Outcome Variable and Train Model¶
2.1 Define Train Dataset and Other Model Parameters¶
In [ ]:
gdown.download_folder(id='1EobxYR3pg_Z3Sd4sETfe4aJLAsT98fL2')
In [3]:
# classification
file_path = os.path.join('data', "Heart_Attack_traindata.csv")
outcome = "class"
# read the training data into a DataFrame
df = pd.read_csv(file_path)
# define options
options = {'model_type': 'predictive'}
2.2 Train Model¶
In [4]:
resp = ez.ez_build_model(df, outcome=outcome, options=options)
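The later cells read `resp['model_performance']` and `resp['model_info']`, so a failed build would only surface as a `KeyError` further down. A minimal sketch of an early check is shown here. `require_keys` is a hypothetical helper, and the assumption that a failed response carries a `message` field is based only on the `ez_init` output above, not on documented `ez_build_model` behavior.

```python
def require_keys(resp, keys):
    """Return resp unchanged if it contains every expected key, else raise.

    Hypothetical helper for illustration; not part of the EazyML API.
    """
    missing = [k for k in keys if k not in resp]
    if missing:
        # 'message' is an assumed field, modeled on the ez_init response.
        detail = resp.get("message", "no message in response")
        raise KeyError(f"response is missing {missing}: {detail}")
    return resp


# Stand-in dictionary so the sketch runs without EazyML installed.
demo_resp = {"model_performance": [], "model_info": {}}
require_keys(demo_resp, ["model_performance", "model_info"])
```

In the notebook this would be called as `require_keys(resp, ["model_performance", "model_info"])` right after the build step.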
2.3 Show model performance¶
In [5]:
ez.ez_display_df(resp['model_performance'])
| | Model | Kappa | Accuracy |
|---|---|---|---|
| 0 | Bagged Decision Trees with Information Gain | 0.98 | 0.99 |
| 1 | Random Forest with Information Gain | 0.98 | 0.99 |
| 2 | Gradient Boosting Classifier | 0.98 | 0.99 |
| 3 | Boosted Decision Trees with InformationGain | 0.96 | 0.98 |
| 4 | Logistic Regression | 0.59 | 0.80 |
| 5 | Naive Bayes | 0.44 | 0.70 |
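The table reports Kappa alongside Accuracy. The notebook does not say how EazyML computes these, so the following is a sketch under the assumption that Kappa means Cohen's kappa, which measures agreement between actual and predicted labels after discounting the agreement expected by chance. The labels are toy values made up for illustration, not rows from the dataset.

```python
from collections import Counter


def accuracy(actual, predicted):
    """Fraction of positions where the predicted label equals the actual one."""
    matches = sum(a == p for a, p in zip(actual, predicted))
    return matches / len(actual)


def cohens_kappa(actual, predicted):
    """Cohen's kappa: (p_observed - p_expected) / (1 - p_expected).

    Undefined (division by zero) when chance agreement is exactly 1,
    for example when both lists contain a single identical label.
    """
    n = len(actual)
    p_observed = accuracy(actual, predicted)
    actual_counts = Counter(actual)
    predicted_counts = Counter(predicted)
    # Chance agreement: product of the marginal label frequencies, summed.
    p_expected = sum(
        actual_counts[label] * predicted_counts[label] for label in actual_counts
    ) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)


# Toy labels for illustration only.
actual = ["positive", "positive", "negative", "negative", "positive", "negative"]
predicted = ["positive", "negative", "negative", "negative", "positive", "negative"]

print(round(accuracy(actual, predicted), 3))
print(round(cohens_kappa(actual, predicted), 3))
```

Kappa is lower than accuracy here because half of the agreement on these toy labels would be expected by chance, which is also why the weaker models in the table show a much larger gap between the two columns.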
3. Dataset Information¶
The dataset used in this notebook is the Heart Attack Dataset. Each row describes one patient, with features such as age, gender, blood pressure, and cardiac biomarkers, and a label indicating whether the patient had a heart attack.
Columns in the Dataset:¶
- age: The age of the patient, measured in years.
- gender: The gender of the patient, represented as a categorical variable (e.g., 1 = male, 0 = female).
- impluse: The patient's pulse rate, measured in beats per minute (bpm). The column name is spelled this way in the dataset.
- pressurehight: Refers to systolic blood pressure, the higher number in a blood pressure reading (e.g., 120/80 mmHg).
- pressurelow: Refers to diastolic blood pressure, the lower number in a blood pressure reading (e.g., 120/80 mmHg).
- glucose: The patient's blood glucose (blood sugar) level.
- kcm: The level of CK-MB (creatine kinase-MB), a cardiac enzyme that rises in the blood after heart muscle damage.
- troponin: A protein found in the heart muscle, measured to assess heart damage (especially after a heart attack).
- class: The target variable, indicating whether a heart attack is present (positive) or absent (negative).
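Several column names carry the dataset's own spellings (`impluse`, `pressurehight`), so a "corrected" name in code would fail to match. A quick header check along these lines can catch that early. This is a sketch: `check_columns` is a hypothetical helper, and the expected list is taken from the dataset preview in this notebook.

```python
# Column names exactly as they appear in the dataset preview.
EXPECTED_COLUMNS = [
    "age", "gender", "impluse", "pressurehight", "pressurelow",
    "glucose", "kcm", "troponin", "class",
]


def check_columns(columns, expected=EXPECTED_COLUMNS):
    """Return (missing, unexpected) column names relative to the expected list.

    Hypothetical helper for illustration; accepts any iterable of names.
    """
    columns = list(columns)
    missing = [c for c in expected if c not in columns]
    unexpected = [c for c in columns if c not in expected]
    return missing, unexpected


# A header with the "corrected" spelling is flagged on both sides.
print(check_columns(["age", "gender", "impulse"]))
```

With the DataFrame loaded above, the call would be `check_columns(df.columns)`, and two empty lists mean the header matches.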
3.1 Display the Dataset¶
Below is a preview of the dataset:
In [6]:
# Load the dataset from the provided file
train = pd.read_csv(file_path)
# Display the first few rows of the dataset
ez.ez_display_df(train.head())
| | age | gender | impluse | pressurehight | pressurelow | glucose | kcm | troponin | class |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 64 | 1 | 66 | 160 | 83 | 160.000000 | 1.800000 | 0.012000 | negative |
| 1 | 21 | 1 | 94 | 98 | 46 | 296.000000 | 6.750000 | 1.060000 | positive |
| 2 | 55 | 1 | 64 | 160 | 77 | 270.000000 | 1.990000 | 0.003000 | negative |
| 3 | 64 | 1 | 70 | 120 | 55 | 270.000000 | 13.870000 | 0.122000 | positive |
| 4 | 55 | 1 | 64 | 112 | 65 | 300.000000 | 1.080000 | 0.003000 | negative |
4. Define Test Dataset and Predict on that Dataset¶
In [7]:
# The build response includes the model information needed for prediction
model_info = resp["model_info"]
4.1 Define Test Dataset¶
In [8]:
test_file_path = os.path.join('data', "Heart_Attack_testdata.csv")
test_data = pd.read_csv(test_file_path)
4.2 Predict on Test Dataset¶
In [9]:
options = {}
pred_resp = ez.ez_predict(test_data, model_info=model_info, options=options)
pred_df = pred_resp['pred_df']
In [10]:
ez.ez_display_df(pred_df.head())
| | age | gender | impluse | pressurehight | pressurelow | glucose | kcm | troponin | class | Predicted class |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 64 | 1 | 66 | 160 | 83 | 160 | 1.800000 | 0.012000 | NEGATIVE | NEGATIVE |
| 1 | 21 | 1 | 94 | 98 | 46 | 296 | 6.750000 | 1.060000 | POSITIVE | POSITIVE |
| 2 | 55 | 1 | 64 | 160 | 77 | 270 | 1.990000 | 0.003000 | NEGATIVE | NEGATIVE |
| 3 | 64 | 1 | 70 | 120 | 55 | 270 | 13.870000 | 0.122000 | POSITIVE | POSITIVE |
| 4 | 55 | 1 | 64 | 112 | 65 | 300 | 1.080000 | 0.003000 | NEGATIVE | NEGATIVE |
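Since the output keeps the original `class` column next to `Predicted class`, the two can be compared directly. The sketch below tallies (actual, predicted) pairs for the five preview rows shown above. `confusion_counts` is a hypothetical helper, and labels are lowercased first because the training preview shows `negative` while the prediction output shows `NEGATIVE`.

```python
from collections import Counter


def confusion_counts(actual, predicted):
    """Count (actual, predicted) label pairs, ignoring case and whitespace.

    Hypothetical helper for illustration; not part of the EazyML API.
    """
    pairs = zip(
        (a.strip().lower() for a in actual),
        (p.strip().lower() for p in predicted),
    )
    return Counter(pairs)


# The five preview rows from the prediction output above.
actual = ["NEGATIVE", "POSITIVE", "NEGATIVE", "POSITIVE", "NEGATIVE"]
predicted = ["NEGATIVE", "POSITIVE", "NEGATIVE", "POSITIVE", "NEGATIVE"]

counts = confusion_counts(actual, predicted)
correct = sum(n for (a, p), n in counts.items() if a == p)

print(dict(counts))
print(correct / len(actual))
```

For the full test set, the same function can be called with `pred_df['class']` and `pred_df['Predicted class']`, since iterating a pandas Series yields its values. Five rows are far too few to judge the model; the preview is used here only to keep the example self-contained.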