EazyML Modeling: Heart Attack Classification¶

Define Imports¶

In [ ]:
!pip install --upgrade eazyml-automl
!pip install gdown python-dotenv
In [1]:
import os
import pandas as pd
import eazyml as ez
import gdown
from dotenv import load_dotenv
load_dotenv()
Out[1]:
True

1. Initialize EazyML¶

The access key is read from the EAZYML_ACCESS_KEY environment variable (loaded from a .env file by load_dotenv above, if one exists) and passed to ez_init for authentication. If the variable is not set, EazyML defaults to a trial license.

In [2]:
ez.ez_init(os.getenv('EAZYML_ACCESS_KEY'))
Out[2]:
{'success': True,
 'message': 'Initialized successfully. You may revoke your consent to sharing usage stats anytime. You have exclusive paid access.'}
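The key lookup described above can be made explicit, so that a fallback to the trial license does not go unnoticed. The following is a minimal sketch using only the standard library; the helper name `read_access_key` is hypothetical and not part of EazyML, and the snippet does not call EazyML itself.

```python
import os

def read_access_key(var_name="EAZYML_ACCESS_KEY"):
    """Return (key, found); key is None when the variable is unset or blank."""
    key = os.getenv(var_name)
    if key is None or not key.strip():
        return None, False
    return key.strip(), True

key, found = read_access_key()
if not found:
    # Per the note above, initialization without a key uses the trial license.
    print("EAZYML_ACCESS_KEY is not set; a trial license would be used.")
```

The returned `key` could then be passed to `ez.ez_init(key)` in place of the inline `os.getenv` call.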

2. Define Dataset Files, Outcome Variable and Train model¶

2.1 Define Train Dataset and Other Model Parameters¶

In [ ]:
gdown.download_folder(id='1EobxYR3pg_Z3Sd4sETfe4aJLAsT98fL2')
In [3]:
# classification
file_path = os.path.join('data', "Heart_Attack_traindata.csv")
outcome = "class"

# read the training data into a DataFrame (the outcome column is left as-is)
df = pd.read_csv(file_path)

# define options
options = {'model_type': 'predictive'}

2.2 Train Model¶

In [4]:
resp = ez.ez_build_model(df, outcome=outcome, options=options)
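Later cells read `resp['model_performance']` and `resp['model_info']`, so it can help to confirm those keys are present before moving on. The sketch below is a generic dictionary check, not an EazyML feature: `require_keys` is a hypothetical helper, and the stand-in dictionary only mimics the two keys this notebook uses.

```python
def require_keys(response, required):
    """Return the response unchanged, or raise if any required key is missing."""
    if not isinstance(response, dict):
        raise TypeError(f"expected a dict, got {type(response).__name__}")
    missing = [key for key in required if key not in response]
    if missing:
        raise KeyError(f"response is missing expected keys: {missing}")
    return response

# Stand-in for illustration only; in the notebook this would be `resp`.
fake_resp = {"model_performance": [], "model_info": {}}
require_keys(fake_resp, ["model_performance", "model_info"])
```

Failing early here gives a clearer error than a bare `KeyError` several cells later.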

2.3 Show model performance¶

In [5]:
ez.ez_display_df(resp['model_performance'])
   Model                                        Kappa  Accuracy
0  Bagged Decision Trees with Information Gain  0.98   0.99
1  Random Forest with Information Gain          0.98   0.99
2  Gradient Boosting Classifier                 0.98   0.99
3  Boosted Decision Trees with InformationGain  0.96   0.98
4  Logistic Regression                          0.59   0.80
5  Naive Bayes                                  0.44   0.70
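The two metrics above measure different things: accuracy is the fraction of correct predictions, while Cohen's kappa discounts the agreement expected by chance, kappa = (p_o - p_e) / (1 - p_e). The sketch below computes both from label lists using the standard definitions. It is an illustration on made-up labels, and how EazyML computes these columns internally is not stated in this notebook.

```python
from collections import Counter

def accuracy_and_kappa(y_true, y_pred):
    """Return (accuracy, Cohen's kappa) for two equal-length label sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    n = len(y_true)
    # Observed agreement: fraction of rows where the prediction equals the label.
    p_observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Chance agreement: product of the marginal label frequencies, summed over labels.
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    p_chance = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_chance == 1.0:
        return p_observed, float("nan")  # kappa is undefined when only one label occurs
    return p_observed, (p_observed - p_chance) / (1.0 - p_chance)

# Made-up labels: 10 patients, 8 predicted correctly.
y_true = ["positive"] * 6 + ["negative"] * 4
y_pred = ["positive"] * 5 + ["negative"] + ["negative"] * 3 + ["positive"]
acc, kappa = accuracy_and_kappa(y_true, y_pred)
print(round(acc, 2), round(kappa, 2))
```

On these toy labels the accuracy is 0.80 while kappa is about 0.58, which shows why a model can look acceptable on accuracy and still agree with the labels only moderately beyond chance.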

3. Dataset Information¶

The dataset used in this notebook is the Heart Attack Dataset. Each row describes one patient through demographic features and clinical measurements (age, gender, pulse rate, blood pressure, and blood markers), and the task is to predict whether the patient had a heart attack.

Columns in the Dataset:¶

  • age: The age of the patient, measured in years.
  • gender: The gender of the patient, represented as a categorical variable (e.g., 1 = male, 0 = female).
  • impluse: The patient's pulse rate, measured in beats per minute (bpm). The column name is spelled this way in the dataset.
  • pressurehight: Refers to systolic blood pressure, the higher number in a blood pressure reading (e.g., 120/80 mmHg).
  • pressurelow: Refers to diastolic blood pressure, the lower number in a blood pressure reading (e.g., 120/80 mmHg).
  • glucose: The patient's blood glucose (blood sugar) level.
  • kcm: Appears to refer to CK-MB (creatine kinase-MB), an enzyme released into the blood when heart muscle is damaged.
  • troponin: A protein found in the heart muscle, measured to assess heart damage (especially after a heart attack).
  • class: The target variable, indicating whether the patient had a heart attack. In this dataset it takes the string values positive and negative.
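The column descriptions above can be turned into a few basic sanity checks before training. The following is a minimal sketch: the column names follow the dataset file (including the spellings `impluse` and `pressurehight`), the helper `check_schema` is hypothetical, and the specific checks are assumptions about what a clean file looks like, not rules enforced by EazyML.

```python
import pandas as pd

EXPECTED_COLUMNS = ["age", "gender", "impluse", "pressurehight", "pressurelow",
                    "glucose", "kcm", "troponin", "class"]

def check_schema(df, expected=EXPECTED_COLUMNS):
    """Return a list of problems found; an empty list means all checks passed."""
    missing = [col for col in expected if col not in df.columns]
    if missing:
        return [f"missing columns: {missing}"]
    problems = []
    if df[expected].isna().any().any():
        problems.append("missing values present")
    # Systolic pressure is expected to be higher than diastolic pressure.
    bad_bp = int((df["pressurehight"] <= df["pressurelow"]).sum())
    if bad_bp:
        problems.append(f"{bad_bp} row(s) with pressurehight <= pressurelow")
    labels = set(df["class"].astype(str).str.lower())
    unexpected = sorted(labels - {"positive", "negative"})
    if unexpected:
        problems.append(f"unexpected class labels: {unexpected}")
    return problems

# Three rows copied from the training file, for illustration.
sample = pd.DataFrame({
    "age": [64, 21, 55], "gender": [1, 1, 1], "impluse": [66, 94, 64],
    "pressurehight": [160, 98, 160], "pressurelow": [83, 46, 77],
    "glucose": [160.0, 296.0, 270.0], "kcm": [1.80, 6.75, 1.99],
    "troponin": [0.012, 1.060, 0.003],
    "class": ["negative", "positive", "negative"],
})
print(check_schema(sample))
```

In the notebook, the same function could be called on `df` right after `pd.read_csv(file_path)`.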

3.1 Display the Dataset¶

Below is a preview of the dataset:

In [6]:
# Load the dataset from the provided file
train = pd.read_csv(file_path)

# Display the first few rows of the dataset
ez.ez_display_df(train.head())
  age gender impluse pressurehight pressurelow glucose kcm troponin class
0 64 1 66 160 83 160.000000 1.800000 0.012000 negative
1 21 1 94 98 46 296.000000 6.750000 1.060000 positive
2 55 1 64 160 77 270.000000 1.990000 0.003000 negative
3 64 1 70 120 55 270.000000 13.870000 0.122000 positive
4 55 1 64 112 65 300.000000 1.080000 0.003000 negative

4. Define Test Dataset and Predict on that Dataset¶

In [7]:
# The build response also carries the model information needed for prediction
model_info = resp["model_info"]

4.1 Define Test Dataset¶

In [8]:
test_file_path = os.path.join('data', "Heart_Attack_testdata.csv")
test_data = pd.read_csv(test_file_path)

4.2 Predict on Test Dataset¶

In [9]:
options = {}
pred_resp = ez.ez_predict(test_data, model_info=model_info, options=options)
pred_df = pred_resp['pred_df']
In [10]:
ez.ez_display_df(pred_df.head())
  age gender impluse pressurehight pressurelow glucose kcm troponin class Predicted class
0 64 1 66 160 83 160 1.800000 0.012000 NEGATIVE NEGATIVE
1 21 1 94 98 46 296 6.750000 1.060000 POSITIVE POSITIVE
2 55 1 64 160 77 270 1.990000 0.003000 NEGATIVE NEGATIVE
3 64 1 70 120 55 270 13.870000 0.122000 POSITIVE POSITIVE
4 55 1 64 112 65 300 1.080000 0.003000 NEGATIVE NEGATIVE
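Since the prediction output keeps the true `class` next to `Predicted class`, test accuracy can be computed directly from `pred_df`. The sketch below assumes those two column names, as displayed above, and normalizes case before comparing because the labels appear in upper case here but in lower case in the training file. The helper names are hypothetical, and the sample frame only repeats the five displayed rows.

```python
import pandas as pd

def _normalized(series):
    """Lower-case and strip labels so 'NEGATIVE' and 'negative' compare equal."""
    return series.astype(str).str.strip().str.lower()

def prediction_accuracy(pred_df, actual_col="class", predicted_col="Predicted class"):
    """Fraction of rows where the predicted label matches the actual label."""
    return float((_normalized(pred_df[actual_col]) ==
                  _normalized(pred_df[predicted_col])).mean())

def confusion_table(pred_df, actual_col="class", predicted_col="Predicted class"):
    """Cross-tabulate actual labels (rows) against predicted labels (columns)."""
    return pd.crosstab(_normalized(pred_df[actual_col]),
                       _normalized(pred_df[predicted_col]),
                       rownames=["actual"], colnames=["predicted"])

# The five displayed rows, reduced to the two label columns.
shown = pd.DataFrame({
    "class": ["NEGATIVE", "POSITIVE", "NEGATIVE", "POSITIVE", "NEGATIVE"],
    "Predicted class": ["NEGATIVE", "POSITIVE", "NEGATIVE", "POSITIVE", "NEGATIVE"],
})
print(prediction_accuracy(shown))
print(confusion_table(shown))
```

Five rows say little about overall performance; calling the same functions on the full `pred_df` would give the test-set figures.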