EazyML Explainable AI Template

Define Imports

In [ ]:
!pip install --upgrade eazyml-xai
!pip install --upgrade eazyml-automl
!pip install gdown python-dotenv
In [1]:
import os
from eazyml_xai import (
    ez_init,
    ez_explain
)

from eazyml import (
    ez_display_df,
    ez_build_model
)

import gdown
import pandas as pd

from dotenv import load_dotenv
load_dotenv()
Out[1]:
True

1. Initialize EazyML

The cell below passes the value of the EAZYML_ACCESS_KEY environment variable to ez_init for authentication. If the variable is not set, ez_init falls back to a trial license.

In [2]:
ez_init(access_key=os.getenv('EAZYML_ACCESS_KEY'))
Out[2]:
{'success': True,
 'message': 'Initialized successfully. You may revoke your consent to sharing usage stats anytime. You have exclusive paid access.'}
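Because a missing variable falls back to the trial license without raising an error, it can help to check for the key before calling ez_init. The following is a minimal sketch of such a check; the helper name access_key_status is hypothetical and not part of the EazyML API.

```python
import os


def access_key_status(env=os.environ, var_name="EAZYML_ACCESS_KEY"):
    """Report whether the access key variable holds a non-empty value.

    Hypothetical helper for illustration, not part of EazyML.
    """
    key = env.get(var_name, "").strip()
    return "key found" if key else "key missing: expect the trial license"


print(access_key_status())
```

Running this right after load_dotenv() shows whether the .env file was picked up.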

2. Define Dataset Files and Outcome Variable

In [ ]:
gdown.download_folder(id='1EobxYR3pg_Z3Sd4sETfe4aJLAsT98fL2')
In [3]:
# Names of the files that will be used by EazyML APIs
train_file_path = os.path.join('data', 'Heart_Attack_traindata.csv')
test_file_path  = os.path.join('data', 'Heart_Attack_testdata.csv')

# The column name for outcome of interest
outcome = 'class'
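The paths above assume that the download step placed both CSV files in a local data folder. A small sketch for confirming this before any EazyML call is made (missing_files is a hypothetical helper name):

```python
import os


def missing_files(paths):
    """Return the paths that do not point to an existing file."""
    return [p for p in paths if not os.path.isfile(p)]


expected = [
    os.path.join("data", "Heart_Attack_traindata.csv"),
    os.path.join("data", "Heart_Attack_testdata.csv"),
]
absent = missing_files(expected)
if absent:
    print("Missing files, rerun the download step:", absent)
else:
    print("Both dataset files are in place.")
```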

3. Dataset Information

This notebook uses the Heart Attack dataset. Each row describes one patient through demographic attributes (age, gender), vital signs (pulse rate, blood pressure), and blood test results, and the task is to predict whether the patient had a heart attack.

Columns in the Dataset:

  • age: The age of the patient, measured in years.
  • gender: The gender of the patient, represented as a categorical variable (e.g., 1 = male, 0 = female).
  • impluse: The patient's pulse rate, measured in beats per minute (bpm). The column name is misspelled in the data file and must be used exactly as it appears there.
  • pressurehight: Refers to systolic blood pressure, the higher number in a blood pressure reading (e.g., 120/80 mmHg).
  • pressurelow: Refers to diastolic blood pressure, the lower number in a blood pressure reading (e.g., 120/80 mmHg).
  • glucose: The patient's blood glucose (blood sugar) level.
  • kcm: A cardiac enzyme measurement, most likely CK-MB (creatine kinase-MB), which rises when heart muscle is damaged.
  • troponin: A protein found in the heart muscle, measured to assess heart damage (especially after a heart attack).
  • class: The target variable, indicating whether the patient had a heart attack. In the data file the values are the strings positive and negative.
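Two column names in the file are spelled unconventionally (impluse and pressurehight, as the dataset preview in this notebook shows), so code that refers to columns by name is easy to get wrong. The sketch below checks a header against the expected names; missing_columns is a hypothetical helper.

```python
# Column names exactly as they appear in the CSV header shown in this notebook.
EXPECTED_COLUMNS = [
    "age", "gender", "impluse", "pressurehight", "pressurelow",
    "glucose", "kcm", "troponin", "class",
]


def missing_columns(header, expected=EXPECTED_COLUMNS):
    """Return the expected column names that are absent from `header`."""
    present = set(header)
    return [name for name in expected if name not in present]


# A header that uses the corrected spelling "impulse" fails the check.
print(missing_columns(["age", "gender", "impulse", "pressurehight",
                       "pressurelow", "glucose", "kcm", "troponin", "class"]))
# prints ['impluse']
```

With pandas, the same check can be run as missing_columns(list(train.columns)) once the training file is loaded.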

3.1 Display the Dataset

Below is a preview of the dataset:

In [4]:
# Load the dataset from the provided file
train = pd.read_csv(train_file_path)

# Display the first few rows of the dataset
ez_display_df(train.head())
   age  gender  impluse  pressurehight  pressurelow     glucose        kcm  troponin     class
0   64       1       66            160           83  160.000000   1.800000  0.012000  negative
1   21       1       94             98           46  296.000000   6.750000  1.060000  positive
2   55       1       64            160           77  270.000000   1.990000  0.003000  negative
3   64       1       70            120           55  270.000000  13.870000  0.122000  positive
4   55       1       64            112           65  300.000000   1.080000  0.003000  negative
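A quick look at the outcome distribution is a common next step after previewing the data. The sketch below counts the labels of the five preview rows only, so the proportions say nothing about the full dataset; on the loaded DataFrame the equivalent call would be train[outcome].value_counts().

```python
from collections import Counter

# Outcome labels of the five preview rows shown above.
preview_labels = ["negative", "positive", "negative", "positive", "negative"]

counts = Counter(preview_labels)
total = sum(counts.values())
for label, count in counts.most_common():
    print(f"{label}: {count} of {total} ({count / total:.0%})")
```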

4. EazyML Predictive Models

4.1 Reading the Datasets and Dropping Unnecessary Columns

In [5]:
discard_columns = []

# Reading Training Data
train = pd.read_csv(train_file_path)
train = train.drop(columns=discard_columns)

# Reading Test Data
test = pd.read_csv(test_file_path)
test = test.drop(columns=discard_columns)
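In this template discard_columns is empty, so nothing is dropped. When the list is filled in, two mistakes are easy to make: misspelling a column name (DataFrame.drop raises a KeyError by default) and discarding the outcome column itself. A minimal sketch of a guard, using a hypothetical helper name:

```python
def check_discard_columns(columns, discard_columns, outcome):
    """Validate a list of columns to drop before calling DataFrame.drop.

    Raises ValueError if the outcome would be dropped or if a name is
    not present in `columns`. Hypothetical helper for illustration.
    """
    if outcome in discard_columns:
        raise ValueError(f"outcome column {outcome!r} must not be discarded")
    unknown = [c for c in discard_columns if c not in columns]
    if unknown:
        raise ValueError(f"columns not found in the data: {unknown}")
    return list(discard_columns)


columns = ["age", "gender", "impluse", "pressurehight", "pressurelow",
           "glucose", "kcm", "troponin", "class"]
print(check_discard_columns(columns, [], "class"))          # nothing to drop
print(check_discard_columns(columns, ["gender"], "class"))  # drop one column
```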

4.2 Model Training: Several Models Trained

In [6]:
# Build predictive models on the training data
options = {'model_type': 'predictive'}
resp = ez_build_model(train, outcome=outcome, options=options)

4.3 Show Model Performance

In [7]:
ez_display_df(resp['model_performance'])
   Model                                        Kappa  Accuracy
0  Bagged Decision Trees with Information Gain   0.98      0.99
1  Random Forest with Information Gain           0.98      0.99
2  Gradient Boosting Classifier                  0.98      0.99
3  Boosted Decision Trees with InformationGain   0.96      0.98
4  Logistic Regression                           0.59      0.80
5  Naive Bayes                                   0.44      0.70
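The Kappa column sits well below Accuracy for the weaker models. Assuming it reports Cohen's kappa, this is expected: kappa discounts the agreement that would occur by chance, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed accuracy and p_e the chance agreement implied by the label frequencies. The sketch below works through this on made-up labels that are not taken from the dataset.

```python
from collections import Counter


def accuracy_and_kappa(y_true, y_pred):
    """Return (accuracy, Cohen's kappa) for two equal-length label lists.

    Kappa is undefined when chance agreement equals 1 (a single label
    everywhere); that case is not handled here.
    """
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Agreement expected by chance, from the marginal label frequencies.
    expected = sum(true_counts[label] * pred_counts[label]
                   for label in true_counts) / (n * n)
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Made-up labels for illustration only, not taken from the dataset.
y_true = ["positive"] * 6 + ["negative"] * 4
y_pred = (["positive"] * 5 + ["negative"] * 1      # 5 of 6 positives correct
          + ["negative"] * 3 + ["positive"] * 1)   # 3 of 4 negatives correct

acc, kappa = accuracy_and_kappa(y_true, y_pred)
print(f"accuracy={acc:.2f}, kappa={kappa:.2f}")
# prints accuracy=0.80, kappa=0.58
```

Here 80% of the predictions are correct, but 52% agreement would already be expected by chance, which leaves a kappa of about 0.58.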

5. Get Explanations

5.1 Use model_info from ez_build_model

In [8]:
# The response from ez_build_model carries the model information under 'model_info'
model_info = resp["model_info"]
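Indexing the response directly raises a bare KeyError if the build did not produce model information. The sketch below shows one way to fail with a more useful message. It assumes the build response is a dict that may carry 'success' and 'message' fields like the ez_init output; that assumption is not confirmed by this notebook, and require_key is a hypothetical helper.

```python
def require_key(response, key):
    """Return response[key], or raise with the response's message if absent.

    Assumes `response` is a dict that may carry 'success' and 'message'
    fields, as the ez_init output above does.
    """
    if response.get("success") is False or key not in response:
        message = response.get("message", "no message provided")
        raise KeyError(f"{key!r} not available: {message}")
    return response[key]


# Stand-in dictionary for illustration; a real response comes from ez_build_model.
ok = {"success": True, "model_info": "<opaque model info>"}
print(require_key(ok, "model_info"))
```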

5.2 Get Explanations for 5 Records

In [9]:
options = {'record_number': [1, 6, 7, 8, 9]}
response = ez_explain(train_file_path, outcome, test_file_path, model_info, options=options)

5.3 Display Explanation DataFrame

In [10]:
# Each explanation is a dict with the same keys, so pandas can build the frame directly
ex_df = pd.DataFrame(response['explanations'])
ez_display_df(ex_df)
  | record_numbers | prediction | explanation | explainability_score | local_importance
0 | 1 | negative | troponin is 0.01 (that is in range -0.32, 0.01) , kcm is 1.8 (that is less than or equal to 4.95) | 97% | {'troponin': 0.65, 'kcm': 0.18, 'pressurehight': 0.09, 'gender': 0.08}
1 | 6 | negative | troponin is 0.0 (that is in range -0.31, 0.01) , kcm is 1.83 (that is less than or equal to 4.95) , gender is 1 | 96% | {'troponin': 0.67, 'gender': 0.18, 'kcm': 0.16}
2 | 7 | negative | pressurehight is 179.0 (that is more than 157.5) , troponin is 0.0 (that is more than -0.3), gender is not 1 | 31% | {'pressurehight': 0.38, 'troponin': 0.36, 'kcm': 0.16, 'gender': 0.1}
3 | 8 | positive | kcm is 300.0 (that is more than 256.83) , pressurehight is 214.0 (that is more than 183.01) , troponin is 2.37 (that is less than or equal to 2.37) | 25% | {'kcm': 0.7, 'troponin': 0.26}
4 | 9 | negative | troponin is 0.0 (that is in range -0.3, 0.01) , kcm is 2.35 (that is in range -2.14, 6.29) | 97% | {'troponin': 0.64, 'kcm': 0.19, 'gender': 0.13}
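The explanation records lend themselves to simple post-processing, for example picking the most influential feature per record and flagging records with a low explainability_score. The sketch below does this on three of the rows displayed above, copied by hand. The 50% threshold is an arbitrary choice for illustration, and the helper names are hypothetical.

```python
import ast


def score_to_float(score):
    """Convert a percentage string such as '97%' to the number 97.0."""
    return float(str(score).rstrip("%"))


def top_feature(local_importance):
    """Return the feature with the largest local importance.

    Accepts a dict or its string form (assumption: the field may arrive
    as either).
    """
    if isinstance(local_importance, str):
        local_importance = ast.literal_eval(local_importance)
    return max(local_importance, key=local_importance.get)


# Three of the rows displayed above, copied by hand.
explanations = [
    {"record_numbers": 1, "explainability_score": "97%",
     "local_importance": {"troponin": 0.65, "kcm": 0.18,
                          "pressurehight": 0.09, "gender": 0.08}},
    {"record_numbers": 7, "explainability_score": "31%",
     "local_importance": {"pressurehight": 0.38, "troponin": 0.36,
                          "kcm": 0.16, "gender": 0.1}},
    {"record_numbers": 8, "explainability_score": "25%",
     "local_importance": {"kcm": 0.7, "troponin": 0.26}},
]

THRESHOLD = 50.0  # arbitrary cutoff chosen for this sketch
for row in explanations:
    score = score_to_float(row["explainability_score"])
    flag = "review" if score < THRESHOLD else "ok"
    print(row["record_numbers"], top_feature(row["local_importance"]), score, flag)
```

In the notebook itself, the same loop can iterate over response['explanations'] instead of the hand-copied list.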