EazyML Explainable AI Template¶
Define Imports¶
In [ ]:
!pip install --upgrade eazyml-xai
!pip install --upgrade eazyml-automl
!pip install gdown python-dotenv
In [1]:
import os
from eazyml_xai import (
ez_init,
ez_explain
)
from eazyml import (
ez_display_df,
ez_build_model
)
import gdown
import pandas as pd
from dotenv import load_dotenv
load_dotenv()
Out[1]:
True
1. Initialize EazyML¶
The ez_init function uses the EAZYML_ACCESS_KEY environment variable for authentication. If the variable is not set, it defaults to a trial license.
In [2]:
ez_init(access_key=os.getenv('EAZYML_ACCESS_KEY'))
Out[2]:
{'success': True,
'message': 'Initialized successfully. You may revoke your consent to sharing usage stats anytime. You have exclusive paid access.'}
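Because `ez_init` falls back to a trial license when the key is missing, it can help to make that fallback visible before the call. The sketch below relies only on the behavior described above; `resolve_access_key` is a hypothetical helper, not part of the EazyML API.

```python
import os

def resolve_access_key(env=None, var_name="EAZYML_ACCESS_KEY"):
    """Return the access key from the environment, or None if unset or blank.

    Hypothetical helper for illustration; not part of the EazyML API.
    """
    env = os.environ if env is None else env
    key = (env.get(var_name) or "").strip()
    if not key:
        # Per the note above, ez_init is expected to use a trial license here.
        print(f"{var_name} is not set; expecting a trial license.")
        return None
    return key

access_key = resolve_access_key()
# The result can then be passed on unchanged: ez_init(access_key=access_key)
```

Returning `None` for a blank value mirrors what `os.getenv` returns for an unset variable, so the `ez_init` call itself does not need to change.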
2. Define Dataset Files and Outcome Variable¶
In [ ]:
gdown.download_folder(id='1EobxYR3pg_Z3Sd4sETfe4aJLAsT98fL2')
In [3]:
# Names of the files that will be used by EazyML APIs
train_file_path = os.path.join('data', 'Heart_Attack_traindata.csv')
test_file_path = os.path.join('data', 'Heart_Attack_testdata.csv')
# The column name for outcome of interest
outcome = 'class'
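If the download step was skipped or failed, the later cells fail with a less obvious file error. A minimal sketch of an early check, assuming the two CSV files are expected under `data` as in the paths above (`missing_files` is a hypothetical helper name):

```python
import os

def missing_files(paths):
    """Return the subset of paths that do not exist as regular files."""
    return [p for p in paths if not os.path.isfile(p)]

# Same locations as train_file_path and test_file_path above.
expected = [
    os.path.join("data", "Heart_Attack_traindata.csv"),
    os.path.join("data", "Heart_Attack_testdata.csv"),
]

missing = missing_files(expected)
if missing:
    print("Missing dataset files:", missing)
```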
3. Dataset Information¶
This notebook uses the Heart Attack dataset. Each row describes one patient through several features (age, gender, pulse rate, blood pressure, and blood test results), and the task is to predict whether the patient had a heart attack.
Columns in the Dataset:¶
- age: The age of the patient, measured in years.
- gender: The gender of the patient, represented as a categorical variable (e.g., 1 = male, 0 = female).
- impluse: The patient's pulse rate, measured in beats per minute (bpm). The column name is spelled `impluse` in the dataset.
- pressurehight: Refers to systolic blood pressure, the higher number in a blood pressure reading (e.g., 120/80 mmHg).
- pressurelow: Refers to diastolic blood pressure, the lower number in a blood pressure reading (e.g., 120/80 mmHg).
- glucose: The patient's blood glucose (blood sugar) level.
- kcm: Most likely CK-MB (creatine kinase-MB), a cardiac enzyme whose blood level rises after heart-muscle damage.
- troponin: A protein found in the heart muscle, measured to assess heart damage (especially after a heart attack).
- class: The target variable, indicating whether the patient had a heart attack. In this dataset the values are the strings positive (heart attack) and negative (no heart attack).
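The column list above can be turned into a quick schema check. The sketch below is an illustration rather than part of EazyML: `check_columns` is a hypothetical helper, and the expected names follow the spelling in the CSV header shown in the preview in section 3.1 (including `impluse` and `pressurehight`).

```python
EXPECTED_COLUMNS = [
    "age", "gender", "impluse", "pressurehight", "pressurelow",
    "glucose", "kcm", "troponin", "class",
]

def check_columns(columns, expected=EXPECTED_COLUMNS):
    """Compare actual column names with the expected ones.

    Returns (missing, unexpected) as two sorted lists.
    """
    actual = set(columns)
    wanted = set(expected)
    return sorted(wanted - actual), sorted(actual - wanted)

# Example: a header that uses the corrected spelling "impulse".
missing, unexpected = check_columns(
    ["age", "gender", "impulse", "pressurehight", "pressurelow",
     "glucose", "kcm", "troponin", "class"]
)
print(missing, unexpected)  # ['impluse'] ['impulse']
```

Once the training data is loaded, the same function can be called as `check_columns(train.columns)`.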
3.1 Display the Dataset¶
Below is a preview of the dataset:
In [4]:
# Load the dataset from the provided file
train = pd.read_csv(train_file_path)
# Display the first few rows of the dataset
ez_display_df(train.head())
| | age | gender | impluse | pressurehight | pressurelow | glucose | kcm | troponin | class |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 64 | 1 | 66 | 160 | 83 | 160.000000 | 1.800000 | 0.012000 | negative |
| 1 | 21 | 1 | 94 | 98 | 46 | 296.000000 | 6.750000 | 1.060000 | positive |
| 2 | 55 | 1 | 64 | 160 | 77 | 270.000000 | 1.990000 | 0.003000 | negative |
| 3 | 64 | 1 | 70 | 120 | 55 | 270.000000 | 13.870000 | 0.122000 | positive |
| 4 | 55 | 1 | 64 | 112 | 65 | 300.000000 | 1.080000 | 0.003000 | negative |
4. EazyML Predictive Models¶
4.1 Reading the Datasets and Dropping Unnecessary Columns¶
In [5]:
discard_columns = []
# Reading Training Data
train = pd.read_csv(train_file_path)
train = train.drop(columns=discard_columns)
# Reading Test Data
test = pd.read_csv(test_file_path)
test = test.drop(columns=discard_columns)
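`discard_columns` is empty here, but once it is filled in, two mistakes are easy to make: listing a column that does not exist (pandas' `DataFrame.drop` raises a `KeyError` for unknown labels by default) or discarding the outcome column itself. A minimal sketch of a guard, where `validate_discard_columns` is a hypothetical helper:

```python
def validate_discard_columns(columns, discard, outcome):
    """Return the discard list if it is safe to drop, otherwise raise ValueError."""
    columns = list(columns)
    if outcome in discard:
        raise ValueError(f"outcome column {outcome!r} must not be discarded")
    unknown = [c for c in discard if c not in columns]
    if unknown:
        raise ValueError(f"columns not in dataset: {unknown}")
    return list(discard)

# Example with the column names from this dataset.
columns = ["age", "gender", "impluse", "pressurehight", "pressurelow",
           "glucose", "kcm", "troponin", "class"]
print(validate_discard_columns(columns, ["gender"], "class"))  # ['gender']
```

In the notebook this could be used as `train.drop(columns=validate_discard_columns(train.columns, discard_columns, outcome))`.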
4.2 Model Training: Several Models Trained¶
In [6]:
## Build Model
options = {'model_type': 'predictive'}
resp = ez_build_model(train, outcome=outcome, options=options)
4.3 Show Model Performance¶
In [7]:
ez_display_df(resp['model_performance'])
| | Model | Kappa | Accuracy |
|---|---|---|---|
| 0 | Bagged Decision Trees with Information Gain | 0.98 | 0.99 |
| 1 | Random Forest with Information Gain | 0.98 | 0.99 |
| 2 | Gradient Boosting Classifier | 0.98 | 0.99 |
| 3 | Boosted Decision Trees with InformationGain | 0.96 | 0.98 |
| 4 | Logistic Regression | 0.59 | 0.80 |
| 5 | Naive Bayes | 0.44 | 0.70 |
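The table reports both Accuracy and Kappa. The notebook does not show how EazyML computes them, so the following is only a sketch using the standard definition of Cohen's kappa, which corrects accuracy for the agreement expected by chance. The labels are toy values, not taken from the dataset, and `accuracy_and_kappa` is a hypothetical helper.

```python
from collections import Counter

def accuracy_and_kappa(y_true, y_pred):
    """Return (accuracy, Cohen's kappa) for two equal-length label sequences."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    # Observed agreement: the fraction of matching labels (plain accuracy).
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Chance agreement: sum over classes of P(true = c) * P(pred = c).
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1.0:
        return observed, float("nan")  # kappa is undefined in this case
    return observed, (observed - expected) / (1 - expected)

# Toy example: a model that always predicts "negative".
y_true = ["negative"] * 7 + ["positive"] * 3
y_pred = ["negative"] * 10
acc, kappa = accuracy_and_kappa(y_true, y_pred)
print(acc, kappa)  # 0.7 0.0
```

The toy model reaches 70% accuracy with a kappa of 0, which is why a model's Kappa can sit well below its Accuracy when one class dominates.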
5. Get Explanations¶
5.1 Use model_info from ez_build_model¶
In [8]:
# The ez_build_model response includes model_info, which ez_explain needs
model_info = resp["model_info"]
5.2 Get Explanations for 5 Points¶
In [9]:
options = {'record_number': [1, 6, 7, 8, 9]}
response = ez_explain(train_file_path, outcome, test_file_path, model_info, options=options)
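The next step indexes `response['explanations']` directly, so a failed call would surface as a `KeyError`. A minimal sketch of a clearer check, assuming (this is not confirmed by the notebook) that a failed `ez_explain` call returns a dict with a `message` key, as the `ez_init` response does. `get_explanations` is a hypothetical helper.

```python
def get_explanations(response):
    """Return the list of explanation records, or raise with the response message."""
    if not isinstance(response, dict):
        raise TypeError(f"expected a dict response, got {type(response).__name__}")
    explanations = response.get("explanations")
    if not explanations:
        # Assumption: error details, if any, are stored under 'message'.
        message = response.get("message", "no message provided")
        raise RuntimeError(f"ez_explain returned no explanations: {message}")
    return explanations

# Example with a stand-in response shaped like the one used below.
demo_response = {"explanations": [{"record_numbers": 1, "prediction": "negative"}]}
print(len(get_explanations(demo_response)))  # 1
```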
5.3 Display Explanation DataFrame¶
In [10]:
# Each explanation is a dict with the same keys, so pandas can build the frame directly
ex_df = pd.DataFrame(response['explanations'])
ez_display_df(ex_df)
| | record_numbers | prediction | explanation | explainability_score | local_importance |
|---|---|---|---|---|---|
| 0 | 1 | negative | troponin is 0.01 (that is in range -0.32, 0.01) , kcm is 1.8 (that is less than or equal to 4.95) | 97% | {'troponin': 0.65, 'kcm': 0.18, 'pressurehight': 0.09, 'gender': 0.08} |
| 1 | 6 | negative | troponin is 0.0 (that is in range -0.31, 0.01) , kcm is 1.83 (that is less than or equal to 4.95) , gender is 1 | 96% | {'troponin': 0.67, 'gender': 0.18, 'kcm': 0.16} |
| 2 | 7 | negative | pressurehight is 179.0 (that is more than 157.5) , troponin is 0.0 (that is more than -0.3), gender is not 1 | 31% | {'pressurehight': 0.38, 'troponin': 0.36, 'kcm': 0.16, 'gender': 0.1} |
| 3 | 8 | positive | kcm is 300.0 (that is more than 256.83) , pressurehight is 214.0 (that is more than 183.01) , troponin is 2.37 (that is less than or equal to 2.37) | 25% | {'kcm': 0.7, 'troponin': 0.26} |
| 4 | 9 | negative | troponin is 0.0 (that is in range -0.3, 0.01) , kcm is 2.35 (that is in range -2.14, 6.29) | 97% | {'troponin': 0.64, 'kcm': 0.19, 'gender': 0.13} |
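Records 7 and 8 have much lower explainability scores (31% and 25%) than the other three. The sketch below shows one way to flag such records and report their most important feature. It assumes that `explainability_score` is a string such as `'97%'` and that `local_importance` is a dict, as the display suggests; the 50% threshold is arbitrary, and `parse_score` and `flag_low_scores` are hypothetical helpers.

```python
def parse_score(score):
    """Convert a score such as '97%' (or a plain number) to a float."""
    if isinstance(score, str):
        return float(score.strip().rstrip("%"))
    return float(score)

def flag_low_scores(explanations, threshold=50.0):
    """Return (record number, score, top feature) for scores below threshold."""
    flagged = []
    for record in explanations:
        score = parse_score(record["explainability_score"])
        if score < threshold:
            importance = record["local_importance"]
            top_feature = max(importance, key=importance.get)
            flagged.append((record["record_numbers"], score, top_feature))
    return flagged

# Values copied from the table above (explanation text omitted).
sample = [
    {"record_numbers": 1, "explainability_score": "97%",
     "local_importance": {"troponin": 0.65, "kcm": 0.18, "pressurehight": 0.09, "gender": 0.08}},
    {"record_numbers": 6, "explainability_score": "96%",
     "local_importance": {"troponin": 0.67, "gender": 0.18, "kcm": 0.16}},
    {"record_numbers": 7, "explainability_score": "31%",
     "local_importance": {"pressurehight": 0.38, "troponin": 0.36, "kcm": 0.16, "gender": 0.1}},
    {"record_numbers": 8, "explainability_score": "25%",
     "local_importance": {"kcm": 0.7, "troponin": 0.26}},
    {"record_numbers": 9, "explainability_score": "97%",
     "local_importance": {"troponin": 0.64, "kcm": 0.19, "gender": 0.13}},
]
print(flag_low_scores(sample))
# [(7, 31.0, 'pressurehight'), (8, 25.0, 'kcm')]
```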