EazyML Explainable AI Template

Define Imports

In [ ]:
!pip install --upgrade eazyml-xai
!pip install --upgrade eazyml-automl
!pip install gdown python-dotenv
In [1]:
import os
from eazyml_xai import (
    ez_init,
    ez_explain
)

from eazyml import (
    ez_display_df,
    ez_build_model
)

import gdown
import pandas as pd

# Load variables such as EAZYML_ACCESS_KEY from a .env file, if one exists
from dotenv import load_dotenv
load_dotenv()
Out[1]:
True

1. Initialize EazyML

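The ez_init function authenticates with the key passed as access_key. Here that key is read from the EAZYML_ACCESS_KEY environment variable, which load_dotenv() in the previous cell can populate from a .env file. If the variable is not set, os.getenv returns None and ez_init defaults to a trial license.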
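load_dotenv() from python-dotenv reads KEY=VALUE pairs from a .env file into the process environment and, by default, does not overwrite variables that are already set. The sketch below is a simplified stand-in written only to illustrate that flow: it is not python-dotenv's implementation, looks at a single path only, and ignores export prefixes, multiline values, and variable expansion. parse_env_lines, apply_env, load_env_file, and resolve_access_key are hypothetical helpers; resolve_access_key simply mirrors the os.getenv call passed to ez_init.

```python
import os
from typing import Dict, Iterable, Mapping, Optional


def parse_env_lines(lines: Iterable[str]) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    pairs: Dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip whitespace and any surrounding single or double quotes
        pairs[key.strip()] = value.strip().strip("'\"")
    return pairs


def apply_env(pairs: Mapping[str, str], override: bool = False) -> bool:
    """Export pairs to os.environ; existing variables win unless override=True."""
    for key, value in pairs.items():
        if override or key not in os.environ:
            os.environ[key] = value
    return bool(pairs)


def load_env_file(path: str = ".env", override: bool = False) -> bool:
    """Load one .env file; return False when it is missing or has no pairs."""
    try:
        with open(path, encoding="utf-8") as fh:
            return apply_env(parse_env_lines(fh), override)
    except FileNotFoundError:
        return False


def resolve_access_key(var: str = "EAZYML_ACCESS_KEY") -> Optional[str]:
    """Return the access key, or None if unset (the trial-license case)."""
    return os.getenv(var)
```

Keeping parsing separate from exporting makes the behaviour easy to check without touching the file system.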

In [2]:
ez_init(access_key=os.getenv('EAZYML_ACCESS_KEY'))
Out[2]:
{'success': True,
 'message': 'Initialized successfully. You may revoke your consent to sharing usage stats anytime. You have exclusive paid access.'}
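As the output above shows, ez_init returns a plain dict with success and message keys. A small guard like the hypothetical ensure_initialized below (not part of EazyML) makes a failed initialization visible immediately rather than several cells later. EazyMLInitError is likewise an illustrative name.

```python
from typing import Any, Mapping


class EazyMLInitError(RuntimeError):
    """Hypothetical error type raised when initialization does not succeed."""


def ensure_initialized(resp: Mapping[str, Any]) -> str:
    """Return the message from an ez_init-style response, raising if 'success' is falsy."""
    if not resp.get("success", False):
        raise EazyMLInitError(resp.get("message", "initialization reported failure"))
    return resp.get("message", "")
```

In the notebook it would wrap the call directly, e.g. ensure_initialized(ez_init(access_key=os.getenv('EAZYML_ACCESS_KEY'))).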

2. Define Dataset Files and Outcome Variable

In [ ]:
# Download the sample Iris files from Google Drive; the cells below expect them in ./data
gdown.download_folder(id='1DJtU6gI929GdEEZ3F_7w5LMnT90VvYI7')
In [3]:
# Paths to the training and test files used by the EazyML APIs
train_file_path = os.path.join('data', 'IRIS_Train.csv')
test_file_path  = os.path.join('data', 'IRIS_Test.csv')

# Name of the outcome (target) column to predict
outcome = 'species'
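The file paths and the outcome name must both match the downloaded data. A quick pre-flight check with the standard csv module, assuming comma-separated files with a header row, could look like this; check_dataset and feature_columns are hypothetical helpers, not EazyML functions.

```python
import csv
import os
from typing import List, Sequence


def feature_columns(header: Sequence[str], outcome: str) -> List[str]:
    """Return every column except the outcome, raising if the outcome is absent."""
    if outcome not in header:
        raise ValueError(f"outcome column {outcome!r} not found in {list(header)}")
    return [col for col in header if col != outcome]


def check_dataset(path: str, outcome: str) -> List[str]:
    """Verify that a CSV file exists and has the outcome column; return its feature columns."""
    if not os.path.isfile(path):
        raise FileNotFoundError(f"{path} not found; did the download step succeed?")
    with open(path, newline="", encoding="utf-8") as fh:
        header = next(csv.reader(fh), [])  # the first row holds the column names
    return feature_columns(header, outcome)
```

In the notebook this would be called as check_dataset(train_file_path, outcome) and check_dataset(test_file_path, outcome) right after the download.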

3. Dataset Information

The dataset used in this notebook is the Iris Dataset, a well-known dataset in machine learning and statistics. It contains measurements of 150 iris flowers: four features (sepal length, sepal width, petal length, and petal width, all in cm) and the species of each flower (setosa, versicolor, or virginica). Here it is provided as separate training and test files.

You can find more details and download the dataset from Kaggle using the following link:

Kaggle Iris Dataset

Columns in the Dataset:

  • sepal_length: Sepal length of the flower (cm)
  • sepal_width: Sepal width of the flower (cm)
  • petal_length: Petal length of the flower (cm)
  • petal_width: Petal width of the flower (cm)
  • species: Species of the iris flower (Iris-setosa, Iris-versicolor, Iris-virginica)
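The column list above maps naturally onto a typed record. The dataclass below is an illustrative sketch (IrisRecord is a hypothetical name, not an EazyML type) that converts one CSV row, where every value arrives as a string, into floats for the four measurements.

```python
from dataclasses import dataclass
from typing import Mapping, Tuple


@dataclass(frozen=True)
class IrisRecord:
    """One row of the Iris data; measurements are in cm."""
    sepal_length: float
    sepal_width: float
    petal_length: float
    petal_width: float
    species: str

    @classmethod
    def from_row(cls, row: Mapping[str, str]) -> "IrisRecord":
        # csv.DictReader yields strings, so convert the four measurements to float
        return cls(
            sepal_length=float(row["sepal_length"]),
            sepal_width=float(row["sepal_width"]),
            petal_length=float(row["petal_length"]),
            petal_width=float(row["petal_width"]),
            species=row["species"],
        )

    def features(self) -> Tuple[float, float, float, float]:
        """Return the four numeric features in column order."""
        return (self.sepal_length, self.sepal_width, self.petal_length, self.petal_width)
```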

3.1 Display the Dataset

Below is a preview of the training data:

In [4]:
# Load the dataset from the provided file
train = pd.read_csv(train_file_path)

# Display the first few rows of the dataset
train.head()
Out[4]:
   sepal_length  sepal_width  petal_length  petal_width      species
0           5.1          3.5           1.4          0.2  Iris-setosa
1           4.9          3.0           1.4          0.2  Iris-setosa
2           4.7          3.2           1.3          0.2  Iris-setosa
3           4.6          3.1           1.5          0.2  Iris-setosa
4           5.0          3.6           1.4          0.2  Iris-setosa
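train.head() shows the first five rows of the DataFrame. Where pandas is not available, a rough standard-library equivalent reads the first n rows with csv.DictReader; unlike pandas, every value stays a string because no type inference is done. head is a hypothetical helper, and the inline sample just copies the first two rows shown above.

```python
import csv
import io
from itertools import islice
from typing import Dict, List, TextIO


def head(fh: TextIO, n: int = 5) -> List[Dict[str, str]]:
    """Return the first n data rows as dicts, similar in spirit to DataFrame.head()."""
    return list(islice(csv.DictReader(fh), n))


sample = io.StringIO(
    "sepal_length,sepal_width,petal_length,petal_width,species\n"
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "4.9,3.0,1.4,0.2,Iris-setosa\n"
)
rows = head(sample, n=1)
```

For a real file, open it with newline="" and pass the handle, e.g. with open(train_file_path, newline="") as fh: head(fh).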

4. EazyML Predictive Models

4.1 Reading the Datasets and Dropping Unnecessary Columns

In [5]:
discard_columns = []  # columns to exclude from modeling (none are needed for the Iris data)

# Reading Training Data
train = pd.read_csv(train_file_path)
train = train.drop(columns=discard_columns)

# Reading Test Data
test = pd.read_csv(test_file_path)
test = test.drop(columns=discard_columns)
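The discard_columns list is empty for the Iris data, but in other datasets it would typically hold identifiers or other non-predictive columns (the "id" column below is hypothetical; these files have none). The hypothetical drop_columns helper shows the same step on csv.DictReader rows, with one extra guard: it refuses to drop the outcome column, which the models need for training. Note that pandas' DataFrame.drop raises a KeyError for unknown column names by default (errors='raise'), whereas this sketch silently ignores them.

```python
from typing import Dict, Iterable, List, Sequence


def drop_columns(
    rows: Iterable[Dict[str, str]], discard: Sequence[str], outcome: str
) -> List[Dict[str, str]]:
    """Remove the discard columns from each row, refusing to drop the outcome."""
    if outcome in discard:
        raise ValueError(f"refusing to drop the outcome column {outcome!r}")
    drop = set(discard)
    # Build new dicts so the input rows are left unchanged
    return [{k: v for k, v in row.items() if k not in drop} for row in rows]
```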

4.2 Model Training: Several Candidate Models

In [6]:
# Build predictive models; the response includes performance metrics and model_info
options = {'model_type': 'predictive'}
resp = ez_build_model(train, outcome=outcome, options=options)

4.3 Show Model Performance

In [7]:
ez_display_df(resp['model_performance'])
   Model                                        Kappa  Accuracy
0  Bagged Decision Trees with Information Gain   1.00      1.00
1  Boosted Decision Trees with InformationGain   1.00      1.00
2  Random Forest with Information Gain           1.00      1.00
3  Naive Bayes                                   0.93      0.95
4  Logistic Regression                           0.92      0.95
5  Gradient Boosting Classifier                  0.91      0.94
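The table reports Kappa alongside Accuracy. Assuming Kappa here is Cohen's kappa (the notebook does not say), it measures agreement beyond chance: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement (equal to accuracy) and p_e is the agreement expected by chance from each side's label frequencies. A standard-library sketch of both metrics follows; accuracy_and_kappa is a hypothetical helper and says nothing about how EazyML computes its numbers.

```python
from collections import Counter
from typing import Hashable, Sequence, Tuple


def accuracy_and_kappa(
    y_true: Sequence[Hashable], y_pred: Sequence[Hashable]
) -> Tuple[float, float]:
    """Return (accuracy, Cohen's kappa) for two equal-length label sequences."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    n = len(y_true)
    # Observed agreement p_o: fraction of positions where the labels match
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Chance agreement p_e: sum over classes of P(true = c) * P(pred = c)
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        # Both sides use one identical label, so kappa is undefined
        return p_o, float("nan")
    return p_o, (p_o - p_e) / (1 - p_e)
```

Perfect predictions give (1.0, 1.0), matching the top three rows; kappa falls below accuracy whenever part of the agreement could be explained by chance.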

5. Get Explanations

5.1 Use model_info from ez_build_model

In [8]:
# The ez_build_model response includes model_info, which ez_explain needs
model_info = resp["model_info"]

5.2 Get Explanations for Test Records 1 and 2

In [9]:
# Explain the model's predictions for records 1 and 2 of the test file
options = {'record_number': [1, 2]}
response = ez_explain(train, outcome, test_file_path, model_info, options=options)
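To see which raw test inputs the explanations refer to, it can help to pull the same records out of the test file. The numbering convention is an assumption here: the sketch treats 1 as the first data row after the header, which the notebook does not confirm, so check it against the returned explanations. select_records is a hypothetical helper, and the inline sample is illustrative rather than the actual IRIS_Test.csv.

```python
import csv
import io
from typing import Dict, List, Sequence, TextIO


def select_records(
    fh: TextIO, record_numbers: Sequence[int], one_based: bool = True
) -> List[Dict[str, str]]:
    """Return the data rows matching record_numbers, in the requested order.

    ASSUMPTION: one_based=True treats 1 as the first data row (header excluded).
    """
    rows = list(csv.DictReader(fh))
    offset = 1 if one_based else 0
    selected = []
    for num in record_numbers:
        idx = num - offset
        if not 0 <= idx < len(rows):
            raise IndexError(f"record {num} is out of range for {len(rows)} rows")
        selected.append(rows[idx])
    return selected


sample = io.StringIO(
    "sepal_length,species\n5.1,Iris-setosa\n7.0,Iris-versicolor\n6.3,Iris-virginica\n"
)
picked = select_records(sample, [1, 2])
```

Against the real file this would be select_records(open(test_file_path, newline=""), options['record_number']).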

5.3 Display Explanation DataFrame

In [ ]:
# Each explanation is a dict; building the frame from the list of dicts aligns values by key
ex_df = pd.DataFrame(response['explanations'])
ez_display_df(ex_df)
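When a table is built from a list of dicts, values should be matched to columns by key rather than by position, so records whose keys differ in order or presence still line up. The standard-library sketch below shows that alignment; records_to_table is a hypothetical helper, and its first-seen column order roughly mirrors what pandas does for a list of plain dicts.

```python
from typing import Any, Dict, List, Sequence, Tuple


def records_to_table(
    records: Sequence[Dict[str, Any]]
) -> Tuple[List[str], List[List[Any]]]:
    """Align a list of dicts into (columns, rows) by key, filling missing keys with None."""
    # Union of keys in first-seen order; dict.fromkeys de-duplicates while keeping order
    columns = list(dict.fromkeys(key for rec in records for key in rec))
    # Look each value up by key, never by position, so differing key orders stay aligned
    rows = [[rec.get(col) for col in columns] for rec in records]
    return columns, rows
```

For example, records_to_table([{'a': 1, 'b': 2}, {'b': 3, 'a': 4}]) gives columns ['a', 'b'] and rows [[1, 2], [4, 3]]; pairing each record's values() with the first record's keys would instead have put 3 under 'a'.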