EazyML Insights Template

Define Imports

In [ ]:
!pip install --upgrade eazyml-insight
!pip install --upgrade eazyml-automl
!pip install gdown python-dotenv
In [1]:
import os
from eazyml_insight import (
    ez_insight,
    ez_init,
    ez_validate
)

from eazyml import ez_display_df
import gdown
import pandas as pd

from dotenv import load_dotenv
load_dotenv()
Out[1]:
True

1. Initialize EazyML

The ez_init function authenticates with an EazyML access key. Here the key is read from the EAZYML_ACCESS_KEY environment variable, which load_dotenv() can populate from a .env file. If the variable is not set, ez_init defaults to a trial license.

In [2]:
ez_init(access_key=os.getenv('EAZYML_ACCESS_KEY'))
Out[2]:
{'success': True,
 'message': 'Initialized successfully. You may revoke your consent to sharing usage stats anytime. You have exclusive paid access.'}
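Before calling ez_init, it can help to confirm whether an access key was actually found. The sketch below is a small hypothetical helper (`get_access_key` is not part of EazyML). It reads EAZYML_ACCESS_KEY from a mapping (the process environment by default) and returns None when the key is missing or empty, in which case ez_init is expected to fall back to a trial license as described above. With python-dotenv, the key would typically sit in a `.env` file as a line such as `EAZYML_ACCESS_KEY=<your key>`.

```python
import os
from typing import Mapping, Optional


def get_access_key(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the EazyML access key, or None if it is unset or empty.

    Hypothetical helper, not part of the EazyML API.
    """
    env = os.environ if env is None else env
    key = env.get("EAZYML_ACCESS_KEY", "").strip()
    if not key:
        print("EAZYML_ACCESS_KEY is not set; ez_init will use a trial license.")
        return None
    return key
```

Usage would be `ez_init(access_key=get_access_key())`, which passes None through unchanged when no key is configured.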

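2. Define Dataset Files and Outcome Variable

In [ ]:
# Download the sample data folder from Google Drive (expected to provide the data/ directory used below)
gdown.download_folder(id='1-RO9K9-YYGK7Wp__ioth0xPD8XqtgvKT')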
In [3]:
# Paths to the CSV files used by the EazyML APIs
train_file_path = os.path.join('data', 'IRIS_Train.csv')
test_file_path = os.path.join('data', 'IRIS_Test.csv')

# Name of the outcome (target) column
outcome = "species"

3. Dataset Information

This notebook uses the Iris dataset, a classic benchmark in machine learning and statistics. It contains measurements of 150 iris flowers: four features (sepal length, sepal width, petal length, and petal width) and the species of each flower (setosa, versicolor, or virginica).

You can find more details and download the dataset from Kaggle using the following link:

Kaggle Iris Dataset

Columns in the Dataset:

  • sepal_length: Sepal length of the flower (cm)
  • sepal_width: Sepal width of the flower (cm)
  • petal_length: Petal length of the flower (cm)
  • petal_width: Petal width of the flower (cm)
  • species: Species of the iris flower (Iris-setosa, Iris-versicolor, Iris-virginica)
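Because ez_insight and ez_validate rely on these column names (in particular the outcome column), a quick schema check on the CSV header can catch a wrong path or a renamed column before any API call. This is a minimal pandas-only sketch; `check_columns` and `EXPECTED_COLUMNS` are hypothetical names built from the column list above, not part of EazyML.

```python
import pandas as pd

# Column names as listed in the dataset description above
EXPECTED_COLUMNS = ["sepal_length", "sepal_width", "petal_length", "petal_width", "species"]


def check_columns(csv_path, expected=EXPECTED_COLUMNS):
    """Raise ValueError if any expected column is missing; otherwise return the header."""
    # nrows=0 parses only the header row, so the check stays cheap for large files
    header = pd.read_csv(csv_path, nrows=0).columns
    missing = [col for col in expected if col not in header]
    if missing:
        raise ValueError(f"{csv_path} is missing columns: {missing}")
    return list(header)
```

In this notebook it would be called as `check_columns(train_file_path)` once the data folder has been downloaded.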

3.1 Display the Dataset

Below is a preview of the dataset:

In [4]:
# Load the dataset from the provided file
train = pd.read_csv(train_file_path)

# Display the first few rows of the dataset
ez_display_df(train.head())
  sepal_length sepal_width petal_length petal_width species
0 5.100000 3.500000 1.400000 0.200000 Iris-setosa
1 4.900000 3.000000 1.400000 0.200000 Iris-setosa
2 4.700000 3.200000 1.300000 0.200000 Iris-setosa
3 4.600000 3.100000 1.500000 0.200000 Iris-setosa
4 5.000000 3.600000 1.400000 0.200000 Iris-setosa
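A per-class summary of the training data gives a rough sense of where the class boundaries lie, which helps when reading threshold-based insights such as "petal_width is greater than 1.75" later on. The sketch below uses only pandas `groupby`; `class_summary` is a hypothetical helper, and it is a sanity check rather than a description of how EazyML derives its insights.

```python
import pandas as pd


def class_summary(df, outcome="species"):
    """Min, mean and max of every numeric feature, per outcome class."""
    features = df.select_dtypes(include="number").columns.tolist()
    # Result has MultiIndex columns: (feature, statistic)
    return df.groupby(outcome)[features].agg(["min", "mean", "max"])
```

For example, `class_summary(train, outcome)` shows the petal_width range of each species side by side.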

4. EazyML Insights

4.1 Auto-derive Insights

4.1.1 Build Insight Model

In [5]:
response = ez_insight(train_file_path, outcome, options={})

4.1.2 Convert Response to DataFrame

In [6]:
insights_df = pd.DataFrame(response['insights']['data'], columns=response['insights']['columns'])
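The same `{'columns': [...], 'data': [...]}` layout is converted to a DataFrame several times in this notebook: for `response['insights']`, `val_response['validations']`, and each `filtered_data` block. A small helper keeps that conversion in one place. `to_frame` is a hypothetical name, and the sample payload below is illustrative only (its score is made up), shaped to mirror the structure used here.

```python
import pandas as pd


def to_frame(block):
    """Convert a {'columns': [...], 'data': [[...], ...]} dict into a DataFrame."""
    return pd.DataFrame(block["data"], columns=block["columns"])


# Illustrative payload shaped like response['insights'] (the score is made up)
sample = {
    "columns": ["species", "Augmented Intelligence Insights", "Insight Scores"],
    "data": [["Iris-virginica", "petal_width is greater than 1.75", 0.8]],
}
sample_df = to_frame(sample)
```

With it, the conversions become `insights_df = to_frame(response['insights'])` and `validate_df = to_frame(val_response['validations'])`.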

4.1.3 Display Augmented Insights

4.1.3.1 For Class Iris-virginica
In [7]:
insights_df1 = insights_df[insights_df[outcome] == 'Iris-virginica']
ez_display_df(insights_df1.head())
  | species | Augmented Intelligence Insights | Insight Scores
0 | Iris-virginica | sepal_length is greater than 5.55, petal_width is greater than 1.75 | 0.836000
1 | Iris-virginica | petal_width is greater than 0.8, petal_length is greater than 4.75 | 0.833900
2 | Iris-virginica | petal_width is greater than 1.75 | 0.802800
3 | Iris-virginica | sepal_length is greater than 6.25, sepal_width is less than equal to 3.7, petal_length is greater than 5.05 | 0.752600
4 | Iris-virginica | sepal_length is greater than 6.25, sepal_width is less than equal to 3.7 | 0.583500

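Each insight appears to be a comma-separated conjunction of threshold conditions on the features. To see which rows of `train` an insight covers, the rule text can be turned into a pandas boolean mask. The sketch below handles only the three phrasings that appear in this notebook ("is greater than", "is less than equal to", and "in ( a, b )"). It assumes "in ( a, b )" means a < x <= b, consistent with the greater-than / less-than-or-equal splits used elsewhere; that is an assumption about EazyML's semantics, not documented behavior. `rule_mask` is a hypothetical helper.

```python
import re

import pandas as pd

# One condition: "<col> is greater than v", "<col> is less than equal to v" or "<col> in ( lo, hi )"
_CONDITION = re.compile(
    r"(?P<col>\w+) (?:is greater than (?P<gt>-?[\d.]+)"
    r"|is less than equal to (?P<le>-?[\d.]+)"
    r"|in \( ?(?P<lo>-?[\d.]+), ?(?P<hi>-?[\d.]+) ?\))"
)


def rule_mask(df, rule):
    """Boolean Series marking the rows of df that satisfy every condition in rule."""
    mask = pd.Series(True, index=df.index)
    n_conditions = 0
    for m in _CONDITION.finditer(rule):
        n_conditions += 1
        values = df[m["col"]]
        if m["gt"] is not None:
            mask &= values > float(m["gt"])
        elif m["le"] is not None:
            mask &= values <= float(m["le"])
        else:
            # Assumed interval semantics: lo < x <= hi
            mask &= (values > float(m["lo"])) & (values <= float(m["hi"]))
    if n_conditions == 0:
        raise ValueError(f"Unrecognized insight rule: {rule!r}")
    return mask
```

For example, `train[rule_mask(train, insights_df1.iloc[0]['Augmented Intelligence Insights'])]` lists the training rows covered by the top Iris-virginica insight.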
4.1.3.2 For Class Iris-versicolor
In [8]:
insights_df0 = insights_df[insights_df[outcome] == 'Iris-versicolor']
ez_display_df(insights_df0.head())
   | species | Augmented Intelligence Insights | Insight Scores
20 | Iris-versicolor | petal_width is greater than 0.8, petal_length is less than equal to 4.75 | 0.862100
21 | Iris-versicolor | petal_width in ( 0.8, 1.75 ) | 0.843200
22 | Iris-versicolor | petal_width in ( 0.8, 1.75 ), petal_length is less than equal to 4.95, sepal_width is greater than 2.55 | 0.707500
23 | Iris-versicolor | sepal_length is greater than 5.55, petal_width in ( 0.7, 1.75 ), petal_length is less than equal to 4.95 | 0.707500
24 | Iris-versicolor | sepal_length in ( 5.55, 6.25 ), sepal_width in ( 2.65, 3.7 ), petal_width is less than equal to 1.7 | 0.698400
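Filtering one class at a time works, but the top insights for every class can also be pulled in a single pass by sorting on the score and taking the first k rows of each group. This pandas sketch assumes the column names shown in the tables above (`Insight Scores` and the outcome column); `top_insights` is a hypothetical helper.

```python
import pandas as pd


def top_insights(df, outcome="species", k=5, score_col="Insight Scores"):
    """Return the k highest-scoring insights per class, best first within each class."""
    ranked = df.sort_values([outcome, score_col], ascending=[True, False])
    # groupby(...).head keeps the sorted row order and takes the first k rows of each group
    return ranked.groupby(outcome, sort=False).head(k)
```

For example, `ez_display_df(top_insights(insights_df, outcome, k=3))` shows three insights for each species in one table.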

4.2 Validation of Insights

4.2.1 Validating Insights

In [9]:
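# Insight numbers whose matching records are returned in val_response['validation_filter']
record_number = [3, 5]
options = {'record_number': record_number}
# The insights are validated on the training file itself (passed as both the first and fourth argument)
val_response = ez_validate(train_file_path, outcome, response['insights'], train_file_path, options=options)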

4.2.2 Convert Response to DataFrame

In [10]:
validate_df = pd.DataFrame(val_response['validations']['data'], columns=val_response['validations']['columns'])

4.2.3 Display Validation Metrics

4.2.3.1 For Class Iris-virginica
In [11]:
validate_df1 = validate_df[validate_df[outcome] == 'Iris-virginica']
ez_display_df(validate_df1.head())
   | Test Data Point Number | species | Augmented Intelligence Insights | Insight Scores | Accuracy | Coverage | Population | Accuracy Count | Total Population
28 | 29 | Iris-virginica | sepal_length is greater than 5.55, petal_width is greater than 1.75 | 0.836000 | 0.978300 | 0.306700 | 46 | 45 | 150
29 | 30 | Iris-virginica | petal_width is greater than 0.8, petal_length is greater than 4.75 | 0.833900 | 0.890900 | 0.366700 | 55 | 49 | 150
30 | 31 | Iris-virginica | petal_width is greater than 1.75 | 0.802800 | 0.978300 | 0.306700 | 46 | 45 | 150
31 | 32 | Iris-virginica | sepal_length is greater than 6.25, sepal_width is less than equal to 3.7, petal_length is greater than 5.05 | 0.752600 | 1.000000 | 0.220000 | 33 | 33 | 150
32 | 33 | Iris-virginica | sepal_length is greater than 6.25, sepal_width is less than equal to 3.7 | 0.583500 | 0.714300 | 0.326700 | 49 | 35 | 150

4.2.3.2 For Class Iris-versicolor
In [12]:
validate_df0 = validate_df[validate_df[outcome] == 'Iris-versicolor']
ez_display_df(validate_df0.head())
   | Test Data Point Number | species | Augmented Intelligence Insights | Insight Scores | Accuracy | Coverage | Population | Accuracy Count | Total Population
 7 |  8 | Iris-versicolor | petal_width is greater than 0.8, petal_length is less than equal to 4.75 | 0.862100 | 0.977800 | 0.300000 | 45 | 44 | 150
 8 |  9 | Iris-versicolor | petal_width in ( 0.8, 1.75 ) | 0.843200 | 0.907400 | 0.360000 | 54 | 49 | 150
 9 | 10 | Iris-versicolor | petal_width in ( 0.8, 1.75 ), petal_length is less than equal to 4.95, sepal_width is greater than 2.55 | 0.707500 | 1.000000 | 0.226700 | 34 | 34 | 150
10 | 11 | Iris-versicolor | sepal_length is greater than 5.55, petal_width in ( 0.7, 1.75 ), petal_length is less than equal to 4.95 | 0.707500 | 1.000000 | 0.240000 | 36 | 36 | 150
11 | 12 | Iris-versicolor | sepal_length in ( 5.55, 6.25 ), sepal_width in ( 2.65, 3.7 ), petal_width is less than equal to 1.7 | 0.698400 | 1.000000 | 0.126700 | 19 | 19 | 150
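Every row in the validation tables is consistent with two simple ratios: Accuracy = Accuracy Count / Population and Coverage = Population / Total Population. For the first Iris-virginica row, for example, 45 / 46 ≈ 0.9783 and 46 / 150 ≈ 0.3067, matching the table. The sketch below recomputes those columns from a boolean mask (a Series aligned with the data) marking the rows an insight covers. Treating "Accuracy Count" as the covered rows whose outcome equals the insight's class is an assumption inferred from these numbers, and `insight_metrics` is a hypothetical helper.

```python
import pandas as pd


def insight_metrics(df, mask, outcome, label):
    """Population, accuracy count, accuracy and coverage for the rows selected by mask."""
    population = int(mask.sum())  # rows the insight applies to
    # Assumption: "Accuracy Count" = covered rows whose outcome equals the insight's class
    accuracy_count = int((mask & (df[outcome] == label)).sum())
    total = len(df)
    return {
        "Population": population,
        "Accuracy Count": accuracy_count,
        "Accuracy": accuracy_count / population if population else float("nan"),
        "Coverage": population / total if total else float("nan"),
        "Total Population": total,
    }
```

For instance, `insight_metrics(train, train['petal_width'] > 1.75, outcome, 'Iris-virginica')` computes the same four quantities for the single-condition insight "petal_width is greater than 1.75".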

4.2.4 Display Filtered Data for Specific Record Numbers

In [13]:
for i in range(len(record_number)):
    flt = val_response['validation_filter'][i]
    # The insight (rule) whose matching records are displayed below it
    print(flt['Augmented Intelligence Insights'])
    filter_df = pd.DataFrame(flt['filtered_data']['data'],
                             columns=flt['filtered_data']['columns'])
    ez_display_df(filter_df.head())
    print('\n')
sepal_length is less than equal to 5.55
  sepal_length sepal_width petal_length petal_width species
0 5.100000 3.500000 1.400000 0.200000 IRIS-SETOSA
1 4.900000 3.000000 1.400000 0.200000 IRIS-SETOSA
2 4.700000 3.200000 1.300000 0.200000 IRIS-SETOSA
3 4.600000 3.100000 1.500000 0.200000 IRIS-SETOSA
4 5.000000 3.600000 1.400000 0.200000 IRIS-SETOSA

sepal_length in ( 5.55, 6.75 ),
sepal_width is greater than 3.7
  sepal_length sepal_width petal_length petal_width species
0 5.800000 4.000000 1.200000 0.200000 IRIS-SETOSA
1 5.700000 4.400000 1.500000 0.400000 IRIS-SETOSA
2 5.700000 3.800000 1.700000 0.300000 IRIS-SETOSA
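To line up each filtered block with its full validation metrics, the requested record numbers can be looked up in `validate_df`. This sketch assumes that the values in `record_number` refer to the `Test Data Point Number` column, which the row numbering in the validation tables suggests but the notebook does not state; `rows_for_records` is a hypothetical helper.

```python
import pandas as pd


def rows_for_records(validate_df, record_numbers, column="Test Data Point Number"):
    """Return the validation rows whose record number is in record_numbers, in the requested order."""
    rows = validate_df[validate_df[column].isin(record_numbers)]
    # Reorder to follow record_numbers rather than the DataFrame's own order
    order = {n: i for i, n in enumerate(record_numbers)}
    return rows.sort_values(column, key=lambda s: s.map(order))
```

In this notebook the call would be `ez_display_df(rows_for_records(validate_df, record_number))`. The `key=` argument of `sort_values` requires pandas 1.1 or later.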
