EazyML Insights Template

Define Imports

In [ ]:
!pip install --upgrade eazyml-insight
!pip install --upgrade eazyml-automl
!pip install gdown python-dotenv
In [1]:
import os
from eazyml_insight import (
    ez_insight,
    ez_init,
    ez_validate
)

from eazyml import ez_display_df
import gdown
import pandas as pd

from dotenv import load_dotenv
load_dotenv()
Out[1]:
True

1. Initialize EazyML

The ez_init function authenticates with the access key passed to it, which this notebook reads from the EAZYML_ACCESS_KEY environment variable. If the variable is not set, os.getenv returns None and EazyML falls back to a trial license.

In [2]:
ez_init(access_key=os.getenv('EAZYML_ACCESS_KEY'))
Out[2]:
{'success': True,
 'message': 'Initialized successfully. You may revoke your consent to sharing usage stats anytime. You have exclusive paid access.'}
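
Because os.getenv returns None when the variable is missing, and an empty or whitespace-only value is easy to set by accident, it can help to normalize the key before calling ez_init. The following is a minimal sketch: resolve_access_key is a hypothetical helper, not part of the EazyML API, and it assumes only that a None key leads to the trial license described above.

```python
import os

def resolve_access_key(env=None, name="EAZYML_ACCESS_KEY"):
    """Return the access key from the environment, or None if unset or blank.

    Hypothetical helper for illustration; not part of the EazyML API.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    return key or None

# Demonstration with explicit mappings instead of the real environment
print(resolve_access_key({"EAZYML_ACCESS_KEY": "  abc123  "}))  # abc123
print(resolve_access_key({}))                                    # None
```

In the notebook, the result could be passed on as ez_init(access_key=resolve_access_key()).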

2. Define Dataset Files and Outcome Variable

In [ ]:
gdown.download_folder(id='1EobxYR3pg_Z3Sd4sETfe4aJLAsT98fL2')
In [3]:
# Names of the files that will be used by EazyML APIs
train_file_path = os.path.join('data', "Heart_Attack_traindata.csv")
test_file_path = os.path.join('data', "Heart_Attack_testdata.csv")

# The column name for outcome of interest
outcome = "class"
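
Before these paths are handed to EazyML, it may be worth confirming that the download actually produced the expected files. A small sketch using only the standard library; missing_files is a hypothetical helper name, and the demonstration runs in a temporary directory rather than the real data folder.

```python
import os
import tempfile

def missing_files(paths):
    """Return the subset of paths that do not point to an existing file."""
    return [p for p in paths if not os.path.isfile(p)]

# Demonstration: one file is created, the other is left absent
with tempfile.TemporaryDirectory() as tmp:
    present = os.path.join(tmp, "Heart_Attack_traindata.csv")
    absent = os.path.join(tmp, "Heart_Attack_testdata.csv")
    open(present, "w").close()
    demo_missing = missing_files([present, absent])
    print(demo_missing == [absent])  # True
```

In the notebook this would be called as missing_files([train_file_path, test_file_path]); an empty list means both files are in place.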

3. Dataset Information

This notebook uses the Heart Attack Dataset, a well-known dataset in machine learning and statistics. Each row describes one patient through several features (such as age, gender, blood pressure, and heart-related measurements), which are used to predict the likelihood of a heart attack.

Columns in the Dataset:

  • age: The age of the patient, measured in years.
  • gender: The gender of the patient, represented as a categorical variable (e.g., 1 = male, 0 = female).
  • impluse: The patient's pulse rate, measured in beats per minute (bpm). The column name is spelled this way in the dataset itself.
  • pressurehight: Refers to systolic blood pressure, the higher number in a blood pressure reading (e.g., 120/80 mmHg).
  • pressurelow: Refers to diastolic blood pressure, the lower number in a blood pressure reading (e.g., 120/80 mmHg).
  • glucose: The patient's blood glucose (blood sugar) level.
  • kcm: A heart-related blood marker. The name most likely refers to CK-MB (creatine kinase-MB), an enzyme released when heart muscle is damaged.
  • troponin: A protein found in the heart muscle, measured to assess heart damage (especially after a heart attack).
  • class: The target variable, indicating whether the patient had a heart attack (positive) or not (negative).

3.1 Display the Dataset

Below is a preview of the dataset:

In [4]:
# Load the dataset from the provided file
train = pd.read_csv(train_file_path)

# Display the first few rows of the dataset
ez_display_df(train.head())
  | age | gender | impluse | pressurehight | pressurelow | glucose    | kcm       | troponin | class
0 | 64  | 1      | 66      | 160           | 83          | 160.000000 | 1.800000  | 0.012000 | negative
1 | 21  | 1      | 94      | 98            | 46          | 296.000000 | 6.750000  | 1.060000 | positive
2 | 55  | 1      | 64      | 160           | 77          | 270.000000 | 1.990000  | 0.003000 | negative
3 | 64  | 1      | 70      | 120           | 55          | 270.000000 | 13.870000 | 0.122000 | positive
4 | 55  | 1      | 64      | 112           | 65          | 300.000000 | 1.080000  | 0.003000 | negative
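
Note that the preview shows the class column stored as the strings positive and negative rather than as numbers. A short sketch of tallying the labels and, if needed, encoding them as 0/1; it uses only the five preview rows above, so the counts describe the preview and not the full dataset.

```python
from collections import Counter

# The 'class' values of the five preview rows shown above
preview_classes = ["negative", "positive", "negative", "positive", "negative"]

counts = Counter(preview_classes)
print(counts["negative"], counts["positive"])  # 3 2

# Optional 0/1 encoding, in case a numeric label is needed downstream
label_map = {"negative": 0, "positive": 1}
encoded = [label_map[c] for c in preview_classes]
print(encoded)  # [0, 1, 0, 1, 0]
```

On the full training data, train[outcome].value_counts() gives the same kind of tally.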

4. EazyML Insights

4.1 Auto-derive Insights

4.1.1 Build Insight Model

In [5]:
response = ez_insight(train_file_path, outcome, options={})

4.1.2 Convert Response to DataFrame

In [6]:
insights_df = pd.DataFrame(response['insights']['data'], columns=response['insights']['columns'])
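
The conversion works because the response stores the table as a list of column names plus a list of rows. The sketch below does the same reshaping without pandas on a hand-built stand-in for response['insights']; its values are copied from the output shown later in this notebook, and anything about the real response beyond the 'columns' and 'data' keys is an assumption.

```python
# Hand-built stand-in for response['insights']; values copied from the notebook output
insights = {
    "columns": ["class", "Augmented Intelligence Insights", "Insight Scores"],
    "data": [
        ["positive", "troponin is greater than 0.01", 0.9324],
        ["negative",
         "troponin is less than equal to 0.01, kcm is less than equal to 6.29",
         0.9485],
    ],
}

# Pair each row with the column names to get one dict per insight
records = [dict(zip(insights["columns"], row)) for row in insights["data"]]

# Pick the insight with the highest score
best = max(records, key=lambda r: r["Insight Scores"])
print(best["class"], best["Insight Scores"])  # negative 0.9485
```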

4.1.3 Display Augmented Insights

4.1.3.1 For Class positive
In [7]:
insights_df1 = insights_df[insights_df[outcome] == 'positive']
ez_display_df(insights_df1.head())
  | class | Augmented Intelligence Insights | Insight Scores
0 | positive | troponin is greater than 0.01 | 0.932400
1 | positive | pressurelow is less than equal to 80.5, troponin is greater than 0.01 | 0.930900
2 | positive | troponin is greater than 0.01, kcm is greater than 0.88 | 0.930000
3 | positive | troponin is greater than 0.01, pressurehight is greater than 59.0 | 0.920700
4 | positive | troponin is greater than 0.01, pressurehight is greater than 59.0, kcm is greater than 0.51 | 0.916000
4.1.3.2 For Class negative
In [8]:
insights_df0 = insights_df[insights_df[outcome] == 'negative']
ez_display_df(insights_df0.head())
   | class | Augmented Intelligence Insights | Insight Scores
86 | negative | pressurelow is less than equal to 80.5, troponin is less than equal to 0.01, kcm is less than equal to 6.29 | 0.962300
87 | negative | troponin is less than equal to 0.01, kcm is less than equal to 6.29 | 0.948500
88 | negative | troponin is less than equal to 0.01, pressurehight is less than equal to 151.5, kcm is less than equal to 6.29 | 0.928300
89 | negative | troponin is less than equal to 0.01, kcm is less than equal to 4.88, glucose is greater than 55.0 | 0.927600
90 | negative | troponin is less than equal to 0.01, kcm is less than equal to 4.88, impluse is less than equal to 89.5 | 0.907600
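
Each insight is a comma-separated list of threshold conditions, so it can be turned into a predicate over a single record. The sketch below parses the two phrasings that appear in the output above ("is greater than" and "is less than equal to") and evaluates a rule against values from the dataset preview. parse_insight and matches are hypothetical helpers, not EazyML functions, and any other phrasing EazyML may emit is not handled.

```python
import operator

# Phrases observed in the insight text, mapped to comparison operators
_OPERATORS = {
    "is less than equal to": operator.le,
    "is greater than": operator.gt,
}

def parse_insight(text):
    """Split an insight string into (column, comparison, threshold) triples."""
    conditions = []
    for clause in text.split(","):
        clause = clause.strip()
        if not clause:
            continue
        for phrase, compare in _OPERATORS.items():
            marker = f" {phrase} "
            if marker in clause:
                column, value = clause.split(marker)
                conditions.append((column.strip(), compare, float(value)))
                break
        else:
            raise ValueError(f"Unrecognized condition: {clause!r}")
    return conditions

def matches(record, text):
    """Return True if the record satisfies every condition of the insight."""
    return all(compare(record[col], thr)
               for col, compare, thr in parse_insight(text))

# Values taken from row 1 of the dataset preview (class: positive)
row = {"troponin": 1.06, "kcm": 6.75, "pressurelow": 46}
positive_rule = "troponin is greater than 0.01, kcm is greater than 0.88"
negative_rule = "troponin is less than equal to 0.01, kcm is less than equal to 6.29"
print(matches(row, positive_rule))  # True
print(matches(row, negative_rule))  # False
```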

4.2 Validation of Insights

4.2.1 Validating Insights

In [9]:
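# Judging by the output below, these are the "Test Data Point Number" values
# (1-based) of the insights whose matching records are returned in
# val_response['validation_filter']
record_number = [3, 5]
options = {'record_number': record_number}
val_response = ez_validate(train_file_path, outcome, response['insights'], test_file_path, options=options)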

4.2.2 Convert Response to DataFrame

In [10]:
validate_df = pd.DataFrame(val_response['validations']['data'], columns=val_response['validations']['columns'])

4.2.3 Display Validation Metrics

4.2.3.1 For Class positive
In [11]:
validate_df1 = validate_df[validate_df[outcome] == 'positive']
ez_display_df(validate_df1.head())
    | Test Data Point Number | class | Augmented Intelligence Insights | Insight Scores | Accuracy | Coverage | Population | Accuracy Count | Total Population
109 | 110 | positive | troponin is greater than 0.01 | 0.932400 | 0.714300 | 0.482800 | 14 | 10 | 29
110 | 111 | positive | pressurelow is less than equal to 80.5, troponin is greater than 0.01 | 0.930900 | 0.875000 | 0.275900 | 8 | 7 | 29
111 | 112 | positive | troponin is greater than 0.01, kcm is greater than 0.88 | 0.930000 | 0.692300 | 0.448300 | 13 | 9 | 29
112 | 113 | positive | troponin is greater than 0.01, pressurehight is greater than 59.0 | 0.920700 | 0.714300 | 0.482800 | 14 | 10 | 29
113 | 114 | positive | troponin is greater than 0.01, pressurehight is greater than 59.0, kcm is greater than 0.51 | 0.916000 | 0.714300 | 0.482800 | 14 | 10 | 29
4.2.3.2 For Class negative
In [12]:
validate_df0 = validate_df[validate_df[outcome] == 'negative']
ez_display_df(validate_df0.head())
  | Test Data Point Number | class | Augmented Intelligence Insights | Insight Scores | Accuracy | Coverage | Population | Accuracy Count | Total Population
0 | 1 | negative | pressurelow is less than equal to 80.5, troponin is less than equal to 0.01, kcm is less than equal to 6.29 | 0.962300 | 1.000000 | 0.310300 | 9 | 9 | 29
1 | 2 | negative | troponin is less than equal to 0.01, kcm is less than equal to 6.29 | 0.948500 | 1.000000 | 0.413800 | 12 | 12 | 29
2 | 3 | negative | troponin is less than equal to 0.01, pressurehight is less than equal to 151.5, kcm is less than equal to 6.29 | 0.928300 | 1.000000 | 0.241400 | 7 | 7 | 29
3 | 4 | negative | troponin is less than equal to 0.01, kcm is less than equal to 4.88, glucose is greater than 55.0 | 0.927600 | 1.000000 | 0.413800 | 12 | 12 | 29
4 | 5 | negative | troponin is less than equal to 0.01, kcm is less than equal to 4.88, impluse is less than equal to 89.5 | 0.907600 | 1.000000 | 0.379300 | 11 | 11 | 29
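
The metric columns appear to be related as follows: Accuracy is Accuracy Count divided by Population, and Coverage is Population divided by Total Population. This reading is inferred from the numbers in the tables above rather than taken from EazyML documentation. The sketch below recomputes both ratios for two rows; accuracy_and_coverage is a hypothetical helper.

```python
def accuracy_and_coverage(population, accuracy_count, total_population):
    """Recompute the two ratios under the assumed definitions, rounded to 4 places."""
    accuracy = accuracy_count / population
    coverage = population / total_population
    return round(accuracy, 4), round(coverage, 4)

# (Population, Accuracy Count, Total Population) copied from the validation tables
print(accuracy_and_coverage(14, 10, 29))  # (0.7143, 0.4828)
print(accuracy_and_coverage(9, 9, 29))    # (1.0, 0.3103)
```

Under this reading, the first positive insight applies to 14 of the 29 test records, and the predicted class is correct for 10 of those 14.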

4.2.4 Display Filtered Data for Specific Record Numbers

In [13]:
# One entry of 'validation_filter' is read per requested record number
for i in range(len(record_number)):
    entry = val_response['validation_filter'][i]
    print(entry['Augmented Intelligence Insights'])
    filtered = entry['filtered_data']
    filter_df = pd.DataFrame(filtered['data'], columns=filtered['columns'])
    ez_display_df(filter_df.head())
    print('\n')
troponin is less than equal to 0.01,
pressurehight is less than equal to 151.5,
kcm is less than equal to 6.29
  | age | impluse | pressurehight | pressurelow | glucose | kcm      | troponin | class    | gender_0 | gender_1
0 | 55  | 64      | 112           | 65          | 300     | 1.080000 | 0.003000 | NEGATIVE | False    | True
1 | 58  | 61      | 112           | 58          | 87      | 1.830000 | 0.004000 | NEGATIVE | True     | False
2 | 47  | 76      | 120           | 70          | 319     | 2.570000 | 0.003000 | NEGATIVE | False    | True
3 | 45  | 70      | 100           | 68          | 96      | 0.606000 | 0.004000 | NEGATIVE | True     | False
4 | 37  | 72      | 107           | 86          | 274     | 2.890000 | 0.003000 | NEGATIVE | True     | False

troponin is less than equal to 0.01,
kcm is less than equal to 4.88,
impluse is less than equal to 89.5
  | age | impluse | pressurehight | pressurelow | glucose | kcm      | troponin | class    | gender_0 | gender_1
0 | 55  | 64      | 160           | 77          | 270     | 1.990000 | 0.003000 | NEGATIVE | False    | True
1 | 55  | 64      | 112           | 65          | 300     | 1.080000 | 0.003000 | NEGATIVE | False    | True
2 | 58  | 61      | 112           | 58          | 87      | 1.830000 | 0.004000 | NEGATIVE | True     | False
3 | 32  | 40      | 179           | 68          | 102     | 0.710000 | 0.003000 | NEGATIVE | True     | False
4 | 44  | 60      | 154           | 81          | 135     | 2.350000 | 0.004000 | NEGATIVE | True     | False
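
In the filtered output, the gender column has been expanded into the indicator columns gender_0 and gender_1, which looks like one-hot encoding. A small sketch of collapsing them back into a single value, assuming exactly one indicator is true per row; decode_gender is a hypothetical helper and not part of EazyML.

```python
def decode_gender(row, prefix="gender_"):
    """Return the category whose indicator column is true in the given row."""
    active = [name[len(prefix):] for name, value in row.items()
              if name.startswith(prefix) and value]
    if len(active) != 1:
        raise ValueError(f"Expected exactly one active indicator, got {active}")
    return int(active[0])

# Indicator values from the first two rows of the first filtered table above
print(decode_gender({"age": 55, "gender_0": False, "gender_1": True}))  # 1
print(decode_gender({"age": 58, "gender_0": True, "gender_1": False}))  # 0
```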
