EazyML Insights Template¶
Define Imports¶
In [ ]:
!pip install --upgrade eazyml-insight
!pip install --upgrade eazyml-automl
!pip install gdown python-dotenv
In [1]:
import os
from eazyml_insight import (
    ez_insight,
    ez_init,
    ez_validate,
)
from eazyml import ez_display_df
import gdown
import pandas as pd
from dotenv import load_dotenv
load_dotenv()
Out[1]:
True
1. Initialize EazyML¶
The ez_init function uses the EAZYML_ACCESS_KEY environment variable for authentication. If the variable is not set, it defaults to a trial license.
In [2]:
ez_init(access_key=os.getenv('EAZYML_ACCESS_KEY'))
Out[2]:
{'success': True,
'message': 'Initialized successfully. You may revoke your consent to sharing usage stats anytime. You have exclusive paid access.'}
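The check below is a minimal sketch for confirming, before calling ez_init, whether an access key is actually present in the environment. The helper name resolve_access_key is hypothetical and not part of EazyML, and treating a blank value the same as an unset variable is an assumption made for this example.

```python
import os


def resolve_access_key(env_var="EAZYML_ACCESS_KEY"):
    """Return (key, mode) where mode is 'licensed' or 'trial'.

    Hypothetical helper: it only inspects the environment and does not
    contact EazyML or validate the key.
    """
    key = os.getenv(env_var)
    # Assumption: a blank value is treated like an unset variable.
    if key is None or not key.strip():
        return None, "trial"
    return key.strip(), "licensed"


key, mode = resolve_access_key()
print(f"Access key present: {key is not None}; expected mode: {mode}")
```

The returned key can be passed on as ez_init(access_key=key); when it is None, the trial fallback described above would be expected to apply.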
2. Define Dataset Files and Outcome Variable¶
In [ ]:
gdown.download_folder(id='1EobxYR3pg_Z3Sd4sETfe4aJLAsT98fL2')
In [3]:
# Names of the files that will be used by EazyML APIs
train_file_path = os.path.join('data', "Heart_Attack_traindata.csv")
test_file_path = os.path.join('data', "Heart_Attack_testdata.csv")
# The column name for outcome of interest
outcome = "class"
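Since the later cells read these paths directly, it can help to confirm that the download produced both files before going further. The following is a small sketch using only the standard library; the helper name missing_files is hypothetical, and the paths simply repeat the ones defined above.

```python
import os


def missing_files(paths):
    """Return the subset of paths that do not point to an existing file."""
    return [p for p in paths if not os.path.isfile(p)]


# Same paths as defined in the cell above.
train_file_path = os.path.join("data", "Heart_Attack_traindata.csv")
test_file_path = os.path.join("data", "Heart_Attack_testdata.csv")

not_found = missing_files([train_file_path, test_file_path])
if not_found:
    print("Missing files:", not_found)
else:
    print("Both dataset files are present.")
```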
3. Dataset Information¶
The dataset used in this notebook is the Heart Attack Dataset. It contains patient records with several features (such as age, gender, pulse rate, blood pressure, and cardiac biomarkers) that are used to predict whether a patient had a heart attack.
Columns in the Dataset:¶
- age: The age of the patient, measured in years.
- gender: The gender of the patient, represented as a categorical variable (e.g., 1 = male, 0 = female).
- impluse: The patient's pulse rate, measured in beats per minute (bpm). The column name is spelled impluse in the data file.
- pressurehight: Refers to systolic blood pressure, the higher number in a blood pressure reading (e.g., 120/80 mmHg).
- pressurelow: Refers to diastolic blood pressure, the lower number in a blood pressure reading (e.g., 120/80 mmHg).
- glucose: The patient's blood glucose level.
- kcm: The level of CK-MB (creatine kinase-MB), an enzyme released into the blood when heart muscle is damaged.
- troponin: A protein found in the heart muscle, measured to assess heart damage (especially after a heart attack).
- class: The target variable, indicating whether the patient had a heart attack (positive) or not (negative).
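Because two of the column names are spelled unusually in the file header (impluse and pressurehight), a quick schema check can catch mismatches early. This is a minimal sketch; the helper name check_columns is hypothetical, and the expected names are taken from the table header displayed in this notebook.

```python
# Column names as they appear in the dataset preview in this notebook.
EXPECTED_COLUMNS = [
    "age", "gender", "impluse", "pressurehight", "pressurelow",
    "glucose", "kcm", "troponin", "class",
]


def check_columns(actual, expected=EXPECTED_COLUMNS):
    """Return (missing, unexpected) column names, preserving order."""
    actual = list(actual)
    missing = [c for c in expected if c not in actual]
    unexpected = [c for c in actual if c not in expected]
    return missing, unexpected


# Example: a header that uses the "corrected" spelling is flagged.
header = ["age", "gender", "impulse", "pressurehight", "pressurelow",
          "glucose", "kcm", "troponin", "class"]
missing, unexpected = check_columns(header)
print("missing:", missing)
print("unexpected:", unexpected)
```

With a pandas DataFrame, the same helper can be called as check_columns(train.columns).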
3.1 Display the Dataset¶
Below is a preview of the dataset:
In [4]:
# Load the dataset from the provided file
train = pd.read_csv(train_file_path)
# Display the first few rows of the dataset
ez_display_df(train.head())
| | age | gender | impluse | pressurehight | pressurelow | glucose | kcm | troponin | class |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 64 | 1 | 66 | 160 | 83 | 160.000000 | 1.800000 | 0.012000 | negative |
| 1 | 21 | 1 | 94 | 98 | 46 | 296.000000 | 6.750000 | 1.060000 | positive |
| 2 | 55 | 1 | 64 | 160 | 77 | 270.000000 | 1.990000 | 0.003000 | negative |
| 3 | 64 | 1 | 70 | 120 | 55 | 270.000000 | 13.870000 | 0.122000 | positive |
| 4 | 55 | 1 | 64 | 112 | 65 | 300.000000 | 1.080000 | 0.003000 | negative |
4. EazyML Insights¶
4.1 Auto-derive Insights¶
4.1.1 Build Insight Model¶
In [5]:
response = ez_insight(train_file_path, outcome, options={})
4.1.2 Convert Response to DataFrame¶
In [6]:
insights_df = pd.DataFrame(response['insights']['data'], columns=response['insights']['columns'])
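The conversion above relies on response['insights'] holding a list of column names and a list of rows. The sketch below shows the same reshaping in plain Python. The sample dict is a hand-written stand-in built from two rows of the output displayed in this notebook, and the exact value types returned by ez_insight are an assumption.

```python
# Hypothetical stand-in for response['insights']; values copied from the
# insight tables displayed in this notebook.
insights = {
    "columns": ["class", "Augmented Intelligence Insights", "Insight Scores"],
    "data": [
        ["positive", "troponin is greater than 0.01", 0.9324],
        ["negative",
         "troponin is less than equal to 0.01, kcm is less than equal to 6.29",
         0.9485],
    ],
}


def to_records(table):
    """Pair each row with the column names, giving one dict per row."""
    return [dict(zip(table["columns"], row)) for row in table["data"]]


records = to_records(insights)
# Equivalent of filtering the DataFrame on the outcome column.
positive = [r for r in records if r["class"] == "positive"]
print(positive[0]["Augmented Intelligence Insights"])
```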
4.1.3 Display Augmented Insights¶
4.1.3.1 For Class positive¶
In [7]:
insights_df1 = insights_df[insights_df[outcome] == 'positive']
ez_display_df(insights_df1.head())
| | class | Augmented Intelligence Insights | Insight Scores |
|---|---|---|---|
| 0 | positive | troponin is greater than 0.01 | 0.932400 |
| 1 | positive | pressurelow is less than equal to 80.5, troponin is greater than 0.01 | 0.930900 |
| 2 | positive | troponin is greater than 0.01, kcm is greater than 0.88 | 0.930000 |
| 3 | positive | troponin is greater than 0.01, pressurehight is greater than 59.0 | 0.920700 |
| 4 | positive | troponin is greater than 0.01, pressurehight is greater than 59.0, kcm is greater than 0.51 | 0.916000 |
4.1.3.2 For Class negative¶
In [8]:
insights_df0 = insights_df[insights_df[outcome] == 'negative']
ez_display_df(insights_df0.head())
| | class | Augmented Intelligence Insights | Insight Scores |
|---|---|---|---|
| 86 | negative | pressurelow is less than equal to 80.5, troponin is less than equal to 0.01, kcm is less than equal to 6.29 | 0.962300 |
| 87 | negative | troponin is less than equal to 0.01, kcm is less than equal to 6.29 | 0.948500 |
| 88 | negative | troponin is less than equal to 0.01, pressurehight is less than equal to 151.5, kcm is less than equal to 6.29 | 0.928300 |
| 89 | negative | troponin is less than equal to 0.01, kcm is less than equal to 4.88, glucose is greater than 55.0 | 0.927600 |
| 90 | negative | troponin is less than equal to 0.01, kcm is less than equal to 4.88, impluse is less than equal to 89.5 | 0.907600 |
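Each insight reads as a conjunction of threshold conditions, so it can be checked against an individual record. The following is a rough sketch that parses only the two phrasings seen in the tables above ("is greater than" and "is less than equal to"); the function names parse_insight and matches are hypothetical and not part of EazyML. The displayed thresholds appear to be rounded, so records close to a threshold may be classified differently by the library itself.

```python
import operator
import re

# Only the two comparison phrasings that appear in the insight tables.
_OPS = {
    "is less than equal to": operator.le,
    "is greater than": operator.gt,
}
_CLAUSE = re.compile(
    r"^(\w+) (is less than equal to|is greater than) (-?\d+(?:\.\d+)?)$"
)


def parse_insight(text):
    """Split an insight string into (feature, comparison, threshold) clauses."""
    clauses = []
    for part in text.split(","):
        m = _CLAUSE.match(part.strip())
        if m is None:
            raise ValueError(f"Unrecognized clause: {part!r}")
        feature, phrase, threshold = m.groups()
        clauses.append((feature, _OPS[phrase], float(threshold)))
    return clauses


def matches(record, text):
    """Return True if the record satisfies every clause of the insight."""
    return all(op(record[f], t) for f, op, t in parse_insight(text))


rule = "troponin is greater than 0.01, kcm is greater than 0.88"
# Values taken from rows 1 and 2 of the dataset preview in section 3.1.
row_1 = {"troponin": 1.06, "kcm": 6.75}
row_2 = {"troponin": 0.003, "kcm": 1.99}
print(matches(row_1, rule))  # True
print(matches(row_2, rule))  # False
```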
4.2 Validation of Insights¶
4.2.1 Validating Insights¶
In [9]:
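# Insight numbers (the "Test Data Point Number" column in the validation output)
# for which the matching test records are also returned under 'validation_filter'
record_number = [3, 5]
options = {'record_number': record_number}
val_response = ez_validate(train_file_path, outcome, response['insights'], test_file_path, options=options)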
4.2.2 Convert Response to DataFrame¶
In [10]:
validate_df = pd.DataFrame(val_response['validations']['data'], columns=val_response['validations']['columns'])
4.2.3 Display Validation Metrics¶
4.2.3.1 For Class positive¶
In [11]:
validate_df1 = validate_df[validate_df[outcome] == 'positive']
ez_display_df(validate_df1.head())
| | Test Data Point Number | class | Augmented Intelligence Insights | Insight Scores | Accuracy | Coverage | Population | Accuracy Count | Total Population |
|---|---|---|---|---|---|---|---|---|---|
| 109 | 110 | positive | troponin is greater than 0.01 | 0.932400 | 0.714300 | 0.482800 | 14 | 10 | 29 |
| 110 | 111 | positive | pressurelow is less than equal to 80.5, troponin is greater than 0.01 | 0.930900 | 0.875000 | 0.275900 | 8 | 7 | 29 |
| 111 | 112 | positive | troponin is greater than 0.01, kcm is greater than 0.88 | 0.930000 | 0.692300 | 0.448300 | 13 | 9 | 29 |
| 112 | 113 | positive | troponin is greater than 0.01, pressurehight is greater than 59.0 | 0.920700 | 0.714300 | 0.482800 | 14 | 10 | 29 |
| 113 | 114 | positive | troponin is greater than 0.01, pressurehight is greater than 59.0, kcm is greater than 0.51 | 0.916000 | 0.714300 | 0.482800 | 14 | 10 | 29 |
4.2.3.2 For Class negative¶
In [12]:
validate_df0 = validate_df[validate_df[outcome] == 'negative']
ez_display_df(validate_df0.head())
| | Test Data Point Number | class | Augmented Intelligence Insights | Insight Scores | Accuracy | Coverage | Population | Accuracy Count | Total Population |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | negative | pressurelow is less than equal to 80.5, troponin is less than equal to 0.01, kcm is less than equal to 6.29 | 0.962300 | 1.000000 | 0.310300 | 9 | 9 | 29 |
| 1 | 2 | negative | troponin is less than equal to 0.01, kcm is less than equal to 6.29 | 0.948500 | 1.000000 | 0.413800 | 12 | 12 | 29 |
| 2 | 3 | negative | troponin is less than equal to 0.01, pressurehight is less than equal to 151.5, kcm is less than equal to 6.29 | 0.928300 | 1.000000 | 0.241400 | 7 | 7 | 29 |
| 3 | 4 | negative | troponin is less than equal to 0.01, kcm is less than equal to 4.88, glucose is greater than 55.0 | 0.927600 | 1.000000 | 0.413800 | 12 | 12 | 29 |
| 4 | 5 | negative | troponin is less than equal to 0.01, kcm is less than equal to 4.88, impluse is less than equal to 89.5 | 0.907600 | 1.000000 | 0.379300 | 11 | 11 | 29 |
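The numbers in the validation tables are consistent with Accuracy being Accuracy Count divided by Population, and Coverage being Population divided by Total Population. This relationship is inferred from the displayed values rather than taken from the EazyML documentation; the sketch below reproduces it for a few rows, with a hypothetical helper name.

```python
def accuracy_and_coverage(population, accuracy_count, total_population):
    """Recompute the two ratios from the count columns, rounded to 4 places.

    Assumed definitions (inferred from the tables above):
      accuracy = accuracy_count / population
      coverage = population / total_population
    """
    if population <= 0 or total_population <= 0:
        raise ValueError("population counts must be positive")
    accuracy = accuracy_count / population
    coverage = population / total_population
    return round(accuracy, 4), round(coverage, 4)


# (Population, Accuracy Count, Total Population) from the tables above.
print(accuracy_and_coverage(14, 10, 29))  # (0.7143, 0.4828)
print(accuracy_and_coverage(8, 7, 29))    # (0.875, 0.2759)
print(accuracy_and_coverage(9, 9, 29))    # (1.0, 0.3103)
```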
4.2.4 Display Filtered Data for Specific Record Numbers¶
In [13]:
for i in range(len(record_number)):
    # Print the insight text, then preview the test records that satisfy it
    print(val_response['validation_filter'][i]['Augmented Intelligence Insights'])
    filtered = val_response['validation_filter'][i]['filtered_data']
    filter_df = pd.DataFrame(filtered['data'], columns=filtered['columns'])
    ez_display_df(filter_df.head())
    print('\n')
troponin is less than equal to 0.01, pressurehight is less than equal to 151.5, kcm is less than equal to 6.29
| | age | impluse | pressurehight | pressurelow | glucose | kcm | troponin | class | gender_0 | gender_1 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 55 | 64 | 112 | 65 | 300 | 1.080000 | 0.003000 | NEGATIVE | False | True |
| 1 | 58 | 61 | 112 | 58 | 87 | 1.830000 | 0.004000 | NEGATIVE | True | False |
| 2 | 47 | 76 | 120 | 70 | 319 | 2.570000 | 0.003000 | NEGATIVE | False | True |
| 3 | 45 | 70 | 100 | 68 | 96 | 0.606000 | 0.004000 | NEGATIVE | True | False |
| 4 | 37 | 72 | 107 | 86 | 274 | 2.890000 | 0.003000 | NEGATIVE | True | False |
troponin is less than equal to 0.01, kcm is less than equal to 4.88, impluse is less than equal to 89.5
| | age | impluse | pressurehight | pressurelow | glucose | kcm | troponin | class | gender_0 | gender_1 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 55 | 64 | 160 | 77 | 270 | 1.990000 | 0.003000 | NEGATIVE | False | True |
| 1 | 55 | 64 | 112 | 65 | 300 | 1.080000 | 0.003000 | NEGATIVE | False | True |
| 2 | 58 | 61 | 112 | 58 | 87 | 1.830000 | 0.004000 | NEGATIVE | True | False |
| 3 | 32 | 40 | 179 | 68 | 102 | 0.710000 | 0.003000 | NEGATIVE | True | False |
| 4 | 44 | 60 | 154 | 81 | 135 | 2.350000 | 0.004000 | NEGATIVE | True | False |