EazyML Insights Template¶

Define Imports¶

In [ ]:
!pip install --upgrade eazyml-insight
!pip install --upgrade eazyml-automl
!pip install gdown python-dotenv
In [1]:
import os
from eazyml_insight import (
    ez_insight,
    ez_init,
    ez_validate
)

from eazyml import ez_display_df
import gdown
import pandas as pd

from dotenv import load_dotenv
load_dotenv()
Out[1]:
True

1. Initialize EazyML¶

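The cell below passes the value of the EAZYML_ACCESS_KEY environment variable to ez_init for authentication. load_dotenv() above loads the variable from a .env file if one exists. If the variable is not set, os.getenv returns None and EazyML defaults to a trial license.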

In [2]:
ez_init(access_key=os.getenv('EAZYML_ACCESS_KEY'))
Out[2]:
{'success': True,
 'message': 'Initialized successfully. You may revoke your consent to sharing usage stats anytime. You have exclusive paid access.'}
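The output above shows that ez_init returns a dictionary with 'success' and 'message' keys. A guard like the following could stop the notebook early when a call fails. This is only a sketch: ensure_success is a hypothetical helper, not part of EazyML, and it assumes that failed calls use the same dictionary shape with a falsy 'success' value, which this notebook does not show.

```python
def ensure_success(response):
    """Return the response unchanged, or raise if it does not report success.

    Assumption: failures use the same dict shape as the success case,
    with 'success' set to a falsy value.
    """
    if not isinstance(response, dict):
        raise RuntimeError(f"Unexpected response type: {type(response).__name__}")
    if not response.get('success'):
        raise RuntimeError(f"EazyML call failed: {response.get('message', 'no message')}")
    return response


# Example with a response shaped like Out[2]
ok = ensure_success({'success': True, 'message': 'Initialized successfully.'})
print(ok['message'])
```

In the notebook this could wrap the call directly, for example ensure_success(ez_init(access_key=os.getenv('EAZYML_ACCESS_KEY'))).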

2. Define Dataset Files and Outcome Variable¶

In [ ]:
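# Download the sample dataset folder from Google Drive; the file paths in the next cell expect it under ./data
gdown.download_folder(id='1p7Udh2MjKyJPxI47FS89VowAz9ZEq_hG')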
In [3]:
# Names of the files that will be used by EazyML APIs
train_file_path = os.path.join('data', "House Price Prediction - Train Data.xlsx")
test_file_path = os.path.join('data', "House Price Prediction - Test Data.xlsx")

# The column name for outcome of interest
outcome = "House_Price"
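Because both paths depend on the download having succeeded, a quick existence check before calling the EazyML APIs can surface a missing file early. The sketch below uses only the standard library; missing_files is a hypothetical helper name introduced for illustration.

```python
import os


def missing_files(paths):
    """Return the subset of paths that do not point to an existing file."""
    return [path for path in paths if not os.path.isfile(path)]


# Hypothetical usage with the variables defined above:
# missing = missing_files([train_file_path, test_file_path])
# if missing:
#     raise FileNotFoundError(f"Missing dataset files: {missing}")
```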

3. Dataset Information¶

The dataset used in this notebook is a house price prediction dataset. Each row describes one house by its features and records its sale price. The goal is to predict the sale price of a house from its attributes.

Columns in the Dataset:¶

  • Square_Footage: Total area of the house in square feet; larger homes typically have higher prices.
  • Num_Bedrooms: Number of bedrooms in the house; more bedrooms usually increase the value.
  • Num_Bathrooms: Number of bathrooms in the house; more bathrooms often correlate with higher prices.
  • Year_Built: The year the house was built; newer homes may have higher prices due to modern features.
  • Lot_Size: Size of the lot; the sample values in this notebook (roughly 0.7 to 4.9) are too small to be square feet and are more plausibly acres. Larger lots can increase the property's value.
  • Garage_Size: Size of the garage (e.g., number of cars it can hold); larger garages may increase value.
  • Neighborhood_Quality: Numeric rating of the neighborhood (3 to 10 in the rows shown in this notebook); a higher rating usually means higher prices.
  • House_Price: The selling price of the house; this is the target variable for prediction models.
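The column list above can double as a schema check after loading the file. The following is a minimal sketch that works on any iterable of column names, so it can be called with train.columns once the DataFrame exists. EXPECTED_COLUMNS and missing_columns are names introduced here for illustration.

```python
# Column names taken from the list above
EXPECTED_COLUMNS = [
    "Square_Footage", "Num_Bedrooms", "Num_Bathrooms", "Year_Built",
    "Lot_Size", "Garage_Size", "Neighborhood_Quality", "House_Price",
]


def missing_columns(actual_columns, expected=EXPECTED_COLUMNS):
    """Return the expected column names that are absent from actual_columns."""
    present = set(actual_columns)
    return [name for name in expected if name not in present]


# Example: a header that lacks the outcome column
print(missing_columns(EXPECTED_COLUMNS[:-1]))
```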

3.1 Display the Dataset¶

Below is a preview of the dataset:

In [4]:
# Load the dataset from the provided file
train = pd.read_excel(train_file_path)

# Display the first few rows of the dataset
ez_display_df(train.head())
   Square_Footage  Num_Bedrooms  Num_Bathrooms  Year_Built  Lot_Size  Garage_Size  Neighborhood_Quality  House_Price
0  4235            3             3              2000        1.911679  1            8                     917235.410532
1  4006            4             2              2003        1.092441  2            4                     871566.562740
2  785             5             3              1995        3.823276  2            3                     262707.278933
3  2827            3             1              1977        3.213678  2            4                     605143.959115
4  2219            4             1              1965        0.725965  0            4                     470083.290367

4. EazyML Insights¶

4.1 Auto-derive Insights¶

4.1.1 Build Insight Model¶

In [5]:
response = ez_insight(train_file_path, outcome, options={})

4.1.2 Convert Response to DataFrame¶

In [6]:
insights_df = pd.DataFrame(response['insights']['data'], columns=response['insights']['columns'])
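The conversion above relies on the response holding a table as a 'columns' list plus a 'data' list of rows. As a rough sketch, the same structure can be turned into plain dictionaries without pandas, for example to log or serialize individual insights. table_to_records is a hypothetical helper, and the sample table only imitates the shape used above, with values copied from this notebook's output and the score assumed to be a float.

```python
def table_to_records(table):
    """Convert {'columns': [...], 'data': [[...], ...]} into a list of dicts."""
    columns = table['columns']
    records = []
    for row in table['data']:
        if len(row) != len(columns):
            raise ValueError(f"Row has {len(row)} values, expected {len(columns)}")
        records.append(dict(zip(columns, row)))
    return records


# Illustrative table in the same shape as response['insights']
sample = {
    'columns': ['House_Price', 'Augmented Intelligence Insights', 'Insight Scores'],
    'data': [['783720.906 [+/- 44168.201]', 'Square_Footage in ( 3344.0, 3918.0 )', 0.9241]],
}
records = table_to_records(sample)
print(records[0]['Augmented Intelligence Insights'])
```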

4.1.3 Display Augmented Insights¶

In [7]:
ez_display_df(insights_df.head())
   House_Price                 Augmented Intelligence Insights                                                                               Insight Scores
0  783720.906 [+/- 44168.201]  Square_Footage in ( 3344.0, 3918.0 )                                                                          0.924100
1  530132.99 [+/- 31276.672]   Square_Footage in ( 2220.5, 2508.5 )                                                                          0.896700
2  990834.236 [+/- 28225.397]  Square_Footage in ( 4350.5, 4742.5 ), Year_Built is greater than 1975.5                                       0.875700
3  736522.541 [+/- 19601.668]  Square_Footage in ( 3344.0, 3610.5 ), Lot_Size is less than equal to 3.08                                     0.871100
4  802289.279 [+/- 22598.325]  Square_Footage in ( 3610.5, 3918.0 ), Lot_Size is less than equal to 3.14, Year_Built is greater than 1969.0  0.869700
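The House_Price column is returned as text of the form "value [+/- margin]". To sort or filter insights by predicted price, the two numbers can be separated with a regular expression. This sketch assumes every cell follows the format shown above. parse_price_range is a hypothetical helper, and this notebook does not state what the margin represents statistically.

```python
import re

# Matches strings such as "783720.906 [+/- 44168.201]"
_PRICE_PATTERN = re.compile(
    r"^\s*([-+]?\d+(?:\.\d+)?)\s*\[\+/-\s*(\d+(?:\.\d+)?)\]\s*$"
)


def parse_price_range(text):
    """Split 'value [+/- margin]' into a (value, margin) pair of floats."""
    match = _PRICE_PATTERN.match(text)
    if match is None:
        raise ValueError(f"Unexpected House_Price format: {text!r}")
    return float(match.group(1)), float(match.group(2))


value, margin = parse_price_range("783720.906 [+/- 44168.201]")
print(value - margin, value + margin)
```

With pandas, the helper could be applied per row, for example insights_df['House_Price'].map(parse_price_range).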

4.2 Validation of Insights¶

4.2.1 Validating Insights¶

In [8]:
record_number = [3, 5]
options = {'record_number': record_number}
val_response = ez_validate(train_file_path, outcome, response['insights'], test_file_path, options=options)
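Judging from the outputs later in this notebook, record_number = [3, 5] selects the insights listed as Test Data Point Number 3 and 5, which are the rows at 0-based positions 2 and 4 of the insights table. A minimal sketch of that 1-based lookup follows. select_records is a hypothetical helper and not part of EazyML.

```python
def select_records(rows, record_numbers):
    """Return the rows at the given 1-based positions, in the order requested."""
    selected = []
    for number in record_numbers:
        if not 1 <= number <= len(rows):
            raise IndexError(f"Record number {number} is outside 1..{len(rows)}")
        selected.append(rows[number - 1])
    return selected


# Insight scores copied from the insights table, in row order
insight_scores = [0.9241, 0.8967, 0.8757, 0.8711, 0.8697]
print(select_records(insight_scores, [3, 5]))
```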

4.2.2 Convert Response to DataFrame¶

In [9]:
validate_df = pd.DataFrame(val_response['validations']['data'], columns=val_response['validations']['columns'])

4.2.3 Display Validation Metrics¶

In [10]:
ez_display_df(validate_df.head())
   Test Data Point Number  House_Price                 Augmented Intelligence Insights                                                                               Insight Scores  Accuracy  Coverage  Population  RMSE          Total Population
0  1                       783720.906 [+/- 44168.201]  Square_Footage in ( 3344.0, 3918.0 )                                                                          0.924100        0.950400  0.200000  20          40244.972800  100
1  2                       530132.99 [+/- 31276.672]   Square_Footage in ( 2220.5, 2508.5 )                                                                          0.896700        0.948900  0.050000  5           27888.187400  100
2  3                       990834.236 [+/- 28225.397]  Square_Footage in ( 4350.5, 4742.5 ), Year_Built is greater than 1975.5                                       0.875700        0.975100  0.050000  5           25413.877400  100
3  4                       736522.541 [+/- 19601.668]  Square_Footage in ( 3344.0, 3610.5 ), Lot_Size is less than equal to 3.08                                     0.871100        0.955200  0.070000  7           34945.526700  100
4  5                       802289.279 [+/- 22598.325]  Square_Footage in ( 3610.5, 3918.0 ), Lot_Size is less than equal to 3.14, Year_Built is greater than 1969.0  0.869700        0.995500  0.030000  3           3666.763100   100
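In the rows above, Coverage equals Population divided by Total Population (for example, 20 / 100 = 0.20). This relationship is read off the displayed numbers rather than taken from EazyML documentation, so treat the sketch below as a consistency check under that assumption. coverage is a hypothetical helper.

```python
def coverage(population, total_population):
    """Fraction of test rows that fall under an insight."""
    if total_population <= 0:
        raise ValueError("total_population must be positive")
    return population / total_population


# (Population, Total Population, reported Coverage) copied from the table above
reported_rows = [(20, 100, 0.20), (5, 100, 0.05), (5, 100, 0.05), (7, 100, 0.07), (3, 100, 0.03)]
for population, total, reported in reported_rows:
    # The recomputed value should agree with the reported one
    print(population, total, round(coverage(population, total), 6), reported)
```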

4.2.4 Display Filtered Data for Specific Record Numbers¶

In [11]:
# For each requested record, print the insight text and preview the matching rows
for i in range(len(record_number)):
    entry = val_response['validation_filter'][i]
    print(entry['Augmented Intelligence Insights'])
    filtered = entry['filtered_data']
    filter_df = pd.DataFrame(filtered['data'], columns=filtered['columns'])
    ez_display_df(filter_df.head())
    print('\n')
Square_Footage in ( 4350.5, 4742.5 ),
Year_Built is greater than 1975.5
   Square_Footage  Year_Built  Lot_Size  Neighborhood_Quality  House_Price     Num_Bedrooms_1  Num_Bedrooms_2  Num_Bedrooms_3  Num_Bedrooms_4  Num_Bedrooms_5  Num_Bathrooms_1  Num_Bathrooms_2  Num_Bathrooms_3  Garage_Size_0  Garage_Size_1  Garage_Size_2  filter
0  4638            2000        1.490399  3                     998439.237114   False           False           False           True            False           False            False            True             False          True           False          1
1  4363            2018        4.933867  3                     1008539.156234  False           False           False           False           True            False            True             False            False          True           False          1
2  4671            2017        1.931223  7                     1028282.311822  False           False           False           True            False           False            False            True             False          True           False          1
3  4615            2000        1.721147  4                     993273.079661   False           False           False           True            False           False            True             False            False          True           False          1
4  4493            2008        0.966554  4                     951093.546994   False           False           True            False           False           False            False            True             True           False          False          1

Square_Footage in ( 3610.5, 3918.0 ),
Lot_Size is less than equal to 3.14,
Year_Built is greater than 1969.0
   Square_Footage  Year_Built  Lot_Size  Neighborhood_Quality  House_Price     Num_Bedrooms_1  Num_Bedrooms_2  Num_Bedrooms_3  Num_Bedrooms_4  Num_Bedrooms_5  Num_Bathrooms_1  Num_Bathrooms_2  Num_Bathrooms_3  Garage_Size_0  Garage_Size_1  Garage_Size_2  filter
0  3778            2006        1.576141  5                     818613.710803   False           True            False           False           False           False            True             False            False          False          True           1
1  3670            2012        2.063782  7                     813296.809026   False           False           True            False           False           False            False            True             False          True           False          1
2  3702            1984        2.892098  10                    809686.201144   False           False           False           False           True            True             False            False            True           False          False          1
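Each insight printed above is a conjunction of conditions on feature columns, and the filtered rows are the test records that satisfy it. The sketch below parses the three phrasings that appear in this notebook and checks a row against them. It rests on assumptions: parse_insight and row_matches are hypothetical helpers, only these three phrasings are handled, and every bound is read as low < value <= high, since the notebook does not say whether interval endpoints are inclusive.

```python
import math
import re

_NUM = r"([-+]?\d+(?:\.\d+)?)"
_INTERVAL = re.compile(rf"(\w+) in \(\s*{_NUM}\s*,\s*{_NUM}\s*\)")
_GREATER = re.compile(rf"(\w+) is greater than {_NUM}")
_LESS_EQ = re.compile(rf"(\w+) is less than equal to {_NUM}")
# A comma between conditions, i.e. one that is not inside "( a, b )"
_SEPARATOR = re.compile(r",\s*(?![^()]*\))")


def parse_insight(text):
    """Turn an insight string into a list of (column, low, high) bounds."""
    bounds = []
    for clause in _SEPARATOR.split(text):
        clause = clause.strip()
        if not clause:
            continue
        if (m := _INTERVAL.fullmatch(clause)):
            bounds.append((m.group(1), float(m.group(2)), float(m.group(3))))
        elif (m := _GREATER.fullmatch(clause)):
            bounds.append((m.group(1), float(m.group(2)), math.inf))
        elif (m := _LESS_EQ.fullmatch(clause)):
            bounds.append((m.group(1), -math.inf, float(m.group(2))))
        else:
            raise ValueError(f"Unrecognized condition: {clause!r}")
    return bounds


def row_matches(row, bounds):
    """Check a dict-like row against every bound (assumed low < value <= high)."""
    return all(low < row[column] <= high for column, low, high in bounds)


bounds = parse_insight("Square_Footage in ( 4350.5, 4742.5 ), Year_Built is greater than 1975.5")
# First filtered row shown above for this insight
print(row_matches({"Square_Footage": 4638, "Year_Built": 2000}, bounds))
```

Representing every condition as a (column, low, high) triple keeps the row check to a single comparison per condition, at the cost of fixing one endpoint convention for all three phrasings.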
