EazyML Insights Template
Define Imports
In [ ]:
!pip install --upgrade eazyml-insight
!pip install --upgrade eazyml-automl
!pip install gdown python-dotenv
In [1]:
import os
from eazyml_insight import (
    ez_insight,
    ez_init,
    ez_validate,
)
from eazyml import ez_display_df
import gdown
import pandas as pd
from dotenv import load_dotenv
load_dotenv()
Out[1]:
True
1. Initialize EazyML
The ez_init function authenticates with the access key passed to it, read here from the EAZYML_ACCESS_KEY environment variable. If the variable is not set, os.getenv returns None and ez_init defaults to a trial license.
In [2]:
ez_init(access_key=os.getenv('EAZYML_ACCESS_KEY'))
Out[2]:
{'success': True,
'message': 'Initialized successfully. You may revoke your consent to sharing usage stats anytime. You have exclusive paid access.'}
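Because a missing variable silently falls back to the trial license, it can help to check which case applies before calling ez_init. The following is a minimal sketch using only the standard library; the helper name `describe_access_key` is hypothetical and not part of EazyML, and treating an empty string as unset is an assumption.

```python
import os

def describe_access_key(env=None, var_name="EAZYML_ACCESS_KEY"):
    """Report whether the access key variable is set.

    Hypothetical helper for illustration; not an EazyML API.
    """
    env = os.environ if env is None else env
    key = env.get(var_name)
    # Assumption: an empty string is treated the same as an unset variable.
    if key:
        return "access key found"
    return "no access key set; ez_init would fall back to the trial license"

print(describe_access_key())
```

Running this before the ez_init cell makes it clear which license mode to expect in the returned message.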
2. Define Dataset Files and Outcome Variable
In [ ]:
gdown.download_folder(id='1p7Udh2MjKyJPxI47FS89VowAz9ZEq_hG')
In [3]:
# Names of the files that will be used by EazyML APIs
train_file_path = os.path.join('data', "House Price Prediction - Train Data.xlsx")
test_file_path = os.path.join('data', "House Price Prediction - Test Data.xlsx")
# The column name for outcome of interest
outcome = "House_Price"
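The paths above assume the download created a local `data` folder containing both Excel files. A small sketch for confirming that before calling any EazyML function; the helper name `missing_files` is hypothetical.

```python
import os

def missing_files(paths):
    """Return the subset of paths that do not exist as regular files."""
    return [p for p in paths if not os.path.isfile(p)]

# Same file names as in the cell above.
expected = [
    os.path.join("data", "House Price Prediction - Train Data.xlsx"),
    os.path.join("data", "House Price Prediction - Test Data.xlsx"),
]

missing = missing_files(expected)
if missing:
    print("Missing files:", missing)
else:
    print("All dataset files found.")
```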
3. Dataset Information
This notebook uses the Housing Price Prediction Dataset. Each record describes a house's features and its sale price; the goal is to predict the sale price from the other attributes.
Columns in the Dataset:
- Square_Footage: Total area of the house in square feet; larger homes typically have higher prices.
- Num_Bedrooms: Number of bedrooms in the house; more bedrooms usually increase the value.
- Num_Bathrooms: Number of bathrooms in the house; more bathrooms often correlate with higher prices.
- Year_Built: The year the house was built; newer homes may have higher prices due to modern features.
- Lot_Size: Size of the lot; the preview values (roughly 0.7 to 3.8) suggest a unit such as acres rather than square feet. Larger lots can increase the property's value.
- Garage_Size: Size of the garage (e.g., number of cars it can hold); larger garages may increase value.
- Neighborhood_Quality: Qualitative rating of the neighborhood; higher quality usually means higher prices.
- House_Price: The selling price of the house; this is the target variable for prediction models.
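The column list above can double as a quick schema check on a loaded file. A minimal sketch in plain Python; `EXPECTED_COLUMNS` and `missing_columns` are illustrative names, not part of EazyML or pandas.

```python
# Column names as listed in this section.
EXPECTED_COLUMNS = [
    "Square_Footage",
    "Num_Bedrooms",
    "Num_Bathrooms",
    "Year_Built",
    "Lot_Size",
    "Garage_Size",
    "Neighborhood_Quality",
    "House_Price",
]

def missing_columns(columns, expected=EXPECTED_COLUMNS):
    """Return the expected column names absent from `columns`, in listed order."""
    present = set(columns)
    return [name for name in expected if name not in present]

# With a pandas DataFrame named `train`, the call would be:
#     missing_columns(train.columns)
print(missing_columns(["Square_Footage", "House_Price"]))
```

An empty list means every documented column is present, including the outcome column passed to ez_insight.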
3.1 Display the Dataset
Below is a preview of the dataset:
In [4]:
# Load the dataset from the provided file
train = pd.read_excel(train_file_path)
# Display the first few rows of the dataset
ez_display_df(train.head())
| | Square_Footage | Num_Bedrooms | Num_Bathrooms | Year_Built | Lot_Size | Garage_Size | Neighborhood_Quality | House_Price |
|---|---|---|---|---|---|---|---|---|
| 0 | 4235 | 3 | 3 | 2000 | 1.911679 | 1 | 8 | 917235.410532 |
| 1 | 4006 | 4 | 2 | 2003 | 1.092441 | 2 | 4 | 871566.562740 |
| 2 | 785 | 5 | 3 | 1995 | 3.823276 | 2 | 3 | 262707.278933 |
| 3 | 2827 | 3 | 1 | 1977 | 3.213678 | 2 | 4 | 605143.959115 |
| 4 | 2219 | 4 | 1 | 1965 | 0.725965 | 0 | 4 | 470083.290367 |
4. EazyML Insights
4.1 Auto-derive Insights
4.1.1 Build Insight Model
In [5]:
response = ez_insight(train_file_path, outcome, options={})
4.1.2 Convert Response to DataFrame
In [6]:
insights_df = pd.DataFrame(response['insights']['data'], columns=response['insights']['columns'])
4.1.3 Display Augmented Insights
In [7]:
ez_display_df(insights_df.head())
| | House_Price | Augmented Intelligence Insights | Insight Scores |
|---|---|---|---|
| 0 | 783720.906 [+/- 44168.201] | Square_Footage in ( 3344.0, 3918.0 ) | 0.924100 |
| 1 | 530132.99 [+/- 31276.672] | Square_Footage in ( 2220.5, 2508.5 ) | 0.896700 |
| 2 | 990834.236 [+/- 28225.397] | Square_Footage in ( 4350.5, 4742.5 ), Year_Built is greater than 1975.5 | 0.875700 |
| 3 | 736522.541 [+/- 19601.668] | Square_Footage in ( 3344.0, 3610.5 ), Lot_Size is less than equal to 3.08 | 0.871100 |
| 4 | 802289.279 [+/- 22598.325] | Square_Footage in ( 3610.5, 3918.0 ), Lot_Size is less than equal to 3.14, Year_Built is greater than 1969.0 | 0.869700 |
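The House_Price column in this output is a string of the form `value [+/- margin]`. To work with it numerically, it can be split into two floats. The sketch below is based only on the format visible in the table above; the helper name `parse_estimate` is hypothetical, and what the margin represents statistically is not stated in this notebook.

```python
import re

# Matches strings such as "783720.906 [+/- 44168.201]".
_ESTIMATE = re.compile(r"^\s*(-?\d+(?:\.\d+)?)\s*\[\+/-\s*(\d+(?:\.\d+)?)\]\s*$")

def parse_estimate(text):
    """Split 'value [+/- margin]' into (value, margin) as floats."""
    match = _ESTIMATE.match(text)
    if match is None:
        raise ValueError(f"unexpected estimate format: {text!r}")
    return float(match.group(1)), float(match.group(2))

center, margin = parse_estimate("783720.906 [+/- 44168.201]")
# Lower and upper ends of the reported range.
print(center - margin, center + margin)
```

Applied to the whole column, for example with `insights_df['House_Price'].map(parse_estimate)`, this gives numeric values that can be sorted or plotted.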
4.2 Validation of Insights
4.2.1 Validating Insights
In [8]:
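# Insights to inspect in detail; the values match the 1-based "Test Data Point Number" column shown below
record_number = [3, 5]
options = {'record_number': record_number}
val_response = ez_validate(train_file_path, outcome, response['insights'], test_file_path, options=options)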
4.2.2 Convert Response to DataFrame
In [9]:
validate_df = pd.DataFrame(val_response['validations']['data'], columns=val_response['validations']['columns'])
4.2.3 Display Validation Metrics
In [10]:
ez_display_df(validate_df.head())
| | Test Data Point Number | House_Price | Augmented Intelligence Insights | Insight Scores | Accuracy | Coverage | Population | RMSE | Total Population |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 783720.906 [+/- 44168.201] | Square_Footage in ( 3344.0, 3918.0 ) | 0.924100 | 0.950400 | 0.200000 | 20 | 40244.972800 | 100 |
| 1 | 2 | 530132.99 [+/- 31276.672] | Square_Footage in ( 2220.5, 2508.5 ) | 0.896700 | 0.948900 | 0.050000 | 5 | 27888.187400 | 100 |
| 2 | 3 | 990834.236 [+/- 28225.397] | Square_Footage in ( 4350.5, 4742.5 ), Year_Built is greater than 1975.5 | 0.875700 | 0.975100 | 0.050000 | 5 | 25413.877400 | 100 |
| 3 | 4 | 736522.541 [+/- 19601.668] | Square_Footage in ( 3344.0, 3610.5 ), Lot_Size is less than equal to 3.08 | 0.871100 | 0.955200 | 0.070000 | 7 | 34945.526700 | 100 |
| 4 | 5 | 802289.279 [+/- 22598.325] | Square_Footage in ( 3610.5, 3918.0 ), Lot_Size is less than equal to 3.14, Year_Built is greater than 1969.0 | 0.869700 | 0.995500 | 0.030000 | 3 | 3666.763100 | 100 |
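In the rows shown, Coverage equals Population divided by Total Population (for example, 20 / 100 = 0.2). The sketch below checks that relation against the displayed values. The relation is inferred from this output, not taken from the EazyML documentation, and the function name `coverage` is illustrative.

```python
# (population, total_population, reported_coverage) copied from the table above.
rows = [
    (20, 100, 0.20),
    (5, 100, 0.05),
    (5, 100, 0.05),
    (7, 100, 0.07),
    (3, 100, 0.03),
]

def coverage(population, total_population):
    """Fraction of test records that fall under an insight's rule."""
    return population / total_population

for population, total, reported in rows:
    # Compare with a tolerance because the values are floats.
    assert abs(coverage(population, total) - reported) < 1e-9

print("Coverage matches Population / Total Population for all displayed rows.")
```

Read this way, the first insight applies to 20 of the 100 test records, while the fifth applies to only 3, so its low RMSE rests on very few houses.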
4.2.4 Display Filtered Data for Specific Record Numbers
In [11]:
for i in range(len(record_number)):
    # Each entry holds the insight text and the rows that match it
    entry = val_response['validation_filter'][i]
    print(entry['Augmented Intelligence Insights'])
    filter_df = pd.DataFrame(entry['filtered_data']['data'], columns=entry['filtered_data']['columns'])
    ez_display_df(filter_df.head())
    print('\n')
Square_Footage in ( 4350.5, 4742.5 ), Year_Built is greater than 1975.5
| | Square_Footage | Year_Built | Lot_Size | Neighborhood_Quality | House_Price | Num_Bedrooms_1 | Num_Bedrooms_2 | Num_Bedrooms_3 | Num_Bedrooms_4 | Num_Bedrooms_5 | Num_Bathrooms_1 | Num_Bathrooms_2 | Num_Bathrooms_3 | Garage_Size_0 | Garage_Size_1 | Garage_Size_2 | filter |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 4638 | 2000 | 1.490399 | 3 | 998439.237114 | False | False | False | True | False | False | False | True | False | True | False | 1 |
| 1 | 4363 | 2018 | 4.933867 | 3 | 1008539.156234 | False | False | False | False | True | False | True | False | False | True | False | 1 |
| 2 | 4671 | 2017 | 1.931223 | 7 | 1028282.311822 | False | False | False | True | False | False | False | True | False | True | False | 1 |
| 3 | 4615 | 2000 | 1.721147 | 4 | 993273.079661 | False | False | False | True | False | False | True | False | False | True | False | 1 |
| 4 | 4493 | 2008 | 0.966554 | 4 | 951093.546994 | False | False | True | False | False | False | False | True | True | False | False | 1 |
Square_Footage in ( 3610.5, 3918.0 ), Lot_Size is less than equal to 3.14, Year_Built is greater than 1969.0
| | Square_Footage | Year_Built | Lot_Size | Neighborhood_Quality | House_Price | Num_Bedrooms_1 | Num_Bedrooms_2 | Num_Bedrooms_3 | Num_Bedrooms_4 | Num_Bedrooms_5 | Num_Bathrooms_1 | Num_Bathrooms_2 | Num_Bathrooms_3 | Garage_Size_0 | Garage_Size_1 | Garage_Size_2 | filter |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3778 | 2006 | 1.576141 | 5 | 818613.710803 | False | True | False | False | False | False | True | False | False | False | True | 1 |
| 1 | 3670 | 2012 | 2.063782 | 7 | 813296.809026 | False | False | True | False | False | False | False | True | False | True | False | 1 |
| 2 | 3702 | 1984 | 2.892098 | 10 | 809686.201144 | False | False | False | False | True | True | False | False | True | False | False | 1 |
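The filtered rows can be cross-checked against the insight text by hand. The sketch below re-applies the first printed rule to the five rows displayed for it. It assumes the interval bounds in `( 4350.5, 4742.5 )` are exclusive, which the notation suggests but this notebook does not confirm; the function name `matches_insight_3` is hypothetical.

```python
def matches_insight_3(row):
    """Rule: Square_Footage in ( 4350.5, 4742.5 ), Year_Built is greater than 1975.5.

    Assumption: both interval bounds are exclusive.
    """
    return 4350.5 < row["Square_Footage"] < 4742.5 and row["Year_Built"] > 1975.5

# Square_Footage and Year_Built copied from the first filtered table above.
filtered_rows = [
    {"Square_Footage": 4638, "Year_Built": 2000},
    {"Square_Footage": 4363, "Year_Built": 2018},
    {"Square_Footage": 4671, "Year_Built": 2017},
    {"Square_Footage": 4615, "Year_Built": 2000},
    {"Square_Footage": 4493, "Year_Built": 2008},
]

print(all(matches_insight_3(row) for row in filtered_rows))
```

The same check can be written for the second rule by adding the Lot_Size condition from its insight text.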