EazyML Modeling: Walmart Regression¶
Define Imports¶
In [ ]:
!pip install --upgrade eazyml-automl
!pip install gdown python-dotenv
In [1]:
import os
import pandas as pd
import eazyml as ez
import gdown
from dotenv import load_dotenv
load_dotenv()
Out[1]:
True
1. Initialize EazyML¶
The ez_init function uses the EAZYML_ACCESS_KEY environment variable for authentication. If the variable is not set, it defaults to a trial license.
In [2]:
ez.ez_init(os.getenv('EAZYML_ACCESS_KEY'))
Out[2]:
{'success': True,
'message': 'Initialized successfully. You may revoke your consent to sharing usage stats anytime. You have exclusive paid access.'}
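Before calling `ez_init`, it can help to confirm which of the two modes described above will apply. The following is a minimal sketch using only the standard library; `license_mode` is a hypothetical helper written for illustration and is not part of the `eazyml` package, and treating an empty string as unset is this sketch's own assumption.

```python
import os


def license_mode(env=None):
    """Report which mode the text above describes for the given environment."""
    env = os.environ if env is None else env
    key = env.get("EAZYML_ACCESS_KEY")
    # Assumption: an empty value is treated the same as a missing variable.
    return "access key" if key else "trial"


# Inspect the current process environment (after load_dotenv() has run).
print(license_mode())
```

Running this right after `load_dotenv()` shows whether the `.env` file actually supplied the key.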
2. Define Dataset Files, Outcome Variable and Train Model¶
2.1 Define Train Dataset and Other Model Parameters¶
In [ ]:
gdown.download_folder(id='16LfwRMjchrPgdbsgPHr79AHvNCHsL5Is')
In [3]:
reg_file_path = os.path.join('data', "walmart_train_data.csv")
reg_outcome = "Weekly_Sales"
df_reg = pd.read_csv(reg_file_path)
options = {'model_type': 'predictive'}
2.2 Train Model¶
In [4]:
resp_reg = ez.ez_build_model(df_reg, outcome=reg_outcome, options=options)
2.3 Show Model Performance¶
In [5]:
ez.ez_display_df(resp_reg['model_performance'])
| | Model | Rsquared | RMSE |
|---|---|---|---|
| 0 | Gradient Boosting Regressor | 0.94 | 1835.38 |
| 1 | Bagged Decision Trees | 0.93 | 2000.09 |
| 2 | Random Forest | 0.93 | 2006.82 |
| 3 | Boosted Decision Trees | 0.76 | 3646.40 |
| 4 | Lasso Regression | 0.10 | 6950.23 |
| 5 | Linear Regression | 0.10 | 6950.26 |
| 6 | Ridge Regression | 0.10 | 6950.26 |
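The two metric columns can be read through the standard textbook definitions of RMSE and the coefficient of determination. The sketch below implements those formulas in plain Python on made-up numbers; that EazyML computes `Rsquared` and `RMSE` in exactly this way is an assumption, and the sample values are illustrative only.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean squared residual."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical values, not taken from the Walmart data.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 7.0, 9.5]

print("RMSE:", rmse(y_true, y_pred))
print("R-squared:", r_squared(y_true, y_pred))
```

RMSE is expressed in the units of `Weekly_Sales`, so lower is better, while an R-squared closer to 1 means the model explains more of the variance in the outcome.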
3. Dataset Information¶
This notebook uses the Walmart dataset, which records weekly sales at Walmart stores alongside features such as the store number, a holiday flag, temperature, fuel price, CPI, and unemployment rate.
You can find more details and download the dataset from Kaggle using the following link:
Columns in the Dataset:¶
- Store: The store number.
- Weekly_Sales: Weekly sales for the given store (the outcome variable).
- IsHoliday: Whether the week is a special holiday week (1 = holiday week, 0 = non-holiday week).
- Temperature: Temperature on the day of sale.
- Fuel_Price: Cost of fuel in the region.
- CPI: Prevailing consumer price index.
- Unemployment: Prevailing unemployment rate.
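A quick header check can catch a mismatched or truncated file before it reaches model training. The sketch below uses only the standard library; `EXPECTED_COLUMNS` and `missing_columns` are hypothetical names introduced here, and the inline sample (one header line plus the first row shown in the preview in section 3.1) stands in for the real CSV, whose exact column order is an assumption.

```python
import csv
import io

# Column names as listed above.
EXPECTED_COLUMNS = [
    "Store", "Weekly_Sales", "IsHoliday", "Temperature",
    "Fuel_Price", "CPI", "Unemployment",
]


def missing_columns(csv_text, expected=EXPECTED_COLUMNS):
    """Return the expected column names that are absent from the CSV header."""
    header = next(csv.reader(io.StringIO(csv_text)))
    return [name for name in expected if name not in header]


# Stand-in for the file contents: header plus one row from the preview.
sample = (
    "Store,Weekly_Sales,IsHoliday,Temperature,Fuel_Price,CPI,Unemployment\n"
    "1,22516.313699,0,42.31,2.572,211.096358,8.106\n"
)

print(missing_columns(sample))
```

An empty list means every listed column is present; any names returned point to columns the file lacks.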
3.1 Display the Dataset¶
Below is a preview of the dataset:
In [6]:
# Load the dataset from the provided file
train = pd.read_csv(reg_file_path)
# Display the first few rows of the dataset
ez.ez_display_df(train.head())
| | Store | Weekly_Sales | IsHoliday | Temperature | Fuel_Price | CPI | Unemployment |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 22516.313699 | 0.000000 | 42.310000 | 2.572000 | 211.096358 | 8.106000 |
| 1 | 1 | 22804.964444 | 1.000000 | 38.510000 | 2.548000 | 211.242170 | 8.106000 |
| 2 | 1 | 22081.755753 | 0.000000 | 39.930000 | 2.514000 | 211.289143 | 8.106000 |
| 3 | 1 | 19579.549861 | 0.000000 | 46.630000 | 2.561000 | 211.319643 | 8.106000 |
| 4 | 1 | 21298.721644 | 0.000000 | 46.500000 | 2.625000 | 211.350143 | 8.106000 |
4. Define Test Dataset and Predict on that Dataset¶
In [7]:
reg_model_info = resp_reg["model_info"]
4.1 Define Test Dataset¶
In [8]:
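# Note: this path points to the same file used for training, so the predictions below are in-sample.
reg_test_file_path = os.path.join('data', "walmart_train_data.csv")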
reg_test_data = pd.read_csv(reg_test_file_path)
4.2 Predict on Test Dataset¶
In [9]:
options = {}
reg_pred_df = ez.ez_predict(reg_test_data, model_info=reg_model_info, options=options)
pred_df = reg_pred_df['pred_df']
In [10]:
ez.ez_display_df(pred_df.head())
| | Store | Weekly_Sales | IsHoliday | Temperature | Fuel_Price | CPI | Unemployment | Predicted Weekly_Sales |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 22516.313699 | 0 | 42.310000 | 2.572000 | 211.096358 | 8.106000 | 22386.510000 |
| 1 | 1 | 22804.964444 | 1 | 38.510000 | 2.548000 | 211.242170 | 8.106000 | 22894.660000 |
| 2 | 1 | 22081.755753 | 0 | 39.930000 | 2.514000 | 211.289143 | 8.106000 | 22262.500000 |
| 3 | 1 | 19579.549861 | 0 | 46.630000 | 2.561000 | 211.319643 | 8.106000 | 20111.170000 |
| 4 | 1 | 21298.721644 | 0 | 46.500000 | 2.625000 | 211.350143 | 8.106000 | 21138.440000 |
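To get a feel for how close the predictions are, the five displayed rows can be compared by hand. The sketch below copies the `Weekly_Sales` and `Predicted Weekly_Sales` values from the table above and computes absolute and percentage errors in plain Python. Because section 4.1 loads the same `walmart_train_data.csv` file used for training, these are in-sample errors, and five rows are far too few to summarize the whole dataset; the variable names are illustrative only.

```python
# Values copied from the five rows displayed above.
actual = [22516.313699, 22804.964444, 22081.755753, 19579.549861, 21298.721644]
predicted = [22386.51, 22894.66, 22262.50, 20111.17, 21138.44]

# Absolute error per row, and the same error as a percentage of actual sales.
abs_errors = [abs(a - p) for a, p in zip(actual, predicted)]
pct_errors = [100.0 * e / a for e, a in zip(abs_errors, actual)]

mae = sum(abs_errors) / len(abs_errors)    # mean absolute error
mape = sum(pct_errors) / len(pct_errors)   # mean absolute percentage error
worst_row = max(range(len(abs_errors)), key=abs_errors.__getitem__)

print(f"MAE over 5 rows:  {mae:.2f}")
print(f"MAPE over 5 rows: {mape:.2f}%")
print(f"Largest error is in row {worst_row}")
```

On a held-out file, the same calculation would give a more honest estimate of how the model generalizes.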