EazyML Counterfactual Template¶

Define Imports¶

In [ ]:
!pip install --upgrade eazyml-counterfactual
!pip install gdown python-dotenv
In [1]:
import os
import pandas as pd
import eazyml as ez
from eazyml_counterfactual import (
        ez_cf_inference,
        ez_init        
)
import gdown

from dotenv import load_dotenv
load_dotenv()
Out[1]:
True

1. Initialize EazyML¶

The ez_init function uses the EAZYML_ACCESS_KEY environment variable for authentication. If the variable is not set, it defaults to a trial license.

In [2]:
ez_init(os.getenv('EAZYML_ACCESS_KEY'))
Out[2]:
{'success': True,
 'message': 'Initialized successfully. You may revoke your consent to sharing usage stats anytime. You have exclusive paid access.'}

2. Define Dataset Files and Outcome Variable¶

In [ ]:
gdown.download_folder(id='1gWvCFW2cHqthUsPUQ0feOG4P41rpQwJC')
In [3]:
# Defining file paths for training and test datasets and specifying the outcome variable
train_file = os.path.join('data', "Mobile Price Ternary - Train Data.xlsx")
test_file = os.path.join('data', "Mobile Price Ternary - Test Data.xlsx")
outcome = "price_range"

# Loading the training dataset and the test dataset
train_df = pd.read_excel(train_file)
test_df = pd.read_excel(test_file)

3. Dataset Information¶

The dataset used in this notebook is the Mobile Price Classification Dataset, which contains data on mobile phones and their characteristics. It includes various features such as the mobile’s battery life, brand, camera quality, and other technical specifications that can help classify mobile phones into different price ranges.

You can find more details and download the dataset from Kaggle using the following link:

Kaggle Mobile Price Classification Dataset

Columns in the Dataset:¶

  • battery_power: The battery power of the mobile phone (in mAh).
  • blue: Whether the mobile has Bluetooth connectivity (1 = Yes, 0 = No).
  • clock_speed: The clock speed of the mobile’s processor (in GHz).
  • dual_sim: Whether the mobile supports dual SIM (1 = Yes, 0 = No).
  • fc: Front camera quality (in megapixels).
  • four_g: Whether the mobile supports 4G connectivity (1 = Yes, 0 = No).
  • int_memory: Internal memory of the mobile (in GB).
  • m_dep: Mobile depth (in cm).
  • mobile_wt: Weight of the mobile (in grams).
  • n_cores: Number of processor cores in the mobile.
  • pc: Primary camera quality (in megapixels).
  • px_height: Pixel Resolution Height.
  • px_width: Pixel Resolution Width.
  • ram: Random access memory of the mobile (in MB).
  • sc_h: Screen height of the mobile (in cm).
  • sc_w: Screen width of the mobile (in cm).
  • talk_time: Maximum talk time (in hours).
  • three_g: Whether the mobile supports 3G connectivity (1 = Yes, 0 = No).
  • touch_screen: Whether the mobile has a touch screen (1 = Yes, 0 = No).
  • wifi: Whether the mobile supports Wi-Fi connectivity (1 = Yes, 0 = No).
  • price_range: The price range of the mobile (target variable, with 4 possible classes: 0, 1, 2, 3).

3.1 Display the Dataset¶

Below is a preview of the dataset:

In [4]:
# Display the first few rows of the training DataFrame for inspection
ez.ez_display_df(train_df.head())
  battery_power blue clock_speed dual_sim fc four_g int_memory m_dep mobile_wt n_cores pc px_height px_width ram sc_h sc_w talk_time three_g touch_screen wifi price_range
0 1745 0 2.900000 0 0 1 3 0.900000 105 3 0 426 1629 1308 5 0 2 1 1 0 1
1 535 1 0.500000 1 8 1 54 0.500000 145 8 15 710 939 2674 14 8 10 1 0 0 2
2 1577 0 0.500000 1 0 1 42 0.300000 197 5 4 1045 1737 2060 19 6 12 1 0 0 2
3 1702 1 2.300000 0 12 1 52 0.500000 145 2 15 1397 1491 2501 16 12 4 1 0 0 3
4 707 0 2.100000 1 2 0 25 0.800000 131 3 17 495 574 3838 9 2 7 1 0 1 3

4. EazyML Modeling¶

4.1 Building model using the EazyML Modeling API¶

In [5]:
# Define model parameters
model_options = {
    "model_type": "predictive",
}

# Build predictive model using EazyML API
build_model_response = ez.ez_build_model(train_df, outcome=outcome, options=model_options)

4.2 Feature Importance¶

In [6]:
ez.ez_display_df(build_model_response['global_importance'])
  Variable Name Importance
0 mobile_wt 0.010000
1 int_memory 0.010000
2 talk_time 0.010000
3 sc_w 0.010000
4 px_width 0.080000
5 px_height 0.090000
6 battery_power 0.130000
7 ram 0.610000

4.3 Model Importance¶

In [7]:
ez.ez_display_df(build_model_response['model_performance'])
  Model Kappa Accuracy
0 Logistic Regression 0.95 0.97
1 Gradient Boosting Classifier 0.89 0.91
2 Boosted Decision Trees with InformationGain 0.86 0.89
3 Bagged Decision Trees with Information Gain 0.84 0.88
4 Naive Bayes 0.75 0.81
5 Random Forest with Information Gain 0.61 0.71

4.4 Predict Using the Trained EazyML Model¶

In [8]:
# Extract model information from the response dictionary
model_info = build_model_response["model_info"]

# Read test data from a CSV file into a pandas DataFrame
test_data = pd.read_excel(test_file)

# Make predictions using the model, requesting confidence scores and class probabilities
predicted_resp = ez.ez_predict(test_data, model_info, options={"confidence_score": True, "class_probability": True})

# Check if the prediction was successful
if predicted_resp['success']:
    print("Prediction successful")  
    predicted_df = predicted_resp['pred_df']  # Extract the predicted DataFrame
    ez.ez_display_df(predicted_df.head())  # Display the first few rows of the predicted DataFrame
else:
    print("Prediction failed")  
    print(predicted_resp['message'])  
Prediction successful
  battery_power blue clock_speed dual_sim fc four_g int_memory m_dep mobile_wt n_cores pc px_height px_width ram sc_h sc_w talk_time three_g touch_screen wifi price_range Probability_price_range_0 Probability_price_range_1 Probability_price_range_2 Probability_price_range_3 Predicted price_range Confidence Score
0 1646 0 2.500000 0 3 1 25 0.600000 200 2 5 211 1608 686 8 6 11 1 1 0 0 0.919936 0.080052 0.000013 0.000000 0 91%
1 1182 0 0.500000 0 7 1 8 0.500000 138 8 16 275 986 2563 19 17 19 1 0 0 2 0.000058 0.122860 0.870659 0.006424 2 87%
2 1972 0 2.900000 0 9 0 14 0.400000 196 7 18 293 952 1316 8 1 8 1 1 0 1 0.146630 0.839048 0.014322 0.000000 1 83%
3 989 1 2.000000 0 4 0 17 0.200000 166 3 19 256 1394 3892 18 7 19 1 1 0 3 0.000000 0.000001 0.012143 0.987857 3 98%
4 615 1 0.500000 1 7 0 58 0.500000 130 5 8 1021 1958 1906 14 5 5 1 0 0 1 0.002843 0.753895 0.243139 0.000123 1 75%

5. EazyML Counterfactual Inference¶

5.1 Define Counterfactual Inference Configuration¶

In [9]:
# Define the selected features for prediction
selected_features = ['sc_w', 'n_cores', 'mobile_wt', 'talk_time', 'ram', 'px_width', 'px_height', 
                     'battery_power', 'pc', 'fc', 'm_dep', 'int_memory', 'sc_h']

# Define variant (modifiable) features
invariants = []
variants = [feature for feature in selected_features if feature not in invariants]

# Define configurable parameters for counterfactual inference
cf_options = {   
    "variants": variants,  
    "outcome_ordinality": "1",  # Desired outcome 
    "train_data": train_file  
}

5.2 Perform Counterfactual Inference¶

In [10]:
# Specify the index of the test record for counterfactual inference
test_index_no = 0  
test_data = predicted_df.loc[[test_index_no]]  

# Perform Inference 
result, optimal_transition_df = ez_cf_inference(
    test_data=test_data,  
    outcome=outcome,  
    selected_features=selected_features,  
    model_info=model_info,
    options=cf_options  
)

5.3 Display Results¶

In [11]:
# Summarizes whether an optimal transition was found and the improvement in outcome probability.
ez.ez_display_json(result)
{   'success': True,
    'message': 'Optimal transition found',
    'summary': {   'Actual Outcome': '0',
                   'Optimal Outcome': '1',
                   'Improvement in Probability': 0.635}}
In [12]:
# Details the feature changes needed to achieve the optimal outcome.
ez.ez_display_df(optimal_transition_df)
  Feature Actual Optimal Percentage Change Absolute Change
0 sc_w 6.000000 4.800000 -20.000000 -1.200000
1 n_cores 2.000000 3.000000 50.000000 1.000000
2 mobile_wt 200.000000 176.000000 -12.000000 -24.000000
3 talk_time 11.000000 13.200000 20.000000 2.200000
4 ram 686.000000 823.200000 20.000000 137.200000
5 px_width 1608.000000 1906.880000 18.600000 298.880000
6 px_height 211.000000 253.200000 20.000000 42.200000
7 battery_power 1646.000000 1944.880000 18.200000 298.880000
8 pc 5.000000 4.000000 -20.000000 -1.000000
9 fc 3.000000 2.400000 -20.000000 -0.600000
10 m_dep 0.600000 1.000000 66.700000 0.400000
11 int_memory 25.000000 30.000000 20.000000 5.000000
12 sc_h 8.000000 9.000000 12.500000 1.000000
In [ ]: