In machine learning, preprocessing and transforming input data are routine steps for improving model efficiency and accuracy. Often overlooked, though, is that targets (labels) can also benefit from transformation, especially in regression problems. Scikit-Learn, a versatile package with solutions for many machine learning tasks, provides the TransformedTargetRegressor utility to make these target transformations seamless. This guide walks you through how to use this tool effectively.
Introducing TransformedTargetRegressor
Scikit-Learn's TransformedTargetRegressor facilitates transforming target variables in regression tasks. By enabling target transformation and inverse transformations during prediction, it helps improve the stability and performance of predictive models that struggle with, for instance, skewed target variables.
Why Use TransformedTargetRegressor?
- Better Model Performance: Transforming skewed target data can help satisfy the statistical assumptions of certain regression models, improving their predictive performance.
- Simplifies Transformations: Automatically manages transformations and inverse transformations during training and predictions, reducing boilerplate code and potential errors.
Getting Started: Basic Usage
Before proceeding, ensure you have scikit-learn installed:
pip install scikit-learn
Here's a quick example to illustrate the basic usage of TransformedTargetRegressor:
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.compose import TransformedTargetRegressor
from sklearn.preprocessing import FunctionTransformer
import numpy as np
# Using the diabetes dataset (load_boston was removed in scikit-learn 1.2)
X, y = load_diabetes(return_X_y=True)
# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
# Define an example transformer and regressor
transformer = FunctionTransformer(np.log1p, inverse_func=np.expm1)
# Initialize a Linear Regression model
regressor = LinearRegression()
# Wrap the regressor with a transformed target
model = TransformedTargetRegressor(regressor=regressor, transformer=transformer)
# Fit on the training data
model.fit(X_train, y_train)
# Score on the test data
print("Model R^2 on test: ", model.score(X_test, y_test))
In this example:
- We apply a log transformation to the target using FunctionTransformer to handle any skewness present.
- We define a basic LinearRegression model that performs linear regression on the transformed target.
- TransformedTargetRegressor wraps the linear regression model, applying the transformation during fitting and its inverse during prediction.
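To make the mechanics concrete, here is a minimal sketch (on hypothetical synthetic data) showing that TransformedTargetRegressor is equivalent to manually transforming the target before fitting and inverting the predictions afterwards:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.compose import TransformedTargetRegressor
from sklearn.preprocessing import FunctionTransformer

# Hypothetical synthetic data with a positive, skewed target
rng = np.random.RandomState(0)
X = rng.rand(100, 3)
y = np.expm1(X @ np.array([1.0, 2.0, 0.5]))

# Manual approach: transform y before fitting, invert after predicting
manual = LinearRegression().fit(X, np.log1p(y))
manual_pred = np.expm1(manual.predict(X))

# TransformedTargetRegressor does the same bookkeeping internally
ttr = TransformedTargetRegressor(
    regressor=LinearRegression(),
    transformer=FunctionTransformer(np.log1p, inverse_func=np.expm1),
)
ttr_pred = ttr.fit(X, y).predict(X)

print(np.allclose(manual_pred, ttr_pred))  # → True
```

The two prediction arrays match to floating-point precision, which is exactly the boilerplate TransformedTargetRegressor removes.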
Advanced Transformations
Sometimes, more complex transformations might be needed. You may choose from any available transformers or build custom ones that suit your data or specific requirements.
For instance, if your target distribution indicates that a quantile transformation can be beneficial:
from sklearn.preprocessing import QuantileTransformer
# Define a quantile transformer (n_quantiles must not exceed the number of training samples)
transformer = QuantileTransformer(output_distribution='normal', n_quantiles=100)
# Create the TransformedTargetRegressor
model = TransformedTargetRegressor(regressor=regressor, transformer=transformer)
# Fit and predict as usual
model.fit(X_train, y_train)
print("Model R^2 with Quantile Transformer on test: ", model.score(X_test, y_test))
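For simple function pairs, TransformedTargetRegressor also accepts func and inverse_func arguments directly, so you can skip constructing a FunctionTransformer yourself. A minimal sketch on hypothetical synthetic data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.compose import TransformedTargetRegressor

# Hypothetical synthetic data: log(y) is exactly linear in X
rng = np.random.RandomState(0)
X = rng.rand(80, 2)
y = np.exp(X.sum(axis=1))

# func/inverse_func shorthand instead of a FunctionTransformer
model = TransformedTargetRegressor(
    regressor=LinearRegression(), func=np.log, inverse_func=np.exp
)
model.fit(X, y)
print(model.score(X, y))  # near-perfect fit, since log(y) is linear in X
```

Note that func and inverse_func are mutually exclusive with the transformer argument; use one style or the other.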
Custom Transformations
Custom transformers can be defined by writing functions or classes that implement the transformation logic. The key requirement is that the transformation include both a forward and an inverse operation; the inverse is applied during prediction to map results back to the original target scale.
To illustrate, suppose you need to raise the targets to a given power:
def power_transform(y, power=0.5):
    return np.power(y, power)

def inverse_power_transform(y, power=0.5):
    return np.power(y, 1 / power)

# Using the power functions in a FunctionTransformer
tf = FunctionTransformer(lambda x: power_transform(x, 0.5),
                         inverse_func=lambda x: inverse_power_transform(x, 0.5))
# Model with custom transformer
model_custom = TransformedTargetRegressor(regressor=regressor, transformer=tf)
model_custom.fit(X_train, y_train)
print("Model R^2 with Custom Power Transformer on test: ", model_custom.score(X_test, y_test))
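When pairing custom forward and inverse functions, it is easy to get the inverse wrong. By default (check_inverse=True), TransformedTargetRegressor verifies on a subsample at fit time that inverse_func approximately undoes func, and warns if it does not. A small sketch with a deliberately mismatched pair on hypothetical data:

```python
import warnings
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.compose import TransformedTargetRegressor

# Hypothetical data with strictly positive targets
rng = np.random.RandomState(0)
X = rng.rand(50, 2)
y = rng.rand(50) + 1.0

# Deliberately mismatched pair: np.exp is not the inverse of np.log1p
bad = TransformedTargetRegressor(
    regressor=LinearRegression(), func=np.log1p, inverse_func=np.exp
)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    bad.fit(X, y)  # the consistency check flags the mismatch

print(any(issubclass(w.category, UserWarning) for w in caught))  # → True
```

If the check is too strict for a genuinely lossy transformation, you can pass check_inverse=False, but doing so silently disables this safety net.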
This flexible, straightforward way of defining transformations makes TransformedTargetRegressor reusable across a wide range of regression tasks.
Conclusion
The TransformedTargetRegressor in scikit-learn is a powerful tool for improving regression models by applying transformations to target variables. Not only does it manage the application and inversion of these transformations automatically, it also removes boilerplate from your preprocessing code, leaving you free to focus on model tuning and experimentation. Whether you use pre-built transformers or custom-defined ones, it is a worthwhile addition to any machine learning engineer's toolkit.