Lift Test Calibration#

Introduction#

You may have heard of the phrase “all models are wrong; some models are useful.” This is true in many areas, and it’s likely that after the first attempt, you haven’t yet created a model that can accurately determine your true attribution. Even if you have, how can you be sure? We create models to understand the attribution of our marketing channels, but it appears that even then, we can’t always rely on what the models tell us.

In order to ensure that our models are decomposing correctly, we can use various testing methods to gather real-world data and compare it with our models. This will help us to identify any discrepancies and improve the decomposition accuracy of our models.

Today, we will explore a new way to integrate experiments into pymc-marketing. This will bring us closer to accurate representations of real-world values and improve the estimates generated by our models.

Requirements#

Today, we won’t be discussing how to conduct various tests, but instead, we will focus on their utilization. If you wish to acquire knowledge on how to generate results that are compatible with your MMM models, you can check out CausalPy for conducting experiments.

Goal#

After reading this notebook, you will have gained the necessary expertise to incorporate the results (detected uplift from your experiments) into our regressive model.

This notebook will display using the add_lift_test_measurements method of DelayedSaturatedMMM and its workflow:

Build model: mmm.build_model(X, y)
Add lift measurements: mmm.add_lift_test_measurements(df_lift_test)
Sample posterior: mmm.fit(X, y)

This is a case study of two correlated channels to see how lift tests help distinguish the channel effects.

from functools import partial

import arviz as az
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pymc as pm

from pymc_marketing.mmm import DelayedSaturatedMMM
from pymc_marketing.mmm.transformers import logistic_saturation

az.style.use("arviz-darkgrid")
plt.rcParams["figure.figsize"] = [12, 7]
plt.rcParams["figure.dpi"] = 100

%config InlineBacken.figure_format = "retina"

Generate Correlated Spends and Model Target#

Spend data for two channels with completely correlated spends

seed = sum(map(ord, "Lift tests help distinguish channel effects"))
rng = np.random.default_rng(seed)

df = pd.DataFrame(
    {
        "date": dates,
        "channel 1": spend,
        "channel 2": spend,
    }
)

ax = df.set_index("date").plot(ylabel="channel spend");

../../_images/04dff75ec5b13165c6ac594e555ecf6ed5d61ea9eec8270fe3642bf7a9ebcb93.png

Perfectly correlated data

df.filter(regex="channel").corr()

	channel 1	channel 2
channel 1	1.0	1.0
channel 2	1.0	1.0

We use the DelayedSaturatedMMM class to specify our model just as usual.

mmm = DelayedSaturatedMMM(
    date_column="date",
    channel_columns=["channel 1", "channel 2"],
    adstock_max_lag=6,
)

For this constructed example, we will set parameter of the model with the pm.do operator and take a random sample of the target variable. The fixed parameters are below which we will try to recover.

At this point, a model has not been fit. However, we have created our data set to fit our model on.

Note: The max value of each spend column was set to 1 in order to account for model prepreprocessing

true_lam_c1 = 10
true_beta_c1 = 0.55

true_lam_c2 = 1.5
true_beta_c2 = 1.0

true_params = {
    "alpha": [0.5, 0.5],
    "lam": [true_lam_c1, true_lam_c2],
    "beta_channel": [true_beta_c1, true_beta_c2],
    "intercept": 0.5,
    "likelihood_sigma": 0.25,
}

ax = df.set_index("date").plot(ylabel="", title="Channel spends and target variable")
ax.legend(bbox_to_anchor=(1.0, 1.05), title="column");

../../_images/c3393b1580611a549017f36eb20f135b4373221f44e07161e1f7c41dfa510dc0.png

X = df.reset_index().drop("y", axis=1)
y = df["y"]

Indistinguishable Parameter Estimates#

In order to show the trouble that completely correlated channels provides, let’s fit the model

fit_kwargs = {"nuts_sampler": "numpyro", "random_seed": rng}

idata_without = mmm.fit(X, y, **fit_kwargs)

Compiling...
An NVIDIA GPU may be present on this machine, but a CUDA-enabled jaxlib is not installed. Falling back to cpu.
Compilation time = 0:00:04.523585
Sampling...

Sampling time = 0:00:09.100238
Transforming variables...
Transformation time = 0:00:01.207358

Since the spends are completely correlated, there is no way to distinguish the parameters. Not only that, but the parameter estimates are not close to actuals.

ax = az.plot_forest(idata_without, var_names=["lam"], combined=True)[0]

plot_true_value(true_lam_c1, "channel 1", ax=ax, split=0.4)
plot_true_value(true_lam_c2, "channel 2", ax=ax, split=0.4);

../../_images/efda0d9ce16f09a481949cb867ae5bb648bf4a84c5dffef70a688b8c2faf2e27.png

ax = az.plot_forest(idata_without, var_names=["beta_channel"], combined=True)[0]

plot_true_value(true_beta_c1, "channel 1", ax=ax, split=0.4)
plot_true_value(true_beta_c2, "channel 2", ax=ax, split=0.4);

../../_images/f25b6fec62612dc2985003c5491a51dcdb60ac185afc49dc040ec63d1300cc55.png

And that the direct response curve created from these parameters are completely overlapping.

mmm.plot_direct_contribution_curves(same_axes=True);

../../_images/30dd9c1d463a9416ab312ecad548221710d36a05d05eeb19c22fa2bb87dcd4bc.png

Unobserved Saturation Curves#

These are the true saturation curve from our model internals. We don’t observe them.

There is a difference between channels. However, it is not possible to tell with just the MMM alone.

def saturation_function(x, lam, beta):
    return (beta * logistic_saturation(x, lam)).eval()


step_size = 0.05
xx = np.arange(0, spend.max() * 1.1, step_size)

c1_curve_fn = partial(saturation_function, lam=true_lam_c1, beta=true_beta_c1)
c2_curve_fn = partial(saturation_function, lam=true_lam_c2, beta=true_beta_c2)

c1_curve = c1_curve_fn(xx)
c2_curve = c2_curve_fn(xx)


def plot_actual_curves(ax: plt.Axes, linestyle: str | None = None) -> plt.Axes:
    ax.plot(xx, c1_curve, label="channel 1", color="C0", linestyle=linestyle)
    ax.plot(xx, c2_curve, label="channel 2", color="C1", linestyle=linestyle)

    return ax


def plot_reference(ax: plt.Axes) -> plt.Axes:
    ax.plot(xx, xx, label="y=x", color="black", linestyle="--")

    return ax


ax = plt.gca()
plot_actual_curves(ax)
plot_reference(ax)
ax.set(
    xlabel="channel spend",
    ylabel="channel contribution",
    title="Actual Saturation Curves (Unobserved)",
)
ax.legend();

../../_images/c2c8bb3099f67e776e0ecd4e8892decaaa3eaa1bf4dc78dcad226a539781eafc.png

About Lift Tests#

In a lift study, one temporarily changes the budget of a channel for a fixed period of time, and then uses some method (for example CausalPy) to make inference about the change in sales directly caused by the adjustment.

A lift test is characterized by:

channel: the channel that was tested
x: pre-test channel spend
delta_x: change made to x
delta_y: inferred change in sales due to delta_x
sigma: standard deviation of delta_y

An experiment characterized in this way can be viewed as two points on the saturation curve for the channel. Accordingly, lift test calibration is implemented by adding a term to the model likelihood, that makes the channel saturation curve (the contribution as a function of spend) align with the test observation.

def create_lift_test_from_actual_curve(
    channel: str, x: float, delta_x: float, sigma: float
) -> dict[str, float]:
    curve_fn = c1_curve_fn if channel == "channel 1" else c2_curve_fn

    delta_y = curve_fn(x + delta_x) - curve_fn(x)

    return {
        "channel": channel,
        "x": x,
        "delta_x": delta_x,
        "delta_y": delta_y,
        "sigma": sigma,
    }

df_lift_test = pd.DataFrame(
    [
        # Channel x1
        create_lift_test_from_actual_curve("channel 1", 0.0, 0.05, 0.05),
        create_lift_test_from_actual_curve("channel 1", 0.15, 0.05, 0.05),
        create_lift_test_from_actual_curve("channel 1", 0.3, 0.05, 0.05),
        # Channel x2
        create_lift_test_from_actual_curve("channel 2", 0.5, 0.05, 0.10),
    ]
)

df_lift_test

	channel	x	delta_x	delta_y	sigma
0	channel 1	0.00	0.05	0.134705	0.05
1	channel 1	0.15	0.05	0.069545	0.05
2	channel 1	0.30	0.05	0.019925	0.05
3	channel 2	0.50	0.05	0.032236	0.10

Below we are visualizing each lift test observation as a triangle. The base of the triangle shows the change in the spend and height of triangle is change in contribution. Each base is at y=0 since only the change in contribution is observed.

Even though these are not the saturation curve, they can be used to better estimate the saturation curve. We can see that channel 2 is slower to saturation since the change in y is larger for higher x values.

ax = plt.gca()
plot_lift_test_triangles(df_lift_test, ax=ax)
ax.legend()
ax.set(
    xlim=(-0.01, 1.01),
    xlabel="x",
    ylabel="change in channel contribution",
    title="Lift tests show different diminishing returns",
);

../../_images/f0cec8718d1921a36e43704770c203fbc1fb523d5b463b2c173041ed075dad7c.png

Add Lift Tests to Model#

Having created a DelayedSaturatedMMM model instance, mmm and built it using the build_model method or fit with fit method, we can add lift test results to the model using the add_lift_test_measurments method.

mmm.add_lift_test_measurements(df_lift_test)

We can see the model graph is modified with new observation for our lift measurements.

The observation distribution is assumed to be Gamma as each saturation curve is monotonically increasing given a set of parameters.

pm.model_to_graphviz(mmm.model)

../../_images/9d5a5ba1e99ba54b118e23b55a7a06f275360fb966fbdb5f80bd8c873b9ec135.svg

We can refit the model but with the lift tests included

idata_with = mmm.fit(X, y, **fit_kwargs)

Compiling...
Compilation time = 0:00:05.530146
Sampling...

Sampling time = 0:00:08.329839
Transforming variables...
Transformation time = 0:00:00.701632

The model gets shaped by the lift test measurements and the response curves begin to separate!

fig = mmm.plot_direct_contribution_curves(same_axes=True)
ax = fig.axes[0]
plot_actual_curves(ax=ax, linestyle="dashed")
plot_lift_test_rug(df_lift_test, ax);

../../_images/310481e7918f272f5e656466f238efe588bee0a2e3c8012868d2c1567919ced3.png

This can be seen in the parameter estimates that make up this response curve

data = [idata_without, idata_with]
model_names = ["without lift tests", "with lift tests"]

ax = plot_comparison(data, model_names, "lam")
plot_true_value(true_lam_c1, "channel 1", ax)
plot_true_value(true_lam_c2, "channel 2", ax);

../../_images/0eb7128cd28e3d8aa51528429f01b26f166ca50848b10a20acb0e6ac83b0e759.png

ax = plot_comparison(data, model_names, "beta_channel")
plot_true_value(true_beta_c1, "channel 1", ax)
plot_true_value(true_beta_c2, "channel 2", ax);

../../_images/4f7109bd9ee4edae62ceab7816bcb88c685818044cd062e42a09d8ec608e3aab.png

Add Additional Lift Tests#

We can add even more lift tests.

They can either all be added at one time or separately. Use the name parameter in order to separate the two sets of observations in the model graph.

df_additional_lift_test = pd.DataFrame(
    [
        # More for Channel x1
        create_lift_test_from_actual_curve("channel 1", 0.1, 0.05, sigma=0.01),
        create_lift_test_from_actual_curve("channel 1", 0.5, 0.05, sigma=0.01),
        # More for channel x2
        create_lift_test_from_actual_curve("channel 2", 0.3, 0.05, sigma=0.01),
    ]
)

df_additional_lift_test

	channel	x	delta_x	delta_y	sigma
0	channel 1	0.1	0.05	0.095167	0.01
1	channel 1	0.5	0.05	0.002885	0.01
2	channel 2	0.3	0.05	0.035354	0.01

mmm.add_lift_test_measurements(df_additional_lift_test, name="more_lift_measurements")

pm.model_to_graphviz(mmm.model)

../../_images/15870f4d0c67bdc6f818d2ef5d5cd3e492e33ae063710551d20dd0200afd5a75.svg

idata_with_more = mmm.fit(X, y, **fit_kwargs)

Compiling...
Compilation time = 0:00:06.384169
Sampling...

Sampling time = 0:00:09.624948
Transforming variables...
Transformation time = 0:00:00.746753

The response curve is shifting more and more

df_all_lift_tests = pd.concat([df_lift_test, df_additional_lift_test])

fig = mmm.plot_direct_contribution_curves(same_axes=True)
ax = fig.axes[0]
plot_actual_curves(ax=ax, linestyle="dashed")
plot_lift_test_rug(df_all_lift_tests, ax);

../../_images/6bb6485c17a0f62de4144318f913ab30cd15a0e6bddf42493d6b56edcc449d12.png

data = [idata_without, idata_with, idata_with_more]
model_names = ["without lift tests", "with lift tests", "with more lift tests"]

ax = plot_comparison(data, model_names, "lam")
plot_true_value(true_lam_c1, "channel 1", ax, split=0.435)
plot_true_value(true_lam_c2, "channel 2", ax, split=0.435);

../../_images/7ca8930800f114db59d94a8cc3169e405346c74af146c0366403891b9055d163.png

ax = plot_comparison(data, model_names, "beta_channel")
plot_true_value(true_beta_c1, "channel 1", ax, split=0.435)
plot_true_value(true_beta_c2, "channel 2", ax, split=0.435);

../../_images/0432d07d11f61735b2f68e7852f87752b30e5d92891a5ec05280fc2c905a3efa.png

Conclusion#

The add_lift_test_measurements method can be used in order to incorporate experiments into our model likelihood and nudge the model parameters closer to the actuals in this example.

Conducting various experiments for each channel at various spends will bring the best results.