MMM End-to-End Case Study#
In today’s competitive business landscape, optimizing marketing spend across channels is crucial for maximizing return on investment and driving business growth. This notebook demonstrates how to leverage PyMC-Marketing’s Media Mix Modeling (MMM) to gain actionable insights into marketing effectiveness and optimize budget allocation.
We’ll walk through a real-world case study using data from a PyData Global 2022 presentation (Hajime Takeda - Media Mix Modeling: How to Measure the Effectiveness of Advertising). Our goal is to show how MMM can help marketing teams:
Quantify the impact of different marketing channels on sales
Understand saturation effects and diminishing returns
Identify opportunities to reallocate budget for improved performance
Make data-driven decisions about future marketing investments
Outline
Data Preparation and Exploratory Analysis
Model Specification and Fitting
Model Diagnostics and Validation
Marketing Channel Deep Dive
Channel Contributions
Saturation Curves
Return on Ad Spend (ROAS)
Out-of-Sample Predictions
Budget Optimization
Key Takeaways and Next Steps
Note
In this notebook we guide you through what a typical first MMM iteration looks like (note that what a first iteration means depends on the business context). We focus on the application of the model to a concrete business case. If you want to know more about the model specification, please refer to the MMM Example Notebook.
Prepare Notebook#
We load the necessary libraries and set the notebook’s configuration.
import warnings
import arviz as az
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from pymc_marketing.metrics import crps
from pymc_marketing.mmm import MMM, GeometricAdstock, LogisticSaturation
from pymc_marketing.mmm.utils import apply_sklearn_transformer_across_dim
from pymc_marketing.prior import Prior
warnings.filterwarnings("ignore", category=FutureWarning)
az.style.use("arviz-darkgrid")
plt.rcParams["figure.figsize"] = [12, 7]
plt.rcParams["figure.dpi"] = 100
%load_ext autoreload
%autoreload 2
%config InlineBackend.figure_format = "retina"
seed: int = sum(map(ord, "mmm"))
rng: np.random.Generator = np.random.default_rng(seed=seed)
Load Data#
We load the data from a CSV file as a pandas DataFrame and do some light data cleaning, following the original case study: we keep only a subset of the marketing channels and rename them.
raw_df = pd.read_csv("https://raw.githubusercontent.com/sibylhe/mmm_stan/main/data.csv")
# 1. control variables
# We just keep the holidays columns
control_columns = [col for col in raw_df.columns if "hldy_" in col]
# 2. media variables
channel_columns_raw = sorted(
[
col
for col in raw_df.columns
if "mdsp_" in col
and col != "mdsp_viddig"
and col != "mdsp_auddig"
and col != "mdsp_sem"
]
)
channel_mapping = {
"mdsp_dm": "Direct Mail",
"mdsp_inst": "Insert",
"mdsp_nsp": "Newspaper",
"mdsp_audtr": "Radio",
"mdsp_vidtr": "TV",
"mdsp_so": "Social Media",
"mdsp_on": "Online Display",
}
channel_columns = sorted(list(channel_mapping.values()))
# 3. sales variables
sales_col = "sales"
data_df = raw_df[["wk_strt_dt", sales_col, *channel_columns_raw, *control_columns]]
data_df = data_df.rename(columns=channel_mapping)
# 4. Date column
data_df["wk_strt_dt"] = pd.to_datetime(data_df["wk_strt_dt"])
date_column = "wk_strt_dt"
Exploratory Data Analysis#
We should always start by looking at the data! We begin by plotting the target variable, i.e. the sales.
fig, ax = plt.subplots()
sns.lineplot(data=data_df, x=date_column, y="sales", color="black", ax=ax)
ax.set(title="Sales", xlabel="date", ylabel="sales");

This time series, which comes at weekly granularity, shows a clear yearly seasonality pattern and an apparent mild downward trend.
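To double-check the seasonality and trend claims, a quick classical decomposition of the weekly series can be helpful. This is a minimal sketch, assuming statsmodels is installed (it is not used elsewhere in this notebook); the 52-week period reflects the yearly cycle.
from statsmodels.tsa.seasonal import seasonal_decompose

# Decompose weekly sales into trend, yearly (52-week) seasonality and remainder
decomposition = seasonal_decompose(
    data_df.set_index(date_column)["sales"], model="additive", period=52
)
decomposition.plot();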
Next, let’s look into the marketing channel spend data. We first look at the spend shares to get a feeling for the relative spend across channels.
fig, ax = plt.subplots()
data_df.melt(
value_vars=channel_columns, var_name="channel", value_name="spend"
).groupby("channel").agg({"spend": "sum"}).sort_values(by="spend").plot.barh(ax=ax)
ax.set(title="Total Media Spend", xlabel="Spend", ylabel="Channel");

In this specific example we see that Direct Mail is the channel with the highest spend, followed by Newspaper and Online Display.
Next, we look into the time series of each of the marketing channels.
fig, ax = plt.subplots()
data_df.set_index("wk_strt_dt")[channel_columns].plot(ax=ax)
ax.legend(title="Channel", fontsize=12)
ax.set(title="Media Spend", xlabel="Date", ylabel="Spend");

We see an overall decrease in spend over time. This might explain the downward trend in the sales time series.
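To make the spend trend easier to see, we can also look at the total weekly media spend across all channels together with a rolling average. This is a small sketch; the 13-week window is an arbitrary choice.
total_spend = data_df.set_index(date_column)[channel_columns].sum(axis=1)

fig, ax = plt.subplots()
total_spend.plot(ax=ax, color="C0", alpha=0.5, label="Total weekly spend")
# Quarterly (13-week) centered rolling mean to highlight the long-term trend
total_spend.rolling(window=13, center=True).mean().plot(
    ax=ax, color="black", label="13-week rolling mean"
)
ax.legend()
ax.set(title="Total Media Spend (All Channels)", xlabel="date", ylabel="spend");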
Next, we can compute the correlation between the marketing channels and the sales.
Note
We all know that correlation does not imply causation. However, it is a good idea to look into the correlation between the marketing channels and the target variable. This can help us spot issues in the data (very common in real cases) and inspire some feature engineering.
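As a numeric complement to the per-channel plots below, the raw correlations can be computed in a single expression (a quick sketch using pandas only):
# Pearson correlation of each channel's weekly spend with sales
data_df[[*channel_columns, sales_col]].corr()[sales_col].drop(sales_col).sort_values(
    ascending=False
)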
n_channels = len(channel_columns)
fig, axes = plt.subplots(
nrows=n_channels,
ncols=1,
figsize=(15, 3 * n_channels),
sharex=True,
sharey=False,
layout="constrained",
)
for i, channel in enumerate(channel_columns):
    ax = axes[i]
    ax_twin = ax.twinx()
    sns.lineplot(data=data_df, x=date_column, y=channel, color=f"C{i}", ax=ax)
    sns.lineplot(data=data_df, x=date_column, y=sales_col, color="black", ax=ax_twin)
    correlation = data_df[[channel, sales_col]].corr().iloc[0, 1]
    ax_twin.grid(None)
    ax.set(title=f"{channel} (Correlation: {correlation:.2f})")
    ax.set_xlabel("date");

The channels with the highest correlation with sales are TV, Insert, and Online Display. Observe that the channel with the biggest spend, Direct Mail, has the lowest correlation.
Train Test Split#
We are now ready to fit the model. We start by splitting the data into training and testing.
Note
In a real-world scenario you should use more data for training than for testing (see Time-Slice-Cross-Validation and Parameter Stability). In this case we are using a small dataset for demonstration purposes.
train_test_split_date = pd.to_datetime("2018-02-01")
train_mask = data_df.wk_strt_dt <= train_test_split_date
test_mask = data_df.wk_strt_dt > train_test_split_date
train_df = data_df[train_mask]
test_df = data_df[test_mask]
fig, ax = plt.subplots()
sns.lineplot(data=train_df, x="wk_strt_dt", y="sales", color="C0", label="Train", ax=ax)
sns.lineplot(data=test_df, x="wk_strt_dt", y="sales", color="C1", label="Test", ax=ax)
ax.set(title="Sales - Train Test Split", xlabel="date");

X_train = train_df.drop(columns=sales_col)
X_test = test_df.drop(columns=sales_col)
y_train = train_df[sales_col]
y_test = test_df[sales_col]
Model Specification#
Assuming we are working on the first iteration, we typically would not have lift test data to calibrate the model. That is fine: we can get started with spend-share priors.
Tip
Ideally, in the next iteration we should have some of these lift tests to calibrate the model. For more details see Lift Test Calibration and the Case Study: Unobserved Confounders, ROAS and Lift Tests case study notebook.
The idea behind the spend-share priors is that we use the spend shares as a proxy for the effect of the media channels on sales. This is not perfect, but it can be a good starting point. In essence, we give the higher-spend channels more flexibility to fit the data through a wider prior. Let’s see what this looks like. First, we compute the spend shares (on the training data!).
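In symbols (notation introduced here only for clarity), the prior scale for the saturation_beta of channel \(c\) is its share of total training spend:

\[
\sigma_c = \frac{\sum_{t \in \text{train}} x_{c,t}}{\sum_{c'} \sum_{t \in \text{train}} x_{c',t}},
\qquad
\beta_c \sim \text{HalfNormal}(\sigma_c).
\]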
spend_shares = (
X_train.melt(value_vars=channel_columns, var_name="channel", value_name="spend")
.groupby("channel", as_index=False)
.agg({"spend": "sum"})
.sort_values(by="channel")
.assign(spend_share=lambda x: x["spend"] / x["spend"].sum())["spend_share"]
.to_numpy()
)
prior_sigma = spend_shares
Next, we use these spend shares to set the prior of the model using the model configuration.
model_config = {
"intercept": Prior("Normal", mu=0.2, sigma=0.05),
"saturation_beta": Prior("HalfNormal", sigma=prior_sigma, dims="channel"),
"saturation_lam": Prior("Gamma", alpha=3, beta=1, dims="channel"),
"gamma_control": Prior("Normal", mu=0, sigma=1),
"gamma_fourier": Prior("Laplace", mu=0, b=1),
"likelihood": Prior("TruncatedNormal", lower=0, sigma=Prior("HalfNormal", sigma=1)),
}
We are now ready to specify the model (see the MMM Example Notebook for more details). As the yearly seasonal component is not that strong, we use only a few Fourier terms.
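For reference, and using notation not introduced elsewhere in this notebook, the two media transformations roughly correspond to a geometric adstock with decay \(\alpha_c \in (0, 1)\) and maximum lag \(L = 6\), followed by a logistic saturation with parameter \(\lambda_c > 0\):

\[
x^{\text{adstock}}_{c,t} = \sum_{l=0}^{L-1} \alpha_c^{\,l}\, x_{c, t-l},
\qquad
f(x) = \frac{1 - e^{-\lambda_c x}}{1 + e^{-\lambda_c x}},
\]

so that the contribution of channel \(c\) at time \(t\) is modeled as \(\beta_c \, f\big(x^{\text{adstock}}_{c,t}\big)\). See the MMM Example Notebook for the exact parameterization (e.g. whether the adstock is normalized).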
sampler_config = {"progressbar": True}
mmm = MMM(
model_config=model_config,
sampler_config=sampler_config,
date_column=date_column,
adstock=GeometricAdstock(l_max=6),
saturation=LogisticSaturation(),
channel_columns=channel_columns,
control_columns=control_columns,
yearly_seasonality=5,
)
Prior Predictive Checks#
We can now generate prior predictive samples to see how the model behaves under the prior specification.
# Generate prior predictive samples
mmm.sample_prior_predictive(X_train, y_train, samples=4_000)
fig, ax = plt.subplots()
fig = mmm.plot_prior_predictive(original_scale=True, ax=ax)
ax.legend(loc="lower center", bbox_to_anchor=(0.5, -0.2), ncol=4)
ax.set(xlabel="date", ylabel="sales");
Sampling: [adstock_alpha, gamma_control, gamma_fourier, intercept, saturation_beta, saturation_lam, y, y_sigma]

Overall, the priors are not very informative and their range spans approximately one order of magnitude around the observed data. Note that the choice of a truncated normal likelihood is to avoid negative values for the target variable.
Let’s look into the channel contribution share before fitting the model.
fig = mmm.plot_channel_contribution_share_hdi(prior=True, figsize=(10, 6))
fig.suptitle("Channel Contribution Share", fontsize=18, fontweight="bold");

We confirm these represent the spend shares as expected from the prior specification.
Model Fitting#
We now fit the model to the training data.
try:
    import numpyro  # noqa: F401

    nuts_sampler = "numpyro"
except ModuleNotFoundError:
    nuts_sampler = "pymc"
mmm.fit(
X=X_train,
y=y_train,
target_accept=0.9,
chains=4,
draws=4_000,
nuts_sampler=nuts_sampler,
random_seed=rng,
)
[Output: arviz.InferenceData with groups posterior, sample_stats, prior, prior_predictive, observed_data, constant_data and fit_data — 4 chains × 4,000 draws over 7 channels, 183 dates, 22 controls and 10 Fourier modes; sampled with numpyro 0.16.1 in roughly 25 seconds.]
Model Diagnostics#
Before looking into the results let’s check some diagnostics.
# Number of diverging samples
mmm.idata["sample_stats"]["diverging"].sum().item()
0
We do not have divergent transitions, which is good!
az.summary(
data=mmm.idata,
var_names=[
"intercept",
"y_sigma",
"saturation_beta",
"saturation_lam",
"adstock_alpha",
"gamma_control",
"gamma_fourier",
],
)["r_hat"].describe()
count 55.0
mean 1.0
std 0.0
min 1.0
25% 1.0
50% 1.0
75% 1.0
max 1.0
Name: r_hat, dtype: float64
In addition, the R-hat statistics are all close to 1, which is also good!
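Besides R-hat, it is worth checking the effective sample sizes. A quick summary, reusing the same variable list as above:
az.summary(
    data=mmm.idata,
    var_names=[
        "intercept",
        "y_sigma",
        "saturation_beta",
        "saturation_lam",
        "adstock_alpha",
        "gamma_control",
        "gamma_fourier",
    ],
)[["ess_bulk", "ess_tail"]].describe()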
Next let’s look into the model trace.
_ = az.plot_trace(
data=mmm.fit_result,
var_names=[
"intercept",
"y_sigma",
"saturation_beta",
"saturation_lam",
"adstock_alpha",
"gamma_control",
"gamma_fourier",
],
compact=True,
backend_kwargs={"figsize": (12, 10), "layout": "constrained"},
)
plt.gcf().suptitle("Model Trace", fontsize=16);

We see a good mixing in the trace, which is consistent with the good R-hat statistics.
Posterior Predictive Checks#
As our model and posterior samples are looking good, we can now look into the posterior predictive checks.
mmm.sample_posterior_predictive(X_train, extend_idata=True, combined=True)
fig = mmm.plot_posterior_predictive(original_scale=True)
fig.gca().set(xlabel="date", ylabel="sales");
Sampling: [y]

The posterior predictive sales look quite reasonable. We are explaining a good amount of the variance in the data and the trends are well captured.
We can look into the errors of the model:
mmm.plot_errors(original_scale=True);

We do not detect any obvious patterns in the errors.
Media Deep Dive#
In this section we perform a deep dive into the media channels insights.
Channel Contributions#
Now that we are happy with the global model fit, we can look into the media and individual channels contributions.
We start by looking into the aggregated contributions.
fig = mmm.plot_components_contributions(original_scale=True)
fig.gca().set(xlabel="date", ylabel="sales");

Next, we take a deeper look into the individual components:
mmm.plot_waterfall_components_decomposition(figsize=(18, 10));

Here are some interesting observations about the model:
The intercept accounts for approximately \(60\%\) of the contribution. This is what people usually call the “baseline”.
The top media contribution channels are Online Display, Direct Mail and TV. In some sense we see how the model averages the spend share (prior) and the predictive power of the media channels (posterior).
Saturation Curves#
We continue our analysis by looking into the direct response curves. These will be the main input for the budget optimization step below.
fig = mmm.plot_direct_contribution_curves()
[ax.set(xlabel="spend") for ax in fig.axes];

Tip
These curves by themselves can be a great input for marketing planning! One could use them to set the budget for the next periods and manually optimize the spend for these channels depending on their saturation points. This process can be a good starting point to engage with the team. That being said, we can do better with custom optimization routines, as described in the following sections.
Besides the direct response curves, we can look into global contribution simulations. The idea is to vary the spend by globally multiplying the spend time series by a factor \(\delta > 0\) and looking at the expected contribution. PyMC-Marketing also shows the \(94\%\) Highest Density Interval (HDI) of the simulation results.
mmm.plot_channel_contributions_grid(start=0, stop=1.5, num=12, figsize=(15, 7));

We can clearly see the saturation effects for the different channels.
A complementary plot is to plot these same simulations against the actual spend:
mmm.plot_channel_contributions_grid(
start=0, stop=1.5, num=12, absolute_xrange=True, figsize=(15, 7)
);

The vertical lines show the spend from the observed data (i.e. \(\delta = 1\) above). This plot clearly shows that, for example, Direct Mail won’t gain much incremental contribution if we keep increasing the spend. On the other hand, channels like Online Display and Insert seem to be good candidates for increased spend. These are the concrete, actionable insights you can extract from this type of model to optimize your media spend mix 🚀!
Return on Ad Spend (ROAS)#
To have a more quantitative metric of efficiency, we compute the Return on Ad Spend (ROAS) over the whole training period.
To begin, let’s plot the contribution share for each channel.
fig = mmm.plot_channel_contribution_share_hdi(figsize=(10, 6))
fig.suptitle("Channel Contribution Share", fontsize=18, fontweight="bold");

These contributions are of course influenced by the spend levels. If we want to get a better picture of the efficiency of each channel, we need to look at the Return on Ad Spend (ROAS). This is done by computing the ratio of the contribution and the spend for each channel.
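Concretely, for each posterior sample we compute (notation introduced here for clarity)

\[
\text{ROAS}_c = \frac{\sum_{t \in \text{train}} \widehat{\text{contribution}}_{c,t}}{\sum_{t \in \text{train}} x_{c,t}},
\]

which gives us a full posterior distribution of the ROAS for each channel rather than a single point estimate.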
# Get the channel contributions on the original scale
channel_contribution_original_scale = mmm.compute_channel_contribution_original_scale()
# Sum the contributions over the whole training period
spend_sum = X_train[channel_columns].sum().to_numpy()
spend_sum = spend_sum[
np.newaxis, np.newaxis, :
] # We need these extra dimensions to broadcast the spend sum to the shape of the contributions.
# Compute the ROAS as the ratio of the mean contribution and the spend for each channel
roas_samples = channel_contribution_original_scale.sum(dim="date") / spend_sum
# Plot the ROAS
fig, ax = plt.subplots(figsize=(10, 6))
az.plot_forest(roas_samples, combined=True, ax=ax)
fig.suptitle("Return on Ads Spend (ROAS)", fontsize=18, fontweight="bold");

Here we see that the channels with the highest ROAS on average are Online Display and Insert, whereas Direct Mail has the lowest ROAS. This is consistent with the previous findings from the direct and global response curves.
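To read these off as point estimates, we can summarize the posterior-mean ROAS per channel. This is a small sketch on top of the roas_samples computed above:
# Posterior-mean ROAS per channel, sorted from most to least efficient
roas_samples.mean(dim=("chain", "draw")).to_series().sort_values(ascending=False)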
Tip
The large HDI intervals for the ROAS of the top channels come from the fact that their spend share during the training period is relatively small. Even though we use the spend shares as priors, the overall spend levels are also reflected in the estimates (see, for example, how small the HDI is for Direct Mail).
We can see this as an opportunity! Based on this analysis we see that Online Display and Insert are the best channels to invest in. Hence, if we optimize our budget accordingly (more on the budget optimization below), we can improve our marketing mix and at the same time create more variance in the training data so that we get refined ROAS estimates in future iterations 😎.
It is also useful to compare the ROAS and the contribution share.
# Get the contribution share samples
share_samples = mmm.get_channel_contributions_share_samples()
fig, ax = plt.subplots(figsize=(12, 9))
for i, channel in enumerate(channel_columns):
    # Contribution share mean and hdi
    share_mean = share_samples.sel(channel=channel).mean().to_numpy()
    share_hdi = az.hdi(share_samples.sel(channel=channel))["x"].to_numpy()
    # ROAS mean and hdi
    roas_mean = roas_samples.sel(channel=channel).mean().to_numpy()
    roas_hdi = az.hdi(roas_samples.sel(channel=channel))["x"].to_numpy()
    # Plot the contribution share hdi
    ax.vlines(share_mean, roas_hdi[0], roas_hdi[1], color=f"C{i}", alpha=0.8)
    # Plot the ROAS hdi
    ax.hlines(roas_mean, share_hdi[0], share_hdi[1], color=f"C{i}", alpha=0.8)
    # Plot the means
    ax.scatter(
        share_mean,
        roas_mean,
        # Size of the scatter points is proportional to the spend share
        s=5 * (spend_shares[i] * 100),
        color=f"C{i}",
        edgecolor="black",
        label=channel,
    )
ax.xaxis.set_major_formatter(plt.FuncFormatter(lambda x, _: f"{x:.0%}"))
ax.legend(
bbox_to_anchor=(1.05, 1), loc="upper left", title="Channel", title_fontsize=12
)
ax.set(
title="Channel Contribution Share vs ROAS",
xlabel="Contribution Share",
ylabel="ROAS",
);

Out-of-Sample Predictions#
Until now we have analyzed the model fit from an in-sample perspective. If we want to follow the model’s recommendations, we need evidence that the model is not only good at explaining the past but also good at making predictions. This is a key component for getting buy-in from business stakeholders.
The PyMC-Marketing MMM API allows us to easily generate out-of-sample predictions from a fitted model as follows:
y_pred_test = mmm.sample_posterior_predictive(
X_test,
include_last_observations=True,
original_scale=True,
var_names=["y", "channel_contributions"],
extend_idata=False,
progressbar=False,
random_seed=rng,
)
We can visualize and compare the in-sample and out-of-sample predictions.
fig, ax = plt.subplots()
mmm.plot_posterior_predictive(original_scale=True, ax=ax)
test_hdi_94 = az.hdi(y_pred_test["y"].to_numpy().T, hdi_prob=0.94)
test_hdi_50 = az.hdi(y_pred_test["y"].to_numpy().T, hdi_prob=0.50)
ax.fill_between(
X_test[date_column],
test_hdi_94[:, 0],
test_hdi_94[:, 1],
color="C1",
label=f"{0.94:.0%} HDI (test)",
alpha=0.2,
)
ax.fill_between(
X_test[date_column],
test_hdi_50[:, 0],
test_hdi_50[:, 1],
color="C1",
label=f"{0.50:.0%} HDI (test)",
alpha=0.4,
)
ax.plot(X_test[date_column], y_test, color="black")
ax.plot(
X_test[date_column],
y_pred_test["y"].mean(dim="sample"),
color="C1",
linewidth=3,
label="Posterior Predictive Mean",
)
ax.axvline(X_test[date_column].iloc[0], color="C2", linestyle="--")
ax.legend(loc="upper center", bbox_to_anchor=(0.5, -0.1), ncol=4)
ax.set(ylabel="sales");

Overall, both the in-sample and out-of-sample predictions look very reasonable.
Let’s zoom in on the out-of-sample predictions:
fig, ax = plt.subplots()
ax.fill_between(
X_test[date_column],
test_hdi_94[:, 0],
test_hdi_94[:, 1],
color="C1",
label=f"{0.94:.0%} HDI (test)",
alpha=0.2,
)
ax.fill_between(
X_test[date_column],
test_hdi_50[:, 0],
test_hdi_50[:, 1],
color="C1",
label=f"{0.50:.0%} HDI (test)",
alpha=0.4,
)
ax.plot(X_test[date_column], y_test, marker="o", color="black", label="Observed")
ax.plot(
X_test[date_column],
y_pred_test["y"].mean(dim="sample"),
marker="o",
markersize=8,
color="C1",
linewidth=3,
label="Posterior Predictive Mean",
)
ax.legend(loc="upper center", bbox_to_anchor=(0.5, -0.1), ncol=4)
ax.set(title="Out-of-Sample Predictions", xlabel="date", ylabel="sales");

These look actually very good!
To have a concrete score metric we can use the Continuous Ranked Probability Score (CRPS). This is a generalization of the mean absolute error to the case of probabilistic predictions. For more detail see the Time-Slice-Cross-Validation and Parameter Stability notebook.
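For reference, for a predictive distribution \(F\) with independent posterior predictive samples \(X, X'\) and an observed value \(y\), the CRPS can be written as

\[
\text{CRPS}(F, y) = \mathbb{E}\left|X - y\right| - \frac{1}{2}\,\mathbb{E}\left|X - X'\right|,
\]

which reduces to the absolute error when the prediction collapses to a point. Lower values are better.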
crps_train = crps(
y_true=y_train.to_numpy(),
y_pred=az.extract(mmm.idata["posterior_predictive"], var_names=["y"]).T,
)
crps_test = crps(y_true=y_test.to_numpy(), y_pred=y_pred_test["y"].T)
fig, ax = plt.subplots(figsize=(10, 6))
ax.bar(["Train", "Test"], [crps_train, crps_test], color=["C0", "C1"])
ax.set(ylabel="CRPS", title="CRPS Score");

Interestingly enough, the model performance is better on the test set. There could be many reasons for this:
The model is not capturing the end of the seasonality very well in the training set.
There is less variability or complexity in the test set data.
The test set has fewer points, which could affect the CRPS calculation comparison.
In order to have a better comparison we should look into the time slice cross validation results as in the Time-Slice-Cross-Validation and Parameter Stability notebook.
Next, we look into the out-of-sample channel contributions.
fig, axes = plt.subplots(
nrows=len(channel_columns),
ncols=1,
figsize=(15, 3 * n_channels),
sharex=True,
sharey=True,
layout="constrained",
)
for i, channel in enumerate(channel_columns):
    for j, hdi_prob in enumerate([0.94, 0.5]):
        ax = axes[i]
        az.plot_hdi(
            X_test[date_column],
            y_pred_test["channel_contributions"]
            .sel(channel=channel)
            .unstack()
            .transpose(..., "date"),
            hdi_prob=hdi_prob,
            color=f"C{i}",
            smooth=False,
            fill_kwargs={"alpha": 0.2 + 0.2 * j, "label": f"{hdi_prob:.0%} HDI (test)"},
            ax=ax,
        )
    ax.legend(loc="upper right")
    ax.set(title=channel, ylabel="sales")
fig.suptitle("Channel Contributions - Out-of-Sample Predictions", fontsize=16);

We do not have true contribution values to compare against, because this is precisely what we are trying to estimate in the first place. Nevertheless, we can still run the time-slice cross-validation at the channel level to analyze the stability of the out-of-sample predictions and media parameters (adstock and saturation).
Budget Optimization#
In this last section we explain how to use PyMC-Marketing to optimize the media budget allocation. The main idea is to use the response curves to optimize the spend. For more details see the Budget Allocation with PyMC-Marketing notebook.
For this example, let us assume we want to re-allocate the budget of the test set. The PyMC-Marketing MMM optimizer assumes we will allocate the budget equally to all channels over the next num_periods periods. Hence, when we specify the budget variable we are actually specifying the weekly budget (in this case). Therefore, it is important to understand the past average weekly spend per channel:
X_test[channel_columns].sum(axis=0)
Direct Mail 12107275.13
Insert 1065383.05
Newspaper 273433.16
Online Display 6498301.30
Radio 3384098.82
Social Media 4298913.23
TV 2610543.53
dtype: float64
num_periods = X_test[date_column].shape[0]
# Total budget for the test set per channel
all_budget_per_channel = X_test[channel_columns].sum(axis=0)
# Total budget for the test set
all_budget = all_budget_per_channel.sum()
# Weekly budget for each channel
# We are assuming we do not know the total budget allocation in advance
# so this would be the naive "uniform" allocation.
per_channel_weekly_budget = all_budget / (num_periods * len(channel_columns))
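In other words, the naive uniform weekly budget per channel handed to the optimizer is

\[
B_{\text{weekly, channel}} = \frac{\sum_{c} \sum_{t \in \text{test}} x_{c,t}}{N_{\text{periods}} \times N_{\text{channels}}}.
\]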
Let’s plot the average weekly spend per channel for the actual test set.
fig, ax = plt.subplots()
(
all_budget_per_channel.div(num_periods * len(channel_columns))
.sort_index(ascending=False)
.plot.barh(color="C0", ax=ax)
)
ax.set(title=f"Weekly Average Total Spend per Channel (Next {num_periods} periods)");

Observe that the spend distribution in the test set is considerably different from the train set. When planning future spend this is usually the case, as the team might be exploring new investment opportunities. For example, in the test set we have on average more spend on Online Display and less spend on Insert, both of which had high ROAS in the training data.
average_spend_per_channel_train = X_train[channel_columns].sum(axis=0) / (
X_train.shape[0] * len(channel_columns)
)
average_spend_per_channel_test = X_test[channel_columns].sum(axis=0) / (
X_test.shape[0] * len(channel_columns)
)
fig, ax = plt.subplots(figsize=(10, 6))
average_spend_per_channel_train.to_frame(name="train").join(
other=average_spend_per_channel_test.to_frame(name="test"),
how="inner",
).sort_index(ascending=False).plot.barh(color=["black", "C0"], ax=ax)
ax.set(title="Weekly Average Total Spend per Channel", xlabel="spend");

Next, let’s talk about the variability of the allocation. Even if the model finds Insert to be the most efficient channel, strategically it is very hard to simply put \(100\%\) of the budget into this channel. A better strategy, which generally plays well with stakeholders, is to allow a custom degree of flexibility with some upper and lower bounds for each channel’s budget.
For example, we will allow a \(50\%\) deviation from the past average weekly spend for each channel.
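With \(p = 0.5\) and \(\bar{x}_c\) the average weekly spend of channel \(c\) over the test period, the bounds passed to the optimizer are \(\big[(1 - p)\,\bar{x}_c,\ (1 + p)\,\bar{x}_c\big]\), which the code below builds as a dictionary keyed by channel.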
percentage_change = 0.5
mean_spend_per_period_test = (
X_test[channel_columns].sum(axis=0).div(num_periods * len(channel_columns))
)
budget_bounds = {
key: [(1 - percentage_change) * value, (1 + percentage_change) * value]
for key, value in (mean_spend_per_period_test).to_dict().items()
}
Now we are ready to pass all these business requirements to the optimizer.
model_granularity = "weekly"
allocation_strategy, optimization_result = mmm.optimize_budget(
budget=per_channel_weekly_budget,
num_periods=num_periods,
budget_bounds=budget_bounds,
minimize_kwargs={
"method": "SLSQP",
"options": {"ftol": 1e-9, "maxiter": 5_000},
},
)
response = mmm.sample_response_distribution(
allocation_strategy=allocation_strategy,
time_granularity=model_granularity,
num_periods=num_periods,
noise_level=0.05,
)
/Users/juanitorduz/Documents/pymc-marketing/pymc_marketing/mmm/budget_optimizer.py:214: UserWarning: Using default equality constraint
self.set_constraints(
Let’s visualize the results.
fig, ax = plt.subplots()
(
pd.Series(allocation_strategy, index=mean_spend_per_period_test.index)
.to_frame(name="optimized_allocation")
.join(mean_spend_per_period_test.to_frame(name="initial_allocation"))
.sort_index(ascending=False)
.plot.barh(figsize=(12, 8), color=["C1", "C0"], ax=ax)
)
ax.set(title="Budget Allocation Comparison", xlabel="allocation");

We verify that the sum of the budget allocation is the same as the initial budget.
np.testing.assert_allclose(
mean_spend_per_period_test.sum(), allocation_strategy.sum().item()
)
The first thing we notice is how the spend on Direct Mail has been reduced. This is consistent with the previous finding that this channel is not very efficient. The spend has been distributed more evenly across the other channels. Interestingly, the spend on Online Display is slightly reduced. This might be explained by the fact that this channel already had an increase in average spend from the training to the test set.
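To quantify the reallocation per channel, we can tabulate the relative change between the optimized and the initial (naive) allocation. This is a small sketch reusing the objects defined above; it assumes the optimizer returns the channels in the same order as mean_spend_per_period_test (as in the plot above).
allocation_comparison = (
    pd.Series(allocation_strategy, index=mean_spend_per_period_test.index)
    .to_frame(name="optimized")
    .join(mean_spend_per_period_test.to_frame(name="initial"))
    # Relative change of the optimized allocation with respect to the naive one
    .assign(pct_change=lambda df: (df["optimized"] - df["initial"]) / df["initial"])
    .sort_values(by="pct_change")
)
allocation_comparison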
Next, we can look into the expected channel contributions with this new budget allocation over time.
fig = mmm.plot_allocated_contribution_by_channel(samples=response)
fig.suptitle("Estimated Contribution per channel over time", fontsize=16);

Note
Observe that the uncertainty increases given the new level of spending. For example, historically the spend on Online Display was between \(200\)K and \(250\)K, and we are now recommending around \(350\)K. The fact that this spend level has not been reached before is reflected in the increased uncertainty (yellow band).
We can also obtain the average expected contributions per channel with the new budget allocation.
fig, ax = mmm.plot_budget_allocation(samples=response, figsize=(12, 8))
fig.suptitle("Response vs Spend per channel", fontsize=16);

Lastly, we want to compare the optimized budget allocation with the initial (naive) allocation. We compare this at two levels:
Channel contributions: We compute the difference between the optimized and naive channel contributions from the posterior predictive samples.
Total Sales: We compute the difference between the optimized and naive total sales from the posterior predictive samples. In this case we include the intercept and seasonality, but we set the control variables to zero, as in many cases we do not have them for the future.
Channel Contribution Comparison#
The first thing we need to do is to compute the naive channel contributions for the test period. We do this by simply setting these values as uniform across the test period and then using the model (trained on the training period) to compute the expected channel contributions.
X_actual_uniform = X_test.copy()
for channel in channel_columns:
    X_actual_uniform[channel] = mean_spend_per_period_test[channel]

for control in control_columns:
    X_actual_uniform[control] = 0.0
pred_test_uniform = mmm.sample_posterior_predictive(
X_actual_uniform,
include_last_observations=True,
original_scale=True,
var_names=["y", "channel_contributions"],
extend_idata=False,
progressbar=False,
random_seed=rng,
)
Let’s visualize the optimized and naive channel contributions.
response_channel_contribution_original_scale = apply_sklearn_transformer_across_dim(
data=response["channel_contributions"]
.unstack()
.sum(dim="channel")
.transpose(..., "date"),
func=mmm.get_target_transformer().inverse_transform,
dim_name="date",
)
fig, ax = plt.subplots(figsize=(15, 10))
az.plot_hdi(
X_test[date_column],
response_channel_contribution_original_scale,
hdi_prob=0.94,
color="C1",
fill_kwargs={"alpha": 0.2, "label": f"Optimized {0.94:.0%} HDI (test)"},
smooth=False,
ax=ax,
)
az.plot_hdi(
X_test[date_column],
response_channel_contribution_original_scale,
hdi_prob=0.5,
color="C1",
fill_kwargs={"alpha": 0.4, "label": f"Optimized {0.50:.0%} HDI (test)"},
smooth=False,
ax=ax,
)
ax.plot(
X_test[date_column],
response_channel_contribution_original_scale.mean(dim=("chain", "draw")),
marker="o",
markersize=8,
color="C1",
linewidth=3,
label="Optimized Posterior Predictive Mean",
)
az.plot_hdi(
X_actual_uniform[date_column],
pred_test_uniform["channel_contributions"]
.unstack()
.sum(dim="channel")
.transpose(..., "date"),
hdi_prob=0.94,
color="C2",
fill_kwargs={"alpha": 0.2, "label": f"Naive {0.94:.0%} HDI (test)"},
smooth=False,
ax=ax,
)
az.plot_hdi(
X_actual_uniform[date_column],
pred_test_uniform["channel_contributions"]
.unstack()
.sum(dim="channel")
.transpose(..., "date"),
hdi_prob=0.5,
color="C2",
fill_kwargs={"alpha": 0.4, "label": f"Naive {0.50:.0%} HDI (test)"},
smooth=False,
ax=ax,
)
ax.plot(
X_actual_uniform[date_column],
pred_test_uniform["channel_contributions"]
.unstack()
.sum(dim="channel")
.transpose(..., "date")
.mean(dim=("chain", "draw")),
marker="o",
markersize=8,
color="C2",
linewidth=3,
label="Naive Posterior Predictive Mean",
)
ax.legend(loc="lower center", bbox_to_anchor=(0.5, -0.20), ncol=3)
ax.set(
title="Out-of-Sample Predictions - Optimized Budget Allocation",
xlabel="date",
ylabel="sales",
);

We indeed see that the optimized allocation strategy is more efficient in terms of aggregated channel contributions. This is exactly what we were looking for!
We can quantify the percentage improvement of the optimized allocation by comparing the two posterior distributions aggregated over time.
fig, ax = plt.subplots()
az.plot_posterior(
(
response_channel_contribution_original_scale.sum(dim="date")
- pred_test_uniform["channel_contributions"]
.unstack()
.sum(dim="channel")
.transpose(..., "date")
.sum(dim="date")
)
/ response_channel_contribution_original_scale.sum(dim="date"),
ref_val=0,
ax=ax,
)
ax.xaxis.set_major_formatter(plt.FuncFormatter(lambda x, _: f"{x:.0%}"))
ax.set(
title="Channel Contributions\nDifference between optimized and initial allocation"
);

Concretely, we see an approximately \(10\%\) increase in the aggregated channel contributions. This could result in a significant increase in monetary value!
Note
The orange line and corresponding range compare these posterior samples to the reference value of zero. Hence, we estimate a probability of more than \(80\%\) that the optimized allocation will perform better than the initial allocation.
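We can also compute this probability directly from the posterior samples of the relative difference. This sketch simply recomputes the quantity plotted above and checks how often it is positive.
# Relative uplift of the optimized allocation over the naive one, per posterior sample
relative_uplift = (
    response_channel_contribution_original_scale.sum(dim="date")
    - pred_test_uniform["channel_contributions"]
    .unstack()
    .sum(dim="channel")
    .transpose(..., "date")
    .sum(dim="date")
) / response_channel_contribution_original_scale.sum(dim="date")

# Posterior probability that the optimized allocation outperforms the naive one
print(f"P(uplift > 0) = {(relative_uplift > 0).mean().item():.1%}")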
Sales Comparison#
Next, we want to compare the optimized and naive total sales. The strategy is the same as in the case above. The only difference is that we add additional variables like seasonality and baseline (intercept). As mentioned above, we set the control variables to zero for the sake of comparison. This is what is done in the allocate_budget_to_maximize_response() method.
y_response_original_scale = apply_sklearn_transformer_across_dim(
data=response["y"].unstack().transpose(..., "date"),
func=mmm.get_target_transformer().inverse_transform,
dim_name="date",
)
fig, ax = plt.subplots(figsize=(15, 10))
az.plot_hdi(
X_test[date_column],
y_response_original_scale,
hdi_prob=0.94,
color="C1",
fill_kwargs={"alpha": 0.2, "label": f"Optimized {0.94:.0%} HDI (test)"},
smooth=False,
ax=ax,
)
az.plot_hdi(
X_test[date_column],
y_response_original_scale,
hdi_prob=0.5,
color="C1",
fill_kwargs={"alpha": 0.4, "label": f"Optimized {0.50:.0%} HDI (test)"},
smooth=False,
ax=ax,
)
ax.plot(
X_test[date_column],
y_response_original_scale.mean(dim=("chain", "draw")),
marker="o",
markersize=8,
color="C1",
linewidth=3,
label="Optimized Posterior Predictive Mean",
)
az.plot_hdi(
X_actual_uniform[date_column],
pred_test_uniform["y"].unstack().transpose(..., "date"),
hdi_prob=0.94,
color="C2",
fill_kwargs={"alpha": 0.2, "label": f"Naive {0.94:.0%} HDI (test)"},
smooth=False,
ax=ax,
)
az.plot_hdi(
X_actual_uniform[date_column],
pred_test_uniform["y"].unstack().transpose(..., "date"),
hdi_prob=0.5,
color="C2",
fill_kwargs={"alpha": 0.4, "label": f"Naive {0.50:.0%} HDI (test)"},
smooth=False,
ax=ax,
)
ax.plot(
X_actual_uniform[date_column],
pred_test_uniform["y"].unstack().transpose(..., "date").mean(dim=("chain", "draw")),
marker="o",
markersize=8,
color="C2",
linewidth=3,
label="Naive Posterior Predictive Mean",
)
ax.legend(loc="lower center", bbox_to_anchor=(0.5, -0.20), ncol=3)
ax.set(
title="Out-of-Sample Predictions - Optimized Budget Allocation",
xlabel="date",
ylabel="sales",
);

At the level of total sales, the difference is negligible in this case: the baseline and seasonality (which are the same in both cases) explain most of the variance, since the average weekly media spend in this last period is much smaller than in the training set.
fig, ax = plt.subplots()
az.plot_posterior(
(
y_response_original_scale.sum(dim="date")
- pred_test_uniform["y"].unstack().sum(dim="date")
)
/ y_response_original_scale.sum(dim="date"),
ref_val=0,
ax=ax,
)
ax.xaxis.set_major_formatter(plt.FuncFormatter(lambda x, _: f"{x:.0%}"))
ax.set(title="Sales\nDifference between optimized and naive sales", xlim=(-1, 1));

In this case, when we compare the posterior samples of the uplift to the reference value of zero, we do not find any difference. This should not be discouraging, as we did see an approximately \(10\%\) increase in the aggregated (over time) channel contributions, which can translate to a lot of money! There is no discrepancy in the results. What we are seeing is that, under the constrained optimization problem, where business requirements (very common for many marketing teams) only allow us to change each channel’s budget by up to \(50\%\), we improved the channel contributions significantly, but not at the same order of magnitude as the other model components.
Conclusion#
This case study demonstrated how to leverage PyMC-Marketing’s Media Mix Modeling capabilities to gain actionable insights into marketing effectiveness and optimize budget allocation. We walked through the entire process from data preparation and exploratory analysis to model fitting, diagnostics, and budget optimization. The model provided valuable insights into the effectiveness of different marketing channels, their saturation effects, and return on ad spend. We were able to use these insights to propose an optimized budget allocation that aims to maximize sales while respecting business constraints.
Next Steps#
Stakeholder engagement: Present findings to key stakeholders, focusing on actionable insights and potential business impact. Understand their feedback and iterate.
Scenario planning: Use the model to simulate various budget scenarios and market conditions to support strategic decision-making.
Integration with other tools: Explore ways to integrate the MMM insights with other marketing analytics tools and processes for a more holistic view of marketing performance.
Time-slice cross-validation: Implement a more robust validation strategy to ensure model stability and performance over time (see Time-Slice-Cross-Validation and Parameter Stability).
Iterate and refine: This analysis represents a first iteration. Future iterations could incorporate:
Lift test data to better calibrate the model (see Lift Test Calibration)
More granular data (e.g., daily instead of weekly)
Additional variables like competitor actions or macroeconomic factors
A/B (or Geo) testing: Design experiments to test the model’s recommendations and validate its predictions in the real world.
Regular updates: Establish a process for regularly updating the model with new data to keep insights current.
Explore advanced features: Investigate other PyMC-Marketing capabilities, such as time-varying parameters (see MMM with time-varying parameters (TVP)) and custom models like hierarchical levels or more complex seasonality patterns, to further refine the model (see Custom Models with MMM components).
%load_ext watermark
%watermark -n -u -v -iv -w -p pymc_marketing,pytensor,numpyro
Last updated: Sun Jan 26 2025
Python implementation: CPython
Python version : 3.12.8
IPython version : 8.31.0
pymc_marketing: 0.10.0
pytensor : 2.26.4
numpyro : 0.16.1
pandas : 2.2.3
matplotlib : 3.10.0
pymc_marketing: 0.10.0
arviz : 0.20.0
seaborn : 0.13.2
numpy : 1.26.4
numpyro : 0.16.1
Watermark: 2.5.0