MMM End-to-End Case Study#

In today’s competitive business landscape, optimizing marketing spend across channels is crucial for maximizing return on investment and driving business growth. This notebook demonstrates how to leverage PyMC-Marketing’s Media Mix Modeling (MMM) to gain actionable insights into marketing effectiveness and optimize budget allocation.

We’ll walk through a real-world case study using data from a PyData Global 2022 presentation (Hajime Takeda - Media Mix Modeling: How to Measure the Effectiveness of Advertising). Our goal is to show how MMM can help marketing teams:

  1. Quantify the impact of different marketing channels on sales

  2. Understand saturation effects and diminishing returns

  3. Identify opportunities to reallocate budget for improved performance

  4. Make data-driven decisions about future marketing investments

Outline

  1. Data Preparation and Exploratory Analysis

  2. Model Specification and Fitting

  3. Model Diagnostics and Validation

  4. Marketing Channel Deep Dive

    • Channel Contributions

    • Saturation Curves

    • Return on Ad Spend (ROAS)

  5. Out-of-Sample Predictions

  6. Budget Optimization

  7. Key Takeaways and Next Steps

Note

In this notebook we guide you through what a typical MMM first iteration looks like (note that what a first iteration means depends on the business context). We focus on the application of the model to a concrete business case. If you want to know more about the model specification, please refer to the MMM Example Notebook.

Prepare Notebook#

We load the necessary libraries and set the notebook’s configuration.

import warnings

import arviz as az
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

from pymc_marketing.metrics import crps
from pymc_marketing.mmm import MMM, GeometricAdstock, LogisticSaturation
from pymc_marketing.mmm.utils import apply_sklearn_transformer_across_dim
from pymc_marketing.prior import Prior

warnings.filterwarnings("ignore", category=FutureWarning)

az.style.use("arviz-darkgrid")
plt.rcParams["figure.figsize"] = [12, 7]
plt.rcParams["figure.dpi"] = 100

%load_ext autoreload
%autoreload 2
%config InlineBackend.figure_format = "retina"
seed: int = sum(map(ord, "mmm"))
rng: np.random.Generator = np.random.default_rng(seed=seed)

Load Data#

We load the data from a CSV file as a pandas DataFrame. We also do some light data cleaning following the filtering of the original case study. In essence, we are filtering and renaming certain marketing channels.

raw_df = pd.read_csv("https://raw.githubusercontent.com/sibylhe/mmm_stan/main/data.csv")

# 1. control variables
# We just keep the holiday columns
control_columns = [col for col in raw_df.columns if "hldy_" in col]


# 2. media variables
channel_columns_raw = sorted(
    [
        col
        for col in raw_df.columns
        if "mdsp_" in col
        and col != "mdsp_viddig"
        and col != "mdsp_auddig"
        and col != "mdsp_sem"
    ]
)


channel_mapping = {
    "mdsp_dm": "Direct Mail",
    "mdsp_inst": "Insert",
    "mdsp_nsp": "Newspaper",
    "mdsp_audtr": "Radio",
    "mdsp_vidtr": "TV",
    "mdsp_so": "Social Media",
    "mdsp_on": "Online Display",
}

channel_columns = sorted(list(channel_mapping.values()))

# 3. sales variables
sales_col = "sales"

data_df = raw_df[["wk_strt_dt", sales_col, *channel_columns_raw, *control_columns]]
data_df = data_df.rename(columns=channel_mapping)


# 4. Date column
data_df["wk_strt_dt"] = pd.to_datetime(data_df["wk_strt_dt"])
date_column = "wk_strt_dt"

Exploratory Data Analysis#

We should always start by looking at the data! We begin by plotting the target variable, i.e. the sales.

fig, ax = plt.subplots()
sns.lineplot(data=data_df, x=date_column, y="sales", color="black", ax=ax)
ax.set(title="Sales", xlabel="date", ylabel="sales");

This time series, which comes in weekly granularity, has a clear yearly seasonality pattern and apparently a mild downward trend.
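One quick way to check these claims visually (a sketch on top of the analysis above) is to overlay a centered 52-week rolling mean, which averages over one full seasonal cycle and thereby exposes the trend:

fig, ax = plt.subplots()
sns.lineplot(data=data_df, x=date_column, y="sales", color="black", alpha=0.5, ax=ax)
# A centered 52-week rolling mean smooths out the yearly cycle,
# so what remains is (approximately) the trend component.
data_df.set_index(date_column)["sales"].rolling(window=52, center=True).mean().plot(
    color="C3", linewidth=3, label="52-week rolling mean", ax=ax
)
ax.legend()
ax.set(title="Sales - Trend Check", xlabel="date", ylabel="sales");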

Next, let’s look into the marketing channel spend data. We first look at the spend shares to get a feel for the relative spend across channels.

fig, ax = plt.subplots()
data_df.melt(
    value_vars=channel_columns, var_name="channel", value_name="spend"
).groupby("channel").agg({"spend": "sum"}).sort_values(by="spend").plot.barh(ax=ax)
ax.set(title="Total Media Spend", xlabel="Spend", ylabel="Channel");

In this specific example we see that Direct Mail is the channel with the highest spend, followed by Newspaper and Online Display.

Next, we look into the time series of each of the marketing channels.

fig, ax = plt.subplots()
data_df.set_index("wk_strt_dt")[channel_columns].plot(ax=ax)
ax.legend(title="Channel", fontsize=12)
ax.set(title="Media Spend", xlabel="Date", ylabel="Spend");

We see an overall decrease in spend over time. This might explain the downward trend in the sales time series.
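As a quick sketch, we can aggregate the total media spend per year to make this decline explicit (keep in mind that the first and last years are only partially covered by the data):

fig, ax = plt.subplots()
# Yearly totals of the media spend; "YS" bins the weekly data by year start.
(
    data_df.set_index(date_column)[channel_columns]
    .resample("YS")
    .sum()
    .sum(axis=1)
    .plot.bar(color="C0", ax=ax)
)
ax.set(title="Total Media Spend per Year", xlabel="year", ylabel="spend");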

Next, we can compute the correlation between the marketing channels and the sales.

Note

We all know that correlation does not imply causation. However, it is a good idea to look into the correlation between the marketing channels and the target variable. This can help us spot issues in the data (very common in real cases) and inspire some feature engineering.

n_channels = len(channel_columns)

fig, axes = plt.subplots(
    nrows=n_channels,
    ncols=1,
    figsize=(15, 3 * n_channels),
    sharex=True,
    sharey=False,
    layout="constrained",
)

for i, channel in enumerate(channel_columns):
    ax = axes[i]
    ax_twin = ax.twinx()
    sns.lineplot(data=data_df, x=date_column, y=channel, color=f"C{i}", ax=ax)
    sns.lineplot(data=data_df, x=date_column, y=sales_col, color="black", ax=ax_twin)
    correlation = data_df[[channel, sales_col]].corr().iloc[0, 1]
    ax_twin.grid(None)
    ax.set(title=f"{channel} (Correlation: {correlation:.2f})")

ax.set_xlabel("date");

The channels with the highest correlation with sales are TV, Insert, and Online Display. Observe that the channel with the biggest spend, Direct Mail, has the lowest correlation.
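Because adstock implies carryover effects, the contemporaneous correlation can understate a channel’s relationship with sales. As a small complementary sketch, we can also compute correlations between lagged spend and sales:

# Correlation between channel spend lagged by 0-6 weeks and sales.
# A peak at a positive lag hints at delayed (adstocked) effects.
max_lag = 6
lagged_corr = pd.DataFrame(
    {
        channel: [
            data_df[channel].shift(lag).corr(data_df[sales_col])
            for lag in range(max_lag + 1)
        ]
        for channel in channel_columns
    },
    index=pd.Index(range(max_lag + 1), name="lag_weeks"),
)
lagged_corr.round(2)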

Train Test Split#

We are now ready to fit the model. We start by splitting the data into training and testing.

Note

In a real case scenario you should use more data for training than for testing (see Time-Slice-Cross-Validation and Parameter Stability). In this case we are using a small dataset for demonstrative purposes.

train_test_split_date = pd.to_datetime("2018-02-01")

train_mask = data_df.wk_strt_dt <= train_test_split_date
test_mask = data_df.wk_strt_dt > train_test_split_date

train_df = data_df[train_mask]
test_df = data_df[test_mask]

fig, ax = plt.subplots()
sns.lineplot(data=train_df, x="wk_strt_dt", y="sales", color="C0", label="Train", ax=ax)
sns.lineplot(data=test_df, x="wk_strt_dt", y="sales", color="C1", label="Test", ax=ax)
ax.set(title="Sales - Train Test Split", xlabel="date");
X_train = train_df.drop(columns=sales_col)
X_test = test_df.drop(columns=sales_col)

y_train = train_df[sales_col]
y_test = test_df[sales_col]

Model Specification#

Assuming we are working on the first iteration, we typically would not have lift test data to calibrate the model. This is fine: we can use spend-based priors to get started.

Tip

Ideally, in the next iteration we should have some of these lift tests to calibrate the model. For more details see Lift Test Calibration and the Case Study: Unobserved Confounders, ROAS and Lift Tests case study notebook.

The idea behind the spend priors is that we use the spend shares as a proxy for the effect of the media channels on sales. This is not perfect, but it can be a good starting point. In essence, we give the higher-spend channels more flexibility to fit the data through a wider prior. Let’s see what this looks like. First, we compute the spend shares (on the training data!).

spend_shares = (
    X_train.melt(value_vars=channel_columns, var_name="channel", value_name="spend")
    .groupby("channel", as_index=False)
    .agg({"spend": "sum"})
    .sort_values(by="channel")
    .assign(spend_share=lambda x: x["spend"] / x["spend"].sum())["spend_share"]
    .to_numpy()
)

prior_sigma = spend_shares

Next, we use these spend shares to set the prior of the model using the model configuration.

model_config = {
    "intercept": Prior("Normal", mu=0.2, sigma=0.05),
    "saturation_beta": Prior("HalfNormal", sigma=prior_sigma, dims="channel"),
    "saturation_lam": Prior("Gamma", alpha=3, beta=1, dims="channel"),
    "gamma_control": Prior("Normal", mu=0, sigma=1),
    "gamma_fourier": Prior("Laplace", mu=0, b=1),
    "likelihood": Prior("TruncatedNormal", lower=0, sigma=Prior("HalfNormal", sigma=1)),
}

We are now ready to specify the model (see the MMM Example Notebook for more details). As the yearly seasonal component is not that strong, we use only a few Fourier terms.
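To make the yearly_seasonality argument concrete, here is a sketch of how such Fourier features can be built by hand (PyMC-Marketing constructs them internally; the 365.25-day period is an assumption of this sketch):

def yearly_fourier_features(dates: pd.Series, n_order: int) -> pd.DataFrame:
    """Sine/cosine features with a yearly (365.25-day) period."""
    periods = dates.dt.dayofyear.to_numpy() / 365.25
    return pd.DataFrame(
        {
            f"{fn}_{order}": getattr(np, fn)(2 * np.pi * periods * order)
            for order in range(1, n_order + 1)
            for fn in ("sin", "cos")
        },
        index=dates,
    )

# yearly_seasonality=5 corresponds to 10 regressors: sin_1, cos_1, ..., sin_5, cos_5
yearly_fourier_features(data_df[date_column], n_order=5).head()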

sampler_config = {"progressbar": True}

mmm = MMM(
    model_config=model_config,
    sampler_config=sampler_config,
    date_column=date_column,
    adstock=GeometricAdstock(l_max=6),
    saturation=LogisticSaturation(),
    channel_columns=channel_columns,
    control_columns=control_columns,
    yearly_seasonality=5,
)

Prior Predictive Checks#

We can now generate prior predictive samples to see how the model behaves under the prior specification.

# Generate prior predictive samples
mmm.sample_prior_predictive(X_train, y_train, samples=4_000)
fig, ax = plt.subplots()
fig = mmm.plot_prior_predictive(original_scale=True, ax=ax)
ax.legend(loc="lower center", bbox_to_anchor=(0.5, -0.2), ncol=4)
ax.set(xlabel="date", ylabel="sales");
Sampling: [adstock_alpha, gamma_control, gamma_fourier, intercept, saturation_beta, saturation_lam, y, y_sigma]

Overall, the priors are not very informative, and their range spans approximately one order of magnitude around the observed data. Note that we chose a truncated normal likelihood to avoid negative values for the target variable.
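We can sanity-check the last point directly on the sampled values (a sketch, assuming the draws are stored in the prior_predictive group of mmm.idata):

# The truncated-normal likelihood should keep every prior predictive draw
# of the (scaled) target at or above zero.
prior_pred_y = mmm.idata["prior_predictive"]["y"]
print(f"Minimum prior predictive draw: {prior_pred_y.min().item():.4f}")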

Let’s look into the channel contribution share before fitting the model.

fig = mmm.plot_channel_contribution_share_hdi(prior=True, figsize=(10, 6))
fig.suptitle("Channel Contribution Share", fontsize=18, fontweight="bold");

We confirm these represent the spend shares as expected from the prior specification.

Model Fitting#

We now fit the model to the training data.

try:
    import numpyro  # noqa: F401

    nuts_sampler = "numpyro"
except ModuleNotFoundError:
    nuts_sampler = "pymc"

mmm.fit(
    X=X_train,
    y=y_train,
    target_accept=0.9,
    chains=4,
    draws=4_000,
    nuts_sampler=nuts_sampler,
    random_seed=rng,
)
arviz.InferenceData
    • posterior: 4 chains × 4,000 draws with dimensions channel (7), date (183), control (22), and fourier_mode (10). Variables include adstock_alpha, saturation_beta, saturation_lam, gamma_control, gamma_fourier, the per-component contribution terms, mu, and y_sigma.
    • sample_stats: NUTS sampler statistics (acceptance_rate, diverging, energy, lp, n_steps, step_size, tree_depth).
    • prior and prior_predictive: 4,000 draws of the same variables from the prior predictive run.
    • observed_data: the scaled target y over the 183 training dates.
    • constant_data: channel_data, control_data, and dayofyear.
    • fit_data: the raw training data (183 rows).

The posterior was sampled with numpyro (sampling time ≈ 25 seconds after 1,000 tuning steps).
Model Diagnostics#

Before looking into the results let’s check some diagnostics.

# Number of diverging samples
mmm.idata["sample_stats"]["diverging"].sum().item()
0

We do not have divergent transitions, which is good!

az.summary(
    data=mmm.idata,
    var_names=[
        "intercept",
        "y_sigma",
        "saturation_beta",
        "saturation_lam",
        "adstock_alpha",
        "gamma_control",
        "gamma_fourier",
    ],
)["r_hat"].describe()
count    55.0
mean      1.0
std       0.0
min       1.0
25%       1.0
50%       1.0
75%       1.0
max       1.0
Name: r_hat, dtype: float64

In addition, the R-hat statistics are all close to 1, which is also good!
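Besides R-hat, it is worth checking the effective sample sizes (a sketch using the same summary):

# With 4 chains x 4,000 draws we have 16,000 posterior samples per parameter;
# bulk/tail ESS values in the thousands indicate low autocorrelation.
az.summary(
    data=mmm.idata,
    var_names=["intercept", "y_sigma", "saturation_beta", "saturation_lam", "adstock_alpha"],
)[["ess_bulk", "ess_tail"]].describe()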

Next, let’s look at the model trace.

_ = az.plot_trace(
    data=mmm.fit_result,
    var_names=[
        "intercept",
        "y_sigma",
        "saturation_beta",
        "saturation_lam",
        "adstock_alpha",
        "gamma_control",
        "gamma_fourier",
    ],
    compact=True,
    backend_kwargs={"figsize": (12, 10), "layout": "constrained"},
)
plt.gcf().suptitle("Model Trace", fontsize=16);

We see good mixing in the trace, which is consistent with the good R-hat statistics.

Posterior Predictive Checks#

As our model and posterior samples are looking good, we can now look into the posterior predictive checks.

mmm.sample_posterior_predictive(X_train, extend_idata=True, combined=True)
fig = mmm.plot_posterior_predictive(original_scale=True)
fig.gca().set(xlabel="date", ylabel="sales");
Sampling: [y]


The posterior predictive sales look quite reasonable. We are explaining a good amount of the variance in the data and the trends are well captured.
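As a complementary single-number summary, we can compute ArviZ’s Bayesian \(R^2\) (a sketch, assuming the stored posterior predictive samples are on the original scale, consistent with the CRPS computation below):

# az.r2_score expects y_pred with shape (n_samples, n_observations).
y_pred_samples = az.extract(
    mmm.idata["posterior_predictive"], var_names=["y"]
).T.to_numpy()
az.r2_score(y_true=y_train.to_numpy(), y_pred=y_pred_samples)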

We can look into the errors of the model:

mmm.plot_errors(original_scale=True);

We do not detect any obvious patterns in the errors.

Media Deep Dive#

In this section we perform a deep dive into the media channels insights.

Channel Contributions#

Now that we are happy with the global model fit, we can look into the media and individual channel contributions.

We start by looking into the aggregated contributions.

fig = mmm.plot_components_contributions(original_scale=True)
fig.gca().set(xlabel="date", ylabel="sales");

Next, we look deeper into the components:

mmm.plot_waterfall_components_decomposition(figsize=(18, 10));

Here are some interesting observations about the model:

  • The intercept accounts for approximately \(60\%\) of the contribution. This is what people usually call the “baseline”.

  • The top media contribution channels are Online Display, Direct Mail, and TV. In some sense, we see how the model averages the spend share (prior) and the predictive power of the media channels (posterior); the sketch below backs this up with numbers.
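To turn these observations into numbers, here is a sketch that converts the mean contributions into shares (assuming compute_mean_contributions_over_time returns one column per model component on the original scale):

# Mean posterior contribution per component and date, aggregated over time
# and normalized so that the shares sum to one.
mean_contributions = mmm.compute_mean_contributions_over_time(original_scale=True)
contribution_shares = mean_contributions.sum(axis=0) / mean_contributions.sum().sum()
contribution_shares.sort_values(ascending=False).round(3)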

Saturation Curves#

We continue our analysis by looking into the direct response curves. These will be the main input for the budget optimization step below.

fig = mmm.plot_direct_contribution_curves()
[ax.set(xlabel="spend") for ax in fig.axes];

Tip

These curves by themselves can be a great input for marketing planning! One could use them to set the budget for the next periods and manually optimize the spend for these channels depending on the saturation points. This process can be a good starting point to engage with the team. That being said, we can do better with custom optimization routines, as described in the following sections.

Besides the direct response curves, we can look into global contribution simulations. The idea is to vary the spend by globally multiplying the spend time series by a value \(\delta > 0\) and computing the expected contribution. PyMC-Marketing also shows the \(94\%\) Highest Density Interval (HDI) of the simulation results.

mmm.plot_channel_contributions_grid(start=0, stop=1.5, num=12, figsize=(15, 7));

We can clearly see the saturation effects for the different channels.

A complementary view plots these same simulations against the absolute spend:

mmm.plot_channel_contributions_grid(
    start=0, stop=1.5, num=12, absolute_xrange=True, figsize=(15, 7)
);

The vertical lines show the spend from the observed data (i.e. \(\delta = 1\) above). This plot clearly shows that, for example, Direct Mail won’t have much incremental contribution if we keep increasing the spend. On the other hand, channels like Online Display and Insert seem to be good candidates for increased spend. These are the concrete, actionable insights you can extract from this type of model to optimize your media spend mix 🚀!

Return on Ad Spend (ROAS)#

To have a more quantitative metric of efficiency, we compute the Return on Ad Spend (ROAS) over the whole training period.

To begin, let’s plot the contribution share for each channel.

fig = mmm.plot_channel_contribution_share_hdi(figsize=(10, 6))
fig.suptitle("Channel Contribution Share", fontsize=18, fontweight="bold");

These contributions are of course influenced by the spend levels. If we want a better picture of the efficiency of each channel, we need to look into the Return on Ad Spend (ROAS). This is done by computing the ratio of the contribution and the spend for each channel.

# Get the channel contributions on the original scale
channel_contribution_original_scale = mmm.compute_channel_contribution_original_scale()

# Sum the contributions over the whole training period
spend_sum = X_train[channel_columns].sum().to_numpy()
spend_sum = spend_sum[
    np.newaxis, np.newaxis, :
]  # We need these extra dimensions to broadcast the spend sum to the shape of the contributions.

# Compute the ROAS as the ratio of the mean contribution and the spend for each channel
roas_samples = channel_contribution_original_scale.sum(dim="date") / spend_sum

# Plot the ROAS
fig, ax = plt.subplots(figsize=(10, 6))
az.plot_forest(roas_samples, combined=True, ax=ax)
fig.suptitle("Return on Ads Spend (ROAS)", fontsize=18, fontweight="bold");

Here we see that the channels with the highest ROAS on average are Online Display and Insert, whereas Direct Mail has the lowest ROAS. This is consistent with the previous findings from the direct and global response curves.

Tip

The large HDI intervals for the ROAS of the top channels come from the fact that their spend share during the training period is relatively small. Even though we use the spend shares as priors, the overall spend levels are also reflected in the estimates (see, for example, how small the HDI is for Direct Mail).

We can see this as an opportunity! Based on this analysis, Online Display and Insert are the best channels to invest in. Hence, if we optimize our budget accordingly (more on budget optimization below), we can improve our marketing mix and at the same time create more variance in the training data, so that we get refined ROAS estimates in future iterations 😎.

It is also useful to compare the ROAS and the contribution share.

# Get the contribution share samples
share_samples = mmm.get_channel_contributions_share_samples()

fig, ax = plt.subplots(figsize=(12, 9))

for i, channel in enumerate(channel_columns):
    # Contribution share mean and hdi
    share_mean = share_samples.sel(channel=channel).mean().to_numpy()
    share_hdi = az.hdi(share_samples.sel(channel=channel))["x"].to_numpy()

    # ROAS mean and hdi
    roas_mean = roas_samples.sel(channel=channel).mean().to_numpy()
    roas_hdi = az.hdi(roas_samples.sel(channel=channel))["x"].to_numpy()

    # Plot the contribution share hdi
    ax.vlines(share_mean, roas_hdi[0], roas_hdi[1], color=f"C{i}", alpha=0.8)

    # Plot the ROAS hdi
    ax.hlines(roas_mean, share_hdi[0], share_hdi[1], color=f"C{i}", alpha=0.8)

    # Plot the means
    ax.scatter(
        share_mean,
        roas_mean,
        # Size of the scatter points is proportional to the spend share
        s=5 * (spend_shares[i] * 100),
        color=f"C{i}",
        edgecolor="black",
        label=channel,
    )
ax.xaxis.set_major_formatter(plt.FuncFormatter(lambda x, _: f"{x:.0%}"))

ax.legend(
    bbox_to_anchor=(1.05, 1), loc="upper left", title="Channel", title_fontsize=12
)
ax.set(
    title="Channel Contribution Share vs ROAS",
    xlabel="Contribution Share",
    ylabel="ROAS",
);

Out-of-Sample Predictions#

Until now we have analyzed the model fit from an in-sample perspective. If we want to follow the model’s recommendations, we need evidence that the model is not only good at explaining the past but also at making predictions. This is a key component of getting buy-in from business stakeholders.

The PyMC-Marketing MMM API makes it easy to generate out-of-sample predictions from a fitted model:

y_pred_test = mmm.sample_posterior_predictive(
    X_test,
    include_last_observations=True,
    original_scale=True,
    var_names=["y", "channel_contributions"],
    extend_idata=False,
    progressbar=False,
    random_seed=rng,
)

We can visualize and compare the in-sample and out-of-sample predictions.

fig, ax = plt.subplots()
mmm.plot_posterior_predictive(original_scale=True, ax=ax)

test_hdi_94 = az.hdi(y_pred_test["y"].to_numpy().T, hdi_prob=0.94)
test_hdi_50 = az.hdi(y_pred_test["y"].to_numpy().T, hdi_prob=0.50)

ax.fill_between(
    X_test[date_column],
    test_hdi_94[:, 0],
    test_hdi_94[:, 1],
    color="C1",
    label=f"{0.94:.0%} HDI (test)",
    alpha=0.2,
)

ax.fill_between(
    X_test[date_column],
    test_hdi_50[:, 0],
    test_hdi_50[:, 1],
    color="C1",
    label=f"{0.50:.0%} HDI (test)",
    alpha=0.4,
)

ax.plot(X_test[date_column], y_test, color="black")
ax.plot(
    X_test[date_column],
    y_pred_test["y"].mean(dim="sample"),
    color="C1",
    linewidth=3,
    label="Posterior Predictive Mean",
)
ax.axvline(X_test[date_column].iloc[0], color="C2", linestyle="--")
ax.legend(loc="upper center", bbox_to_anchor=(0.5, -0.1), ncol=4)
ax.set(ylabel="sales");

Overall, both the in-sample and out-of-sample predictions look very reasonable.

Let’s zoom in on the out-of-sample predictions:

fig, ax = plt.subplots()
ax.fill_between(
    X_test[date_column],
    test_hdi_94[:, 0],
    test_hdi_94[:, 1],
    color="C1",
    label=f"{0.94:.0%} HDI (test)",
    alpha=0.2,
)
ax.fill_between(
    X_test[date_column],
    test_hdi_50[:, 0],
    test_hdi_50[:, 1],
    color="C1",
    label=f"{0.50:.0%} HDI (test)",
    alpha=0.4,
)
ax.plot(X_test[date_column], y_test, marker="o", color="black", label="Observed")
ax.plot(
    X_test[date_column],
    y_pred_test["y"].mean(dim="sample"),
    marker="o",
    markersize=8,
    color="C1",
    linewidth=3,
    label="Posterior Predictive Mean",
)
ax.legend(loc="upper center", bbox_to_anchor=(0.5, -0.1), ncol=4)
ax.set(title="Out-of-Sample Predictions", xlabel="date", ylabel="sales");

These actually look very good!

To have a concrete score metric, we can use the Continuous Ranked Probability Score (CRPS). This is a generalization of the mean absolute error to probabilistic predictions. For more details, see the Time-Slice-Cross-Validation and Parameter Stability notebook.
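For a predictive distribution with CDF \(F\) and an observed value \(y\), the CRPS is defined as

\[ \operatorname{CRPS}(F, y) = \int_{-\infty}^{\infty} \left(F(x) - \mathbb{1}\{x \geq y\}\right)^{2} dx = \mathbb{E}|X - y| - \frac{1}{2}\mathbb{E}|X - X'|, \]

where \(X, X' \sim F\) are independent draws, which is exactly what we can estimate from posterior predictive samples. Lower values are better, and for a point forecast the CRPS reduces to the mean absolute error.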

crps_train = crps(
    y_true=y_train.to_numpy(),
    y_pred=az.extract(mmm.idata["posterior_predictive"], var_names=["y"]).T,
)

crps_test = crps(y_true=y_test.to_numpy(), y_pred=y_pred_test["y"].T)

fig, ax = plt.subplots(figsize=(10, 6))
ax.bar(["Train", "Test"], [crps_train, crps_test], color=["C0", "C1"])
ax.set(ylabel="CRPS", title="CRPS Score");

Interestingly enough, the model performance is better on the test set. There could be many reasons for this:

  • The model is not capturing the end of the seasonality very well in the training set.

  • There is less variability or complexity in the test set data.

  • The test set has fewer points, which could affect the CRPS calculation comparison.

In order to have a better comparison we should look into the time slice cross validation results as in the Time-Slice-Cross-Validation and Parameter Stability notebook.

Next, we look into the out-of-sample channel contributions.

fig, axes = plt.subplots(
    nrows=len(channel_columns),
    ncols=1,
    figsize=(15, 3 * n_channels),
    sharex=True,
    sharey=True,
    layout="constrained",
)

for i, channel in enumerate(channel_columns):
    for j, hdi_prob in enumerate([0.94, 0.5]):
        ax = axes[i]
        az.plot_hdi(
            X_test[date_column],
            y_pred_test["channel_contributions"]
            .sel(channel=channel)
            .unstack()
            .transpose(..., "date"),
            hdi_prob=hdi_prob,
            color=f"C{i}",
            smooth=False,
            fill_kwargs={"alpha": 0.2 + 0.2 * j, "label": f"{hdi_prob:.0%} HDI (test)"},
            ax=ax,
        )
    ax.legend(loc="upper right")
    ax.set(title=channel, ylabel="sales")

fig.suptitle("Channel Contributions - Out-of-Sample Predictions", fontsize=16);

We do not have true contribution values to compare against, because this is precisely what we are trying to estimate in the first place. Nevertheless, we can still do time-slice cross-validation at the channel level to analyze the stability of the out-of-sample predictions and media parameters (adstock and saturation).

Budget Optimization#

In this last section we explain how to use PyMC-Marketing to optimize the media budget allocation. The main idea is to use the response curves to optimize the spend. For more details, see the Budget Allocation with PyMC-Marketing notebook.

For this example, let us assume we want to re-allocate the budget of the test set. The PyMC-Marketing MMM optimizer assumes the budget is allocated equally over the next num_periods periods. Hence, when we specify the budget variable we are actually specifying the weekly budget (in this case). Therefore, it is important to understand the past spend per channel:

X_test[channel_columns].sum(axis=0)
Direct Mail       12107275.13
Insert             1065383.05
Newspaper           273433.16
Online Display     6498301.30
Radio              3384098.82
Social Media       4298913.23
TV                 2610543.53
dtype: float64
num_periods = X_test[date_column].shape[0]

# Total budget for the test set per channel
all_budget_per_channel = X_test[channel_columns].sum(axis=0)

# Total budget for the test set
all_budget = all_budget_per_channel.sum()

# Weekly budget for each channel
# We are assuming we do not know the total budget allocation in advance
# so this would be the naive "uniform" allocation.
per_channel_weekly_budget = all_budget / (num_periods * len(channel_columns))

Let’s plot the average weekly spend per channel for the actual test set.

fig, ax = plt.subplots()
(
    all_budget_per_channel.div(num_periods * len(channel_columns))
    .sort_index(ascending=False)
    .plot.barh(color="C0", ax=ax)
)
ax.set(title=f"Weekly Average Total Spend per Channel (Next {num_periods} periods)");

Observe that the spend distribution in the test set is considerably different from the train set. When planning future spend this is usually the case, as the team might be exploring new investment opportunities. For example, in the test set we have on average more spend on Online Display and less on Insert, both of which had high ROAS in the training data.

average_spend_per_channel_train = X_train[channel_columns].sum(axis=0) / (
    X_train.shape[0] * len(channel_columns)
)

average_spend_per_channel_test = X_test[channel_columns].sum(axis=0) / (
    X_test.shape[0] * len(channel_columns)
)

fig, ax = plt.subplots(figsize=(10, 6))
average_spend_per_channel_train.to_frame(name="train").join(
    other=average_spend_per_channel_test.to_frame(name="test"),
    how="inner",
).sort_index(ascending=False).plot.barh(color=["black", "C0"], ax=ax)
ax.set(title="Weekly Average Total Spend per Channel", xlabel="spend");

Next, let’s talk about the variability of the allocation. Even if the model finds Insert to be the most efficient channel, strategically it is very hard to simply put \(100\%\) of the budget into this channel. A better strategy, which generally plays well with stakeholders, is to allow a custom degree of flexibility through upper and lower bounds for each channel budget.

For example, we will allow a \(50\%\) deviation from the past average weekly spend for each channel.

percentage_change = 0.5

mean_spend_per_period_test = (
    X_test[channel_columns].sum(axis=0).div(num_periods * len(channel_columns))
)

budget_bounds = {
    key: [(1 - percentage_change) * value, (1 + percentage_change) * value]
    for key, value in (mean_spend_per_period_test).to_dict().items()
}

Now we are ready to pass all these business requirements to the optimizer.

model_granularity = "weekly"
allocation_strategy, optimization_result = mmm.optimize_budget(
    budget=per_channel_weekly_budget,
    num_periods=num_periods,
    budget_bounds=budget_bounds,
    minimize_kwargs={
        "method": "SLSQP",
        "options": {"ftol": 1e-9, "maxiter": 5_000},
    },
)

response = mmm.sample_response_distribution(
    allocation_strategy=allocation_strategy,
    time_granularity=model_granularity,
    num_periods=num_periods,
    noise_level=0.05,
)
/Users/juanitorduz/Documents/pymc-marketing/pymc_marketing/mmm/budget_optimizer.py:214: UserWarning: Using default equality constraint
  self.set_constraints(

Let’s visualize the results.

fig, ax = plt.subplots()

(
    pd.Series(allocation_strategy, index=mean_spend_per_period_test.index)
    .to_frame(name="optimized_allocation")
    .join(mean_spend_per_period_test.to_frame(name="initial_allocation"))
    .sort_index(ascending=False)
    .plot.barh(figsize=(12, 8), color=["C1", "C0"], ax=ax)
)
ax.set(title="Budget Allocation Comparison", xlabel="allocation");

We verify that the sum of the budget allocation is the same as the initial budget.

np.testing.assert_allclose(
    mean_spend_per_period_test.sum(), allocation_strategy.sum().item()
)

The first thing we notice is that the spend on Direct Mail has been reduced. This is consistent with the previous finding that this channel is not very efficient. The spend has been distributed more evenly across the other channels. Interestingly, the spend on Online Display is slightly reduced. This might be explained by the fact that this channel already had an average spend increase from the training to the test set.

Next, we can look into the expected channel contributions with this new budget allocation over time.

fig = mmm.plot_allocated_contribution_by_channel(samples=response)
fig.suptitle("Estimated Contribution per channel over time", fontsize=16);

Note

Observe that the uncertainty increases given the new level of spending. For example, historical spending for Online Display was between \(200\)K and \(250\)K, and we are now recommending around \(350\)K. The fact that this spend level has not been reached before is reflected in the increased uncertainty (yellow band).

We can also obtain the average expected contributions per channel with the new budget allocation.

fig, ax = mmm.plot_budget_allocation(samples=response, figsize=(12, 8))
fig.suptitle("Response vs Spend per channel", fontsize=16);

Lastly, we want to compare the optimized budget allocation with the initial (naive) allocation. We compare this at two levels:

  1. Channel contributions: We compute the difference between the optimized and naive channel contributions from the posterior predictive samples.

  2. Total Sales: We compute the difference between the optimized and naive total sales from the posterior predictive samples. In this case we include the intercept and seasonality, but we set the control variables to zero, as in many cases we do not have them for the future.

Channel Contribution Comparison#

The first thing we need to do is compute the naive channel contributions for the test period. We do this by setting the spend uniformly across the test period and then using the model (trained on the training period) to compute the expected channel contributions.

X_actual_uniform = X_test.copy()

for channel in channel_columns:
    X_actual_uniform[channel] = mean_spend_per_period_test[channel]

for control in control_columns:
    X_actual_uniform[control] = 0.0

pred_test_uniform = mmm.sample_posterior_predictive(
    X_actual_uniform,
    include_last_observations=True,
    original_scale=True,
    var_names=["y", "channel_contributions"],
    extend_idata=False,
    progressbar=False,
    random_seed=rng,
)

Let’s visualize the optimized and naive channel contributions.

response_channel_contribution_original_scale = apply_sklearn_transformer_across_dim(
    data=response["channel_contributions"]
    .unstack()
    .sum(dim="channel")
    .transpose(..., "date"),
    func=mmm.get_target_transformer().inverse_transform,
    dim_name="date",
)


fig, ax = plt.subplots(figsize=(15, 10))

az.plot_hdi(
    X_test[date_column],
    response_channel_contribution_original_scale,
    hdi_prob=0.94,
    color="C1",
    fill_kwargs={"alpha": 0.2, "label": f"Optimized {0.94:.0%} HDI (test)"},
    smooth=False,
    ax=ax,
)
az.plot_hdi(
    X_test[date_column],
    response_channel_contribution_original_scale,
    hdi_prob=0.5,
    color="C1",
    fill_kwargs={"alpha": 0.4, "label": f"Optimized {0.50:.0%} HDI (test)"},
    smooth=False,
    ax=ax,
)
ax.plot(
    X_test[date_column],
    response_channel_contribution_original_scale.mean(dim=("chain", "draw")),
    marker="o",
    markersize=8,
    color="C1",
    linewidth=3,
    label="Optimized Posterior Predictive Mean",
)

az.plot_hdi(
    X_actual_uniform[date_column],
    pred_test_uniform["channel_contributions"]
    .unstack()
    .sum(dim="channel")
    .transpose(..., "date"),
    hdi_prob=0.94,
    color="C2",
    fill_kwargs={"alpha": 0.2, "label": f"Naive {0.94:.0%} HDI (test)"},
    smooth=False,
    ax=ax,
)
az.plot_hdi(
    X_actual_uniform[date_column],
    pred_test_uniform["channel_contributions"]
    .unstack()
    .sum(dim="channel")
    .transpose(..., "date"),
    hdi_prob=0.5,
    color="C2",
    fill_kwargs={"alpha": 0.4, "label": f"Naive {0.50:.0%} HDI (test)"},
    smooth=False,
    ax=ax,
)
ax.plot(
    X_actual_uniform[date_column],
    pred_test_uniform["channel_contributions"]
    .unstack()
    .sum(dim="channel")
    .transpose(..., "date")
    .mean(dim=("chain", "draw")),
    marker="o",
    markersize=8,
    color="C2",
    linewidth=3,
    label="Naive Posterior Predictive Mean",
)

ax.legend(loc="lower center", bbox_to_anchor=(0.5, -0.20), ncol=3)

ax.set(
    title="Out-of-Sample Predictions - Optimized Budget Allocation",
    xlabel="date",
    ylabel="sales",
);

We see indeed that the optimized allocation strategy is more efficient in terms of the aggregated channel contributions. This is exactly what we were looking for!

We can quantify the percentage improvement of the optimized allocation by comparing the two posterior distributions aggregated over time.

fig, ax = plt.subplots()
az.plot_posterior(
    (
        response_channel_contribution_original_scale.sum(dim="date")
        - pred_test_uniform["channel_contributions"]
        .unstack()
        .sum(dim="channel")
        .transpose(..., "date")
        .sum(dim="date")
    )
    / response_channel_contribution_original_scale.sum(dim="date"),
    ref_val=0,
    ax=ax,
)
ax.xaxis.set_major_formatter(plt.FuncFormatter(lambda x, _: f"{x:.0%}"))
ax.set(
    title="Channel Contributions\nDifference between optimized and initial allocation"
);

Concretely, we see an approximately \(10\%\) increase in the aggregated channel contributions. This could result in a significant increase in monetary value!

Note

The orange line and corresponding range compare these posterior samples to the reference value of zero. Hence, we have a probability of more than \(80\%\) that the optimized allocation will perform better than the initial allocation.
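We can make this probability explicit by reusing the uplift samples from the plot above (a sketch; uplift_samples is a hypothetical name for the plotted quantity):

# Posterior probability that the optimized allocation beats the naive one.
uplift_samples = (
    response_channel_contribution_original_scale.sum(dim="date")
    - pred_test_uniform["channel_contributions"]
    .unstack()
    .sum(dim="channel")
    .transpose(..., "date")
    .sum(dim="date")
) / response_channel_contribution_original_scale.sum(dim="date")

print(f"P(uplift > 0) = {(uplift_samples > 0).mean().item():.1%}")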

Sales Comparison#

Next, we want to compare the optimized and naive total sales. The strategy is the same as in the case above. The only difference is that we add additional components such as seasonality and the baseline (intercept). As mentioned above, we set the control variables to zero for the sake of comparison. This is what is done in the allocate_budget_to_maximize_response() method.

y_response_original_scale = apply_sklearn_transformer_across_dim(
    data=response["y"].unstack().transpose(..., "date"),
    func=mmm.get_target_transformer().inverse_transform,
    dim_name="date",
)

fig, ax = plt.subplots(figsize=(15, 10))

az.plot_hdi(
    X_test[date_column],
    y_response_original_scale,
    hdi_prob=0.94,
    color="C1",
    fill_kwargs={"alpha": 0.2, "label": f"Optimized {0.94:.0%} HDI (test)"},
    smooth=False,
    ax=ax,
)
az.plot_hdi(
    X_test[date_column],
    y_response_original_scale,
    hdi_prob=0.5,
    color="C1",
    fill_kwargs={"alpha": 0.4, "label": f"Optimized {0.50:.0%} HDI (test)"},
    smooth=False,
    ax=ax,
)
ax.plot(
    X_test[date_column],
    y_response_original_scale.mean(dim=("chain", "draw")),
    marker="o",
    markersize=8,
    color="C1",
    linewidth=3,
    label="Optimized Posterior Predictive Mean",
)

az.plot_hdi(
    X_actual_uniform[date_column],
    pred_test_uniform["y"].unstack().transpose(..., "date"),
    hdi_prob=0.94,
    color="C2",
    fill_kwargs={"alpha": 0.2, "label": f"Naive {0.94:.0%} HDI (test)"},
    smooth=False,
    ax=ax,
)
az.plot_hdi(
    X_actual_uniform[date_column],
    pred_test_uniform["y"].unstack().transpose(..., "date"),
    hdi_prob=0.5,
    color="C2",
    fill_kwargs={"alpha": 0.4, "label": f"Naive {0.50:.0%} HDI (test)"},
    smooth=False,
    ax=ax,
)
ax.plot(
    X_actual_uniform[date_column],
    pred_test_uniform["y"].unstack().transpose(..., "date").mean(dim=("chain", "draw")),
    marker="o",
    markersize=8,
    color="C2",
    linewidth=3,
    label="Naive Posterior Predictive Mean",
)

ax.legend(loc="lower center", bbox_to_anchor=(0.5, -0.20), ncol=3)

ax.set(
    title="Out-of-Sample Predictions - Optimized Budget Allocation",
    xlabel="date",
    ylabel="sales",
);

At the level of total sales, the difference in this case is negligible: the baseline and seasonality (which are the same in both scenarios) explain most of the variance, and the average weekly media spend in this last period is much smaller than in the training set.

fig, ax = plt.subplots()
az.plot_posterior(
    (
        y_response_original_scale.sum(dim="date")
        - pred_test_uniform["y"].unstack().sum(dim="date")
    )
    / y_response_original_scale.sum(dim="date"),
    ref_val=0,
    ax=ax,
)
ax.xaxis.set_major_formatter(plt.FuncFormatter(lambda x, _: f"{x:.0%}"))
ax.set(title="Sales\nDifference between optimized and naive sales", xlim=(-1, 1));

In this case, when we compare the posterior samples of the uplift to the reference value of zero, we do not find any difference. This should not be discouraging, as we did see a \(10\%\) increase in the aggregated (over time) channel contributions, which can translate to a lot of money! There is no discrepancy in the results: in the constrained optimization problem, where business requirements (very common for many marketing teams) only allow us to change the budget by up to \(50\%\), we improved the channel contributions significantly, just not at the same order of magnitude as the other model components.

Conclusion#

This case study demonstrated how to leverage PyMC-Marketing’s Media Mix Modeling capabilities to gain actionable insights into marketing effectiveness and optimize budget allocation. We walked through the entire process from data preparation and exploratory analysis to model fitting, diagnostics, and budget optimization. The model provided valuable insights into the effectiveness of different marketing channels, their saturation effects, and return on ad spend. We were able to use these insights to propose an optimized budget allocation that aims to maximize sales while respecting business constraints.

Next Steps#

  1. Stakeholder engagement: Present findings to key stakeholders, focusing on actionable insights and potential business impact. Understand their feedback and iterate.

  2. Scenario planning: Use the model to simulate various budget scenarios and market conditions to support strategic decision-making.

  3. Integration with other tools: Explore ways to integrate the MMM insights with other marketing analytics tools and processes for a more holistic view of marketing performance.

  4. Time-slice cross-validation: Implement a more robust validation strategy to ensure model stability and performance over time (see Time-Slice-Cross-Validation and Parameter Stability).

  5. Iterate and refine: This analysis represents a first iteration. Future iterations could incorporate:

    • Lift test data to better calibrate the model (see Lift Test Calibration)

    • More granular data (e.g., daily instead of weekly)

    • Additional variables like competitor actions or macroeconomic factors

  6. A/B (or Geo) testing: Design experiments to test the model’s recommendations and validate its predictions in the real world.

  7. Regular updates: Establish a process for regularly updating the model with new data to keep insights current.

  8. Explore advanced features: Investigate other PyMC-Marketing capabilities, such as time-varying parameters (see MMM with time-varying parameters (TVP)) and custom models like hierarchical levels or more complex seasonality patterns (see Custom Models with MMM components), to further refine the model.

%load_ext watermark
%watermark -n -u -v -iv -w -p pymc_marketing,pytensor,numpyro
Last updated: Sun Jan 26 2025

Python implementation: CPython
Python version       : 3.12.8
IPython version      : 8.31.0

pymc_marketing: 0.10.0
pytensor      : 2.26.4
numpyro       : 0.16.1

pandas        : 2.2.3
matplotlib    : 3.10.0
pymc_marketing: 0.10.0
arviz         : 0.20.0
seaborn       : 0.13.2
numpy         : 1.26.4
numpyro       : 0.16.1

Watermark: 2.5.0