MMM Example Notebook#

In this notebook we work through a simulated example to showcase the Media Mix Model (MMM) API from pymc-marketing. This package provides a PyMC implementation of the MMM presented in the paper Jin, Yuxue, et al. “Bayesian methods for media mix modeling with carryover and shape effects.” (2017). We work with synthetic data because we want to do parameter recovery to better understand the model assumptions. That is, we explicitly set values for our adstock and saturation parameters (see model specification below) and recover them from the model. The data generation process is an adaptation of the blog post “Media Effect Estimation with PyMC: Adstock, Saturation & Diminishing Returns” by Juan Orduz.

Business Problem#

Before diving into the data, let's first define the business problem we are trying to solve. We are a marketing agency and we want to optimize the marketing budget of a client. We have access to the following data:

Sales data: weekly sales of the client.
Media spend data: weekly spend on different media channels (e.g. TV, radio, online, etc.). In this example we consider 2 media channels: \(x_{1}\) and \(x_{2}\).
Domain knowledge:
- We know that there has been a positive sales trend, which we believe comes from strong economic growth.
- We also know that there is a yearly seasonality effect.
- In addition, we were informed of two outliers in the data during the weeks 2019-05-13 and 2020-09-14.

With this information we can draw a Directed Acyclic Graph (DAG), or graphical model, of how we believe our variables are related. In other words, the DAG represents how we believe our system is causally connected.

../../_images/ff41916606db9dde5cea4dcdc2d2f9d7c2969524411b8eb6576719104f8eb0df.svg

In this example, we consider a simple system where:

Marketing: represents the actions generated by \(x_{1}\) and \(x_{2}\).
Special events: outliers on specific dates, possibly due to special occasions.
Exogenous variables: variables determined by external factors and not within the model (e.g. the country's economic growth, or weather conditions that drive seasonal behavior).

Understanding this ecosystem is essential to create a model that reveals the true causal signals and allows us to optimize our advertising budget. But what do we mean by optimizing the marketing budget? We want to find the optimal media mix that maximizes sales.

Now, given the DAG described above, we understand that there is a causal relationship between marketing and sales, but what is the nature of that relationship? In this case, we assume that the relationship is nonlinear. For example, a \(10\%\) increase in spend on channel \(x_{1}\) does not necessarily translate into a \(10\%\) increase in sales. This can be explained by two phenomena:

On the one hand, there is a carryover effect: the effect of spend on sales is not instantaneous but accumulates over time.
On the other hand, there is a saturation effect: the effect of spend on sales is not linear but saturates at some point.

The equation implemented to describe the DAG above is the one from Jin, Yuxue, et al. “Bayesian methods for media mix modeling with carryover and shape effects.” (2017), with an added causal assumption that media effects are exclusively positive. Concretely, given a time-series target variable \(y_{t}\) (e.g. sales or conversions), media variables \(x_{m, t}\) (e.g. impressions, clicks, or costs) and a set of control covariates \(z_{c, t}\) (e.g. holidays, special events), we consider a linear model of the form

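\[
y_{t} = \alpha + \sum_{m=1}^{M}\beta_{m}f(x_{m, t}) + \sum_{c=1}^{C}\gamma_{c}z_{c, t} + \varepsilon_{t},
\]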

where \(\alpha\) is the intercept, \(f\) is a media transformation function, \(\beta_{m}\) and \(\gamma_{c}\) are the coefficients of the media and control variables, and \(\varepsilon_{t}\) is the error term, which we assume to be normally distributed. The function \(f\) encodes the positive contribution of media to the target variable. We usually consider two types of transformations: adstock (carryover) and saturation effects.

PyMC-Marketing offers an API for a Bayesian Media Mix Model (MMM) with various specifications. In this example, we use geometric adstock and logistic saturation as the transformations in the structural causal equation discussed above.

Tip

The MMM in pymc-marketing provides additional features on top of this base model:

Calibration with experiments: We have the option to add empirical experiments (lift tests) to calibrate the model using custom likelihood functions. See Lift Test Calibration.
Time-varying intercept: Capture time-varying baseline contributions in your model (using modern and efficient Gaussian process approximation methods). That is, we allow the intercept term \(\alpha = \alpha(t)\) to vary over time. See mmm_tvp_example.
Budget Optimization: Allocate your marketing budget based on the parameters recovered by the model, finding the spend distribution that maximizes the contribution given a limited budget. See Budget Allocation with PyMC-Marketing.

References:#

Part I: Data Generation Process#

In Part I of this notebook we focus on the data generation process. That is, we want to construct the target variable \(y_{t}\) (sales) by adding each of the components described in the Business Problem section.

Prepare Notebook#

import warnings

import arviz as az
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pymc as pm
import seaborn as sns
from pymc_extras.prior import Prior

from pymc_marketing.mmm import GeometricAdstock, LogisticSaturation
from pymc_marketing.mmm.multidimensional import MMM
from pymc_marketing.mmm.transformers import geometric_adstock, logistic_saturation

warnings.filterwarnings("ignore", category=FutureWarning)

az.style.use("arviz-darkgrid")
plt.rcParams["figure.figsize"] = [12, 7]
plt.rcParams["figure.dpi"] = 100

%load_ext autoreload
%autoreload 2
%config InlineBackend.figure_format = "retina"

Generate Data#

1. Date Range#

First, we define a time range for our data. We consider a bit more than 3 years of data (179 weeks) at weekly granularity.

seed: int = sum(map(ord, "mmm"))
rng: np.random.Generator = np.random.default_rng(seed=seed)

# date range
min_date = pd.to_datetime("2018-04-01")
max_date = pd.to_datetime("2021-09-01")

df = pd.DataFrame(
    data={"date_week": pd.date_range(start=min_date, end=max_date, freq="W-MON")}
).assign(
    year=lambda x: x["date_week"].dt.year,
    month=lambda x: x["date_week"].dt.month,
    dayofyear=lambda x: x["date_week"].dt.dayofyear,
)

n = df.shape[0]
print(f"Number of observations: {n}")

Number of observations: 179

2. Media Costs Data#

Now we generate synthetic data for two channels, \(x_1\) and \(x_2\). We refer to this as the raw signal, since it will be the input in the modeling phase. We expect the contribution of each channel to differ depending on its carryover and saturation parameters.

Raw Signal

# media data
x1 = rng.uniform(low=0.0, high=1.0, size=n)
df["x1"] = np.where(x1 > 0.9, x1, x1 / 2)

x2 = rng.uniform(low=0.0, high=1.0, size=n)
df["x2"] = np.where(x2 > 0.8, x2, 0)


fig, ax = plt.subplots(
    nrows=2, ncols=1, figsize=(10, 7), sharex=True, sharey=True, layout="constrained"
)
sns.lineplot(x="date_week", y="x1", data=df, color="C0", ax=ax[0])
sns.lineplot(x="date_week", y="x2", data=df, color="C1", ax=ax[1])
ax[1].set(xlabel="date")
fig.suptitle("Media Costs Data", fontsize=18, fontweight="bold");

../../_images/ef697a4f9272e283a645740f3958382a3516c3fe4835d8cf45395d2b32773194.png

Remark: By design, \(x_{1}\) should resemble a typical paid social channel and \(x_{2}\) an offline (e.g. TV) spend time series.

Effect Signal

Next, we pass the raw signal through the two transformations: first the geometric adstock (carryover effect) and then the logistic saturation. Note that we set the parameters ourselves, but we will recover them from the model.

Let's start with the adstock transformation. We set the adstock parameter \(0 < \alpha < 1\) to \(0.4\) and \(0.2\) for \(x_1\) and \(x_2\), respectively. We set a maximum lag effect of \(8\) weeks.

# apply geometric adstock transformation
alpha1: float = 0.4
alpha2: float = 0.2

df["x1_adstock"] = geometric_adstock(
    x=df["x1"].to_xarray(), alpha=alpha1, l_max=8, normalize=True, dim="index"
).eval()

df["x2_adstock"] = geometric_adstock(
    x=df["x2"].to_xarray(), alpha=alpha2, l_max=8, normalize=True, dim="index"
).eval()
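The geometric_adstock call above uses the pymc-marketing transformer. To make the carryover mechanics concrete, here is a minimal NumPy sketch of the same idea. It assumes the weights are \(\alpha^{l}\) for lags \(l = 0, \dots, l_{max} - 1\), rescaled to sum to one when normalize=True, and that weeks before the start of the series contribute zero. The name geometric_adstock_np is hypothetical. This is an illustration, not the library's implementation.

```python
import numpy as np


def geometric_adstock_np(
    x: np.ndarray, alpha: float, l_max: int, normalize: bool = True
) -> np.ndarray:
    """Hypothetical sketch of geometric adstock as a truncated weighted sum.

    out[t] = sum_{l=0}^{l_max-1} w[l] * x[t - l], with w[l] = alpha**l
    (rescaled to sum to one when normalize=True) and x[t - l] = 0 for t - l < 0.
    """
    weights = alpha ** np.arange(l_max)
    if normalize:
        weights = weights / weights.sum()
    n_obs = len(x)
    out = np.zeros(n_obs)
    for lag in range(min(l_max, n_obs)):
        # add the input shifted forward by `lag` steps (zero-padded at the start)
        out[lag:] += weights[lag] * x[: n_obs - lag]
    return out


# One unit of spend in the first week, then no spend at all
pulse = np.array([1.0, 0.0, 0.0, 0.0, 0.0])
print(geometric_adstock_np(pulse, alpha=0.4, l_max=8).round(3))
```

The output is non-zero in every week after the pulse, even though spend is zero there, and each week keeps a fraction \(\alpha = 0.4\) of the previous week's value. This carryover later shows up as a non-zero contribution at zero spend.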

Next, we compose the resulting adstock signals with the logistic saturation function. We set the parameter \(\lambda > 0\) to \(4\) and \(3\) for \(x_1\) and \(x_2\), respectively.

# apply saturation transformation
lam1: float = 4.0
lam2: float = 3.0

df["x1_adstock_saturated"] = logistic_saturation(
    x=df["x1_adstock"].to_xarray(), lam=lam1
).eval()

df["x2_adstock_saturated"] = logistic_saturation(
    x=df["x2_adstock"].to_xarray(), lam=lam2
).eval()
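The logistic saturation can be sketched in NumPy in the same way. The sketch assumes the form \((1 - e^{-\lambda x}) / (1 + e^{-\lambda x})\), which equals \(\tanh(\lambda x / 2)\): it is 0 at \(x = 0\) and approaches 1 as \(x\) grows. The numerical slope shows the diminishing returns. It is largest at zero spend and shrinks as spend increases, and a larger \(\lambda\) saturates faster. The name logistic_saturation_np is hypothetical and is not the library function.

```python
import numpy as np


def logistic_saturation_np(x: np.ndarray, lam: float) -> np.ndarray:
    """Sketch of the logistic saturation curve (1 - exp(-lam*x)) / (1 + exp(-lam*x))."""
    return (1 - np.exp(-lam * x)) / (1 + np.exp(-lam * x))


x_grid = np.linspace(0.0, 1.0, 101)
for lam in (4.0, 3.0):
    curve = logistic_saturation_np(x_grid, lam)
    slope = np.gradient(curve, x_grid)  # numerical derivative of the curve w.r.t. x
    print(
        f"lam={lam}: f(1)={curve[-1]:.3f}, "
        f"slope at x=0: {slope[0]:.2f}, slope at x=1: {slope[-1]:.2f}"
    )
```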

We can now visualize the effect signal for each channel after each transformation:

fig, ax = plt.subplots(
    nrows=3, ncols=2, figsize=(16, 9), sharex=True, sharey=False, layout="constrained"
)
sns.lineplot(x="date_week", y="x1", data=df, color="C0", ax=ax[0, 0])
sns.lineplot(x="date_week", y="x2", data=df, color="C1", ax=ax[0, 1])
sns.lineplot(x="date_week", y="x1_adstock", data=df, color="C0", ax=ax[1, 0])
sns.lineplot(x="date_week", y="x2_adstock", data=df, color="C1", ax=ax[1, 1])
sns.lineplot(x="date_week", y="x1_adstock_saturated", data=df, color="C0", ax=ax[2, 0])
sns.lineplot(x="date_week", y="x2_adstock_saturated", data=df, color="C1", ax=ax[2, 1])
fig.suptitle("Media Costs Data - Transformed", fontsize=18, fontweight="bold");

../../_images/877395cfb6a2d8af28e0e205b4a29a5b0ccc8683159733b6b9576da4ec63e778.png

3. Trend & Seasonal Components#

Now we add synthetic trend and seasonality components to the effect signal.

df["trend"] = (np.linspace(start=0.0, stop=50, num=n) + 10) ** (1 / 4) - 1

df["cs"] = -np.sin(2 * 2 * np.pi * df["dayofyear"] / 365.5)
df["cc"] = np.cos(1 * 2 * np.pi * df["dayofyear"] / 365.5)
df["seasonality"] = 0.5 * (df["cs"] + df["cc"])

fig, ax = plt.subplots()
sns.lineplot(x="date_week", y="trend", color="C2", label="trend", data=df, ax=ax)
sns.lineplot(
    x="date_week", y="seasonality", color="C3", label="seasonality", data=df, ax=ax
)
ax.legend(loc="upper left")
ax.set(xlabel="date", ylabel=None)
ax.set_title("Trend & Seasonality Components", fontsize=18, fontweight="bold");

../../_images/0e048137a9716b04d3015be431dc21afdb67c24854c54f3e113fd13c683a910f.png

4. Control Variables#

We add two events during which there was a notable peak in our target variable. We assume they are independent and not seasonal (e.g. the launch of a particular product).

df["event_1"] = (df["date_week"] == "2019-05-13").astype(float)
df["event_2"] = (df["date_week"] == "2020-09-14").astype(float)

5. Target Variable#

Finally, we define the target variable (sales) \(y\). We assume it is a linear combination of the effect signal, the trend and the seasonal components, plus the two events and an intercept. We also add some Gaussian noise.

df["intercept"] = 2.0
df["epsilon"] = rng.normal(loc=0.0, scale=0.25, size=n)

amplitude = 1
beta_1 = 3.0
beta_2 = 2.0
betas = [beta_1, beta_2]


df["y"] = amplitude * (
    df["intercept"]
    + df["trend"]
    + df["seasonality"]
    + 1.5 * df["event_1"]
    + 2.5 * df["event_2"]
    + beta_1 * df["x1_adstock_saturated"]
    + beta_2 * df["x2_adstock_saturated"]
    + df["epsilon"]
)

fig, ax = plt.subplots()
sns.lineplot(x="date_week", y="y", color="black", data=df, ax=ax)
ax.set(xlabel="date", ylabel="y (thousands)")
ax.set_title("Sales (Target Variable)", fontsize=18, fontweight="bold");

../../_images/4083a72bdbdbb599c781124db35495ddbd3e2b549e211f34234146c1174c4fb1.png

We can visualize the true component contributions over the historical period:

fig, ax = plt.subplots()

contributions = [
    df["intercept"].sum(),
    (beta_1 * df["x1_adstock_saturated"]).sum(),
    (beta_2 * df["x2_adstock_saturated"]).sum(),
    1.5 * df["event_1"].sum(),
    2.5 * df["event_2"].sum(),
    df["trend"].sum(),
    df["seasonality"].sum(),
]

ax.bar(
    ["intercept", "x1", "x2", "event_1", "event_2", "trend", "seasonality"],
    contributions,
    color=["C0" if x >= 0 else "C3" for x in contributions],
    alpha=0.8,
)
ax.bar_label(
    ax.containers[0],
    fmt="{:,.2f}",
    label_type="edge",
    padding=2,
    fontsize=15,
    fontweight="bold",
)
ax.set(ylabel="Sales (thousands)")
ax.set_title("Sales Attribution", fontsize=18, fontweight="bold");

../../_images/acdec97478cb2336306569d190c2ecbe2705ce020bc98f13173eae08143c85b3.png

We would like to recover these values from the model.

6. Media Contribution Interpretation#

From the data generation process we can compute the relative contribution of each channel to the target variable. We will recover these values from the model.

contribution_share_x1: float = (beta_1 * df["x1_adstock_saturated"]).sum() / (
    beta_1 * df["x1_adstock_saturated"] + beta_2 * df["x2_adstock_saturated"]
).sum()

contribution_share_x2: float = (beta_2 * df["x2_adstock_saturated"]).sum() / (
    beta_1 * df["x1_adstock_saturated"] + beta_2 * df["x2_adstock_saturated"]
).sum()

print(f"Contribution Share of x1: {contribution_share_x1:.2f}")
print(f"Contribution Share of x2: {contribution_share_x2:.2f}")

Contribution Share of x1: 0.81
Contribution Share of x2: 0.19

We can plot the contribution of each channel, which clearly shows the effect of the adstock and saturation transformations.

fig, ax = plt.subplots(
    nrows=2, ncols=1, figsize=(12, 8), sharex=True, sharey=False, layout="constrained"
)

for i, x in enumerate(["x1", "x2"]):
    sns.scatterplot(
        x=df[x],
        y=amplitude * betas[i] * df[f"{x}_adstock_saturated"],
        color=f"C{i}",
        ax=ax[i],
    )
    ax[i].set(
        title=f"$x_{i + 1}$ contribution",
        ylabel=f"$\\beta_{i + 1} \\cdot x_{i + 1}$ adstocked & saturated",
        xlabel="x",
    )

../../_images/8d09e376c97b1fbf6efc8504a57cbe673c715fe65bb5ce419c7452c5d7baed4a.png

This plot shows some interesting aspects of the media contribution:

The adstock effect is reflected in the non-zero contribution of the channel even when spend is zero.
The saturation effect is clearly visible: the contribution growth (slope) decreases as spend increases.

As we will see in Part II of this notebook, we will recover these plots from the model!

We see that channel \(x_{1}\) has a larger contribution than \(x_{2}\). This could be explained by the fact that there was more spend on channel \(x_{1}\) than on channel \(x_{2}\):

fig, ax = plt.subplots(figsize=(7, 5))
df[["x1", "x2"]].sum().plot(kind="bar", color=["C0", "C1"], ax=ax)
ax.set(title="Total Media Spend", xlabel="Media Channel", ylabel="Costs (thousands)");

../../_images/fbea453347b11ff8596a332070fd2afc96ab26b1e787caae0b721465dd2fe612.png

However, we are usually interested not only in the contribution itself but in the Return on Ad Spend (ROAS), that is, the contribution divided by the cost. We can compute the ROAS for each channel as follows:

roas_1 = (amplitude * beta_1 * df["x1_adstock_saturated"]).sum() / df["x1"].sum()
roas_2 = (amplitude * beta_2 * df["x2_adstock_saturated"]).sum() / df["x2"].sum()

fig, ax = plt.subplots(figsize=(7, 5))
pd.Series(data=[roas_1, roas_2], index=["x1", "x2"]).plot(
    kind="bar", color=["C0", "C1"], ax=ax
)
ax.set(title="ROAS (Approximation)", xlabel="Media Channel", ylabel="ROAS");

../../_images/6713f95b0f1a245787cbfa0e5d4f0d2a5d5a20e3aeb8f3f236768d368d872119.png

That is, channel \(x_{1}\) seems to be more efficient than channel \(x_{2}\).

Note

We recommend reading Section 4.1 in Jin, Yuxue, et al. “Bayesian methods for media mix modeling with carryover and shape effects.” (2017) for a detailed explanation of the ROAS (and mROAS). In particular:

If we transform our target variable \(y\) (e.g. with a log transformation), we need to be careful with the ROAS computation, since setting the spend to zero does not commute with the transformation.
We need to be careful with the adstock effect and include a carryover period that fully accounts for the effect of the spend. The ROAS estimate above is an approximation.
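To make the carryover caveat concrete, here is a minimal sketch for a single channel. It assumes the additive model from Part I (normalized geometric adstock, then logistic saturation, then a coefficient beta) with no transformation of the target. ROAS is computed counterfactually: the difference between sales with the channel's spend and sales with that spend set to zero, divided by total spend. With carryover=True, the spend series is padded with l_max - 1 zero-spend weeks so that the lagged effect of the last weeks is also counted. The helpers adstock, saturate and roas are hypothetical names introduced for this sketch.

```python
import numpy as np


def adstock(x: np.ndarray, alpha: float, l_max: int) -> np.ndarray:
    """Normalized geometric adstock; inputs before t = 0 are treated as zero."""
    w = alpha ** np.arange(l_max)
    w = w / w.sum()
    out = np.zeros(len(x))
    for lag in range(min(l_max, len(x))):
        out[lag:] += w[lag] * x[: len(x) - lag]
    return out


def saturate(x: np.ndarray, lam: float) -> np.ndarray:
    """Logistic saturation (1 - exp(-lam*x)) / (1 + exp(-lam*x))."""
    return (1 - np.exp(-lam * x)) / (1 + np.exp(-lam * x))


def roas(
    spend: np.ndarray, alpha: float, lam: float, beta: float, l_max: int,
    carryover: bool = True,
) -> float:
    """Counterfactual ROAS for one channel in an additive model (sketch).

    ROAS = (sales with spend - sales with spend set to zero) / total spend.
    With carryover=True the series is padded with l_max - 1 zero-spend weeks.
    """
    pad = l_max - 1 if carryover else 0
    x = np.concatenate([spend, np.zeros(pad)])
    sales_with = beta * saturate(adstock(x, alpha, l_max), lam)
    sales_without = beta * saturate(adstock(np.zeros_like(x), alpha, l_max), lam)
    return float((sales_with - sales_without).sum() / spend.sum())


# Toy spend series with non-zero spend in the final week
toy_spend = np.array([0.5, 0.0, 1.0, 0.2])
print(roas(toy_spend, alpha=0.4, lam=4.0, beta=3.0, l_max=8, carryover=False))
print(roas(toy_spend, alpha=0.4, lam=4.0, beta=3.0, l_max=8, carryover=True))
```

The carryover version is larger because the weeks after the last spend still receive part of its effect. In this additive sketch, the counterfactual term is zero because the saturation maps zero to zero. It is kept to mirror the definition, which matters when the target is transformed (for example with a log). In that case contributions no longer add up, and the difference has to be computed on the back-transformed predictions.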

7. Data Output#

Of course, we will not have all these features in our real data. We keep only the features we will use for modeling:

columns_to_keep = [
    "date_week",
    "y",
    "x1",
    "x2",
    "event_1",
    "event_2",
    "dayofyear",
]

data = df[columns_to_keep].copy()

data.head()

	date_week	y	x1	dayofyear
0	2018-04-02	3.984662	0.318580	92
1	2018-04-09	3.762872	0.112388	99
2	2018-04-16	4.466967	0.292400	106
3	2018-04-23	3.864219	0.071399	113
4	2018-04-30	4.441625	0.386745	120

Part II: Modeling#

In this second part, we focus on the modeling process. We use the data generated in Part I.

1. Feature Engineering#

Assuming we did an EDA and have a good understanding of the data (we skipped it here because we generated the data ourselves, but please never skip the EDA!), we can start building our model. One thing we notice immediately is the seasonality and the trend component. We can generate features ourselves as control variables, for example using a uniformly increasing straight line to model the trend component. In addition, we include dummy variables to encode the event_1 and event_2 contributions.

For the seasonality component we use Fourier modes (similar to Prophet). We do not need to add the Fourier modes by hand, as they are handled by the model API through the yearly_seasonality argument (see below). We use 2 modes for the seasonality component.

# trend feature
data["t"] = range(n)

data.head()

	date_week	y	x1	dayofyear	t
0	2018-04-02	3.984662	0.318580	92	0
1	2018-04-09	3.762872	0.112388	99	1
2	2018-04-16	4.466967	0.292400	106	2
3	2018-04-23	3.864219	0.071399	113	3
4	2018-04-30	4.441625	0.386745	120	4
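With yearly_seasonality=2, the model builds the Fourier modes internally. The following sketch only illustrates what 2 yearly Fourier modes look like. It assumes day-of-year as the time index and a period of 365.25 days. The helper fourier_features is hypothetical. Its column names mirror the gamma_fourier coordinates (sin_1, sin_2, cos_1, cos_2) in the trace summary below, but the library's exact construction may differ.

```python
import numpy as np
import pandas as pd


def fourier_features(
    dayofyear: np.ndarray, n_order: int, period: float = 365.25
) -> pd.DataFrame:
    """Hypothetical helper: sine and cosine modes of a yearly cycle."""
    t = np.asarray(dayofyear, dtype=float) / period
    orders = np.arange(1, n_order + 1)
    angles = 2 * np.pi * t[:, None] * orders[None, :]  # shape (n_obs, n_order)
    columns = {f"sin_{k}": np.sin(angles[:, k - 1]) for k in orders}
    columns.update({f"cos_{k}": np.cos(angles[:, k - 1]) for k in orders})
    return pd.DataFrame(columns)


seasonal_features = fourier_features(np.array([92, 99, 106]), n_order=2)
print(seasonal_features.columns.tolist())  # ['sin_1', 'sin_2', 'cos_1', 'cos_2']
```

Each mode is a smooth periodic curve, and the model learns one coefficient per column. Two modes allow yearly patterns with up to two peaks per year, like the seasonality generated in Part I.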

2. Model Specification#

We can specify the model structure using the MMM class. This class handles a lot of internal boilerplate code for us, such as scaling the data (see details below), and provides handy diagnostics and reporting plots. One great feature is that we can specify the channel prior distributions ourselves. This is a fundamental component of the Bayesian workflow, because it lets us incorporate our prior knowledge into the model, and it is one of the most important advantages of a Bayesian approach. Let's see how we can do it.

Since we do not know much more about the channels, we start with a simple heuristic:

The channel contributions should be positive, so we can, for example, use a HalfNormal distribution as prior. We need to set the sigma parameter per channel. The larger sigma is, the more "freedom" the prior gives to fit the data. To specify sigma we can use the following point.
We expect channels where we spend the most to have more attributed sales, before seeing the data. This is a very reasonable assumption (note that we are not imposing anything at the level of efficiency!).

How do we incorporate this heuristic into the model? First, it is important to note that the MMM class scales the target and input variables through a MaxAbsScaler transformer from scikit-learn, so the priors must be specified in the scaled space (i.e. between 0 and 1). One way to do this is to use the spend share as the sigma parameter of the HalfNormal distribution. In addition, we can add a scaling factor to take into account the support of the distribution.
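Max-abs scaling divides each series by its maximum absolute value, so a nonnegative series lands in [0, 1]. The following minimal sketch uses toy numbers and a hypothetical helper, max_abs_scale. It is not the class's internal code. It shows why a prior on a channel coefficient lives in the scaled space, and how a contribution estimated there maps back to the original scale by multiplying by the target scale.

```python
import numpy as np


def max_abs_scale(values: np.ndarray) -> tuple[np.ndarray, float]:
    """Hypothetical helper: divide a series by its maximum absolute value."""
    scale = float(np.max(np.abs(values)))
    return values / scale, scale


# Toy target series (not the notebook's data)
y_toy = np.array([2.0, 4.0, 8.0, 5.0])
y_toy_scaled, y_toy_scale = max_abs_scale(y_toy)
print(y_toy_scaled, y_toy_scale)

# A contribution estimated in the scaled space maps back to the original
# scale by multiplying by the target scale
contribution_scaled = 0.36
contribution_original = contribution_scaled * y_toy_scale
print(contribution_original)
```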

First, let's compute the spend share per channel:

total_spend_per_channel = data[["x1", "x2"]].sum(axis=0)

spend_share = total_spend_per_channel / total_spend_per_channel.sum()

spend_share

x1    0.65632
x2    0.34368
dtype: float64

Next, we specify the sigma parameter per channel:

n_channels = 2

prior_sigma = n_channels * spend_share.to_numpy()

prior_sigma.tolist()

../../_images/c0decaded39c74f9a5dfb1a44ecf563829cb41ef88fb71b3568452d799e1af11.png
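A quick Monte Carlo check can show what the spend-share heuristic implies before seeing the data. This sketch uses only NumPy and assumes the rounded spend shares printed above (0.65632 and 0.34368). It relies on the fact that a HalfNormal(sigma) draw is the absolute value of a Normal(0, sigma) draw, with mean \(\sigma\sqrt{2/\pi}\). Because the prior mean is proportional to sigma, the prior mean coefficients are in the same ratio as the spend shares. The variable names are illustrative and avoid overwriting the notebook's own variables.

```python
import numpy as np

prior_rng = np.random.default_rng(42)

# Rounded spend shares from the output above (x1, x2)
share = np.array([0.65632, 0.34368])
sigma_heuristic = len(share) * share  # same heuristic as prior_sigma

# HalfNormal(sigma) draws are absolute values of Normal(0, sigma) draws
halfnormal_draws = np.abs(
    prior_rng.normal(loc=0.0, scale=sigma_heuristic, size=(100_000, len(share)))
)

mc_mean = halfnormal_draws.mean(axis=0)
analytic_mean = sigma_heuristic * np.sqrt(2 / np.pi)  # mean of HalfNormal(sigma)
print("Monte Carlo prior means:", mc_mean.round(3))
print("Analytic prior means:   ", analytic_mean.round(3))
print("Prior mean shares:      ", (analytic_mean / analytic_mean.sum()).round(3))
```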

The MMM class follows the scikit-learn convention, so we need to split our data into X (predictors) and y (target value):

X = data.drop("y", axis=1)
y = data["y"]

You can use the optional "model_config" parameter to apply your own priors to the model. Each entry in "model_config" has a key that corresponds to a distribution name registered in our model. The value is a Prior object that describes that specific distribution and its input parameters.

If you are not sure how to define your own priors, you can use the "default_model_config" property of MMM to see the required structure.

dummy_model = MMM(
    date_column="",
    channel_columns=[""],
    adstock=GeometricAdstock(l_max=4),
    saturation=LogisticSaturation(),
)
dummy_model.default_model_config

{'intercept': Prior("Normal", mu=0, sigma=2, dims=()),
 'likelihood': Prior("Normal", sigma=Prior("HalfNormal", sigma=2, dims=()), dims="date"),
 'gamma_control': Prior("Normal", mu=0, sigma=2, dims="control"),
 'gamma_fourier': Prior("Laplace", mu=0, b=1, dims="fourier_mode"),
 'adstock_alpha': Prior("Beta", alpha=1, beta=3, dims="channel"),
 'saturation_lam': Prior("Gamma", alpha=3, beta=1, dims="channel"),
 'saturation_beta': Prior("HalfNormal", sigma=2, dims="channel")}

You only need to change the priors you want to customize. There is no need to modify all of them, unless you want to!

my_model_config = {
    "intercept": Prior("Normal", mu=0.5, sigma=0.2),
    "saturation_beta": Prior("HalfNormal", sigma=prior_sigma, dims="channel"),
    "gamma_control": Prior("Normal", mu=0, sigma=0.05, dims="control"),
    "gamma_fourier": Prior("Laplace", mu=0, b=0.2, dims="fourier_mode"),
    "likelihood": Prior("Normal", sigma=Prior("HalfNormal", sigma=6)),
}

Note: There is no right or wrong answer for prior specification. It all depends on the data, the context, and the assumptions you are willing to make. It is always recommended to run prior predictive sampling and a sensitivity analysis to check the impact of the priors on the posterior. We skip the sensitivity analysis here for simplicity. If you are not sure about specific priors, the MMM class has some default priors that you can use as a starting point.

The sampler_config argument lets you specify a set of parameters that are passed to fit in the same way as its keyword arguments. It does not disable the fit keyword arguments but extends them, making the sampler configuration customizable and preservable. By default, the sampler_config of MMM is empty, but if you would like to use it, you can define it as shown below:

my_sampler_config = {"progressbar": True}

We are now ready to use the MMM class to define the model.

mmm = MMM(
    model_config=my_model_config,
    sampler_config=my_sampler_config,
    date_column="date_week",
    adstock=GeometricAdstock(l_max=8),
    saturation=LogisticSaturation(),
    channel_columns=["x1", "x2"],
    control_columns=["event_1", "event_2", "t"],
    yearly_seasonality=2,
)

# Build the model and add contribution variables in original scale
mmm.build_model(X, y)
mmm.add_original_scale_contribution_variable(
    var=[
        "channel_contribution",
        "control_contribution",
        "intercept_contribution",
        "yearly_seasonality_contribution",
        "y",
    ]
)

pm.model_to_graphviz(mmm.model)

/Users/juanitorduz/micromamba/envs/pymc-marketing-dev/lib/python3.13/site-packages/pymc_extras/prior.py:822: UserWarning: Implicit conversion of array-like parameter sigma to DataArray with dims ('channel',). Use DataArray with explicit dims to avoid this warning
  return _param_value_with_dims(param, value, dims=self.dims)

../../_images/67c310b8573f20c1491161ed0a83d5fdadda727f619fcc6b859c8f7d17bee1d5.svg

Observe how the MMM class has handled the media transformations.

To assess the model priors, we can look at the prior predictive plot:

# Generate prior predictive samples
mmm.sample_prior_predictive(X, y, samples=2_000)
fig, axes = mmm.plot.prior_predictive()

Sampling: [adstock_alpha, gamma_control, gamma_fourier, intercept_contribution, saturation_beta, saturation_lam, y, y_sigma]

../../_images/4083f50f7cd948397f661c3b481c3e109a6b3bf4cac6826e1ed2b6c748253910.png

The prior predictive plot shows that the priors are not too informative.

Note that the prior predictive plot is not in the original scale. The reason is that we handle scaling of the media variables and the target variable in the model class. Scaling is important for the model to sample efficiently. We will go deeper into this topic later. For now, we can show how to reproduce the plot in the original scale:

# Custom plot for prior predictive checks
fig, ax = plt.subplots()
for i, hdi_prob in enumerate([0.94, 0.5]):
    az.plot_hdi(
        x=mmm.model.coords["date"],
        y=mmm.idata["prior"]["y_original_scale"].unstack().transpose(..., "date"),
        smooth=False,
        color="C0",
        hdi_prob=hdi_prob,
        fill_kwargs={"alpha": 0.3 + i * 0.1, "label": f"{hdi_prob:.0%} HDI"},
        ax=ax,
    )
sns.lineplot(data=df, x="date_week", y="y", color="black", label="Observed", ax=ax)
ax.legend(loc="upper left")
ax.set(xlabel="date", ylabel="y")
ax.set_title("Prior Predictive Checks", fontsize=18, fontweight="bold");

../../_images/a51ab0fa5e49fd13ff75a6d65f78451b52bbd5800d11696035cab95da4d0eb44.png

4. Model Diagnostics#

A good place to start assessing model quality is to check whether the model had any divergences:

# Number of diverging samples
mmm.idata["sample_stats"]["diverging"].sum().item()

../../_images/702ba12f4bf2cef7092842f0bdd4dc2b8bfdc07d13bc79cc481075cd1cc7c88c.png

We got none! 🙌

The fit_result attribute contains the pymc trace object.

Therefore, we can use all the pymc machinery to run model diagnostics. First, let's look at the trace summary:

az.summary(
    data=mmm.fit_result,
    var_names=[
        "adstock_alpha",
        "gamma_control",
        "gamma_fourier",
        "intercept_contribution",
        "saturation_beta",
        "saturation_lam",
        "y_sigma",
    ],
)

	mean	sd	hdi_3%	hdi_97%	mcse_mean	mcse_sd	ess_bulk	ess_tail	r_hat
adstock_alpha[x1]	0.402	0.032	0.345	0.463	0.001	0.001	2489.0	2528.0	1.0
adstock_alpha[x2]	0.187	0.040	0.113	0.266	0.001	0.001	2466.0	2742.0	1.0
gamma_control[event_1]	0.176	0.027	0.126	0.230	0.000	0.000	4024.0	3039.0	1.0
gamma_control[event_2]	0.231	0.028	0.175	0.281	0.000	0.000	3807.0	3169.0	1.0
gamma_control[t]	0.001	0.000	0.001	0.001	0.000	0.000	2916.0	2655.0	1.0
gamma_fourier[sin_1]	0.003	0.003	-0.004	0.009	0.000	0.000	3893.0	2753.0	1.0
gamma_fourier[sin_2]	-0.058	0.004	-0.064	-0.051	0.000	0.000	4266.0	3090.0	1.0
gamma_fourier[cos_1]	0.062	0.003	0.056	0.069	0.000	0.000	5107.0	2645.0	1.0
gamma_fourier[cos_2]	0.001	0.004	-0.006	0.008	0.000	0.000	3875.0	2875.0	1.0
intercept_contribution	0.355	0.013	0.331	0.381	0.000	0.000	2177.0	2481.0	1.0
saturation_beta[x1]	0.362	0.020	0.325	0.401	0.000	0.000	1994.0	2179.0	1.0
saturation_beta[x2]	0.265	0.073	0.192	0.368	0.002	0.005	1986.0	1564.0	1.0
saturation_lam[x1]	3.945	0.384	3.226	4.684	0.008	0.006	2539.0	2107.0	1.0
saturation_lam[x2]	3.175	1.188	1.210	5.405	0.026	0.029	1902.0	1675.0	1.0
y_sigma	0.031	0.002	0.028	0.035	0.000	0.000	3524.0	2912.0	1.0

Observe that the estimated parameters for \(\alpha\) and \(\lambda\) are very close to the ones we set in the data generation process! Let's plot the trace for the parameters:

_ = az.plot_trace(
    data=mmm.fit_result,
    var_names=[
        "adstock_alpha",
        "gamma_control",
        "gamma_fourier",
        "intercept_contribution",
        "saturation_beta",
        "saturation_lam",
        "y_sigma",
    ],
    compact=True,
    backend_kwargs={"figsize": (12, 10), "layout": "constrained"},
)
plt.gcf().suptitle("Model Trace", fontsize=18, fontweight="bold");

../../_images/5cb04991d5108210de09e9c2c09a056f311119f911b64c0ee5e3c8214021e151.png

Overall, we see good chain mixing.

Next, we sample from the posterior predictive distribution. That is, we use the posterior samples of the model parameters to generate predictions for the target variable.

We can now plot the posterior predictive distribution for the target variable. The mmm.plot.posterior_predictive method plots the mean prediction along with an HDI. Here we request a \(94\%\) HDI through hdi_prob=0.94.

fig, axes = mmm.plot.posterior_predictive(var=["y_original_scale"], hdi_prob=0.94)
sns.lineplot(
    data=df, x="date_week", y="y", color="black", label="Observed", ax=axes[0][0]
);

../../_images/3d06153e91f6dd8165256ecd7870baf71a42879b7050221629fb9c2d19cb6b88.png

The fit looks very good (as expected)!

We can inspect the model errors:

../../_images/a4d542f1ea8eb0113ebe689e3bd4c95890c4582da83dd73d40ea201d38e12551.png

We do not see any pattern in the errors, which is a good sign.
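As a hedged sketch of the computation behind such an error plot, assume the posterior predictive draws of the target (in the original scale) have been collected into a NumPy array of shape (n_samples, n_dates). Residuals are then the observed values minus the posterior predictive mean. A lag-1 autocorrelation near zero is one simple numeric check for "no pattern". The function residual_summary and the synthetic demo data are hypothetical and are not taken from the fitted model.

```python
import numpy as np


def residual_summary(y_obs: np.ndarray, y_pred_samples: np.ndarray) -> dict:
    """Sketch: residuals as observed values minus the posterior predictive mean.

    y_obs has shape (n_dates,); y_pred_samples is assumed to have shape
    (n_samples, n_dates), e.g. stacked chains and draws.
    """
    residuals = y_obs - y_pred_samples.mean(axis=0)
    # Lag-1 autocorrelation: a value far from 0 hints at structure the model missed
    lag1 = np.corrcoef(residuals[:-1], residuals[1:])[0, 1]
    return {
        "residuals": residuals,
        "mean": float(residuals.mean()),
        "lag1_autocorr": float(lag1),
    }


# Hypothetical demo data: a smooth signal, noisy observations, noisy predictive draws
demo_rng = np.random.default_rng(1)
demo_signal = 3 + np.sin(np.linspace(0, 6, 150))
y_obs_demo = demo_signal + demo_rng.normal(0, 0.1, size=150)
pred_draws_demo = demo_signal + demo_rng.normal(0, 0.1, size=(500, 150))

demo_summary = residual_summary(y_obs_demo, pred_draws_demo)
print(
    f"mean residual: {demo_summary['mean']:.3f}, "
    f"lag-1 autocorrelation: {demo_summary['lag1_autocorr']:.3f}"
)
```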

Next, we can decompose the posterior predictive distribution into the different components. We start by looking at the channel contributions:

# Component contributions (scaled space)
fig, axes = mmm.plot.contributions_over_time(
    var=["channel_contribution"], hdi_prob=0.94
)

../../_images/5d3ddfd6109feaa923447b79f6211967b3b4cad718bdc173fd0108f49d199b9b.png

We can plot the contributions in the original scale:

# Component contributions (original scale)
mmm.plot.contributions_over_time(
    var=["channel_contribution_original_scale"],
    hdi_prob=0.94,
);

../../_images/3d9c54b6794923cea4b104e1df6dc3d23fd19ecd36ae3fdd196c40e31ecfe1e7.png

Note

The scalers attribute contains the scaling information for the target variable and the media variables. These are simple numbers (stored in an xarray.Dataset) that we can use to scale the variables back to the original scale.

mmm.scalers

<xarray.Dataset> Size: 40B
Dimensions:   (channel: 2)
Coordinates:
  * channel   (channel) object 16B 'x1' 'x2'
Data variables:
    _channel  (channel) float64 16B 0.9967 0.9944
    _target   float64 8B 8.312

Let’s check that the scaling is correct:

# Channel contributions (x1)
np.testing.assert_allclose(
    mmm.idata["posterior"]["channel_contribution"].sel(channel="x1")
    * mmm.scalers["_target"],
    mmm.idata["posterior"]["channel_contribution_original_scale"].sel(channel="x1"),
)

# Channel contributions (x2)
np.testing.assert_allclose(
    mmm.idata["posterior"]["channel_contribution"].sel(channel="x2")
    * mmm.scalers["_target"],
    mmm.idata["posterior"]["channel_contribution_original_scale"].sel(channel="x2"),
)

# Intercept contribution
np.testing.assert_allclose(
    mmm.idata["posterior"]["intercept_contribution"] * mmm.scalers["_target"],
    mmm.idata["posterior"]["intercept_contribution_original_scale"],
)

We can now plot all the contributions in the original scale:

# Component contributions (original scale)
fig, axes = mmm.plot.contributions_over_time(
    var=[
        "channel_contribution_original_scale",
        "control_contribution_original_scale",
        "intercept_contribution_original_scale",
        "yearly_seasonality_contribution_original_scale",
    ],
    dims={"channel": ["x1", "x2"]},
    hdi_prob=0.94,
)

axes = axes.flatten()

for ax in axes:
    legend = ax.get_legend()
    legend.set_bbox_to_anchor((0.5, -0.1))

../../_images/e830789032c354ef6aaa7557d4604f76299cdc381fed90913b19eb2d1147f5d9.png

We can also combine these panels into a single plot:

# Component contributions (original scale)
fig, ax = mmm.plot.contributions_over_time(
    var=[
        "channel_contribution_original_scale",
        "control_contribution_original_scale",
        "intercept_contribution_original_scale",
        "yearly_seasonality_contribution_original_scale",
    ],
    dims={"channel": ["x1", "x2"]},
    combine_dims=True,
    hdi_prob=0.94,
    figsize=(12, 7),
)


legend = ax[0, 0].get_legend()
legend.set_bbox_to_anchor((0.8, -0.12))

../../_images/56d0bfad135af4def0af41bfa1cf3a1163a4d2ac9aafc7c8ba0dbf831d1d8d7e.png

We can also manually plot the aggregated channel contribution (summed over all channels) against the other components:

../../_images/587d8cfd6da263b9195a90367f942cf631434df083902425f18265ee0699d168.png
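As a rough sketch of the aggregation step behind such a plot, the snippet below sums a channel-contribution posterior over the `channel` dimension with xarray and summarises it per date. The `DataArray` is a synthetic stand-in that assumes the dimension names `chain`, `draw`, `date` and `channel`; with a fitted model you would start from `mmm.idata["posterior"]["channel_contribution_original_scale"]` instead. The interval here is an equal-tailed quantile interval, not the HDI used elsewhere in this notebook.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for a channel contribution posterior (hypothetical values).
rng = np.random.default_rng(0)
dates = pd.date_range("2018-04-02", periods=10, freq="W-MON")
channel_contribution = xr.DataArray(
    rng.gamma(2.0, 0.5, size=(2, 100, 10, 2)),
    dims=("chain", "draw", "date", "channel"),
    coords={"date": dates, "channel": ["x1", "x2"]},
)

# Aggregate all channels into one series per posterior draw.
total_channel = channel_contribution.sum(dim="channel")

# Summarise over the posterior: mean plus a 94% equal-tailed interval per date.
mean = total_channel.mean(dim=("chain", "draw"))
lower = total_channel.quantile(0.03, dim=("chain", "draw"))
upper = total_channel.quantile(0.97, dim=("chain", "draw"))
```

The resulting `mean`, `lower` and `upper` series can then be drawn with, for example, matplotlib's `plot` and `fill_between` next to the other components.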

A similar decomposition can be achieved using an area plot:

../../_images/7a53ae700b9157f5a2c8ad750d3c73b2fc03a0274fb318631c7fa8c2db8b3d1f.png

Here, the base is the sum of the intercept, control and seasonal components. Note that this stacked representation only works if the contributions of the channel and control variables are strictly positive.
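A minimal sketch of such an area plot with `matplotlib.pyplot.stackplot`, using made-up posterior-mean contributions. It also shows why the positivity condition matters: each layer is drawn on top of the previous one, so a negative layer would cut into the layers beneath it instead of adding area. Folding the seasonal term (which can be negative on its own) into the base is one practical way to keep every layer positive.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical posterior-mean contributions over 52 weeks (toy values).
weeks = np.arange(52)
intercept = np.full(52, 3.0)
control = np.zeros(52)
seasonality = 0.5 * np.sin(2 * np.pi * weeks / 52)  # can be negative on its own
rng = np.random.default_rng(1)
x1 = rng.gamma(4.0, 0.25, size=52)  # strictly positive channel contributions
x2 = rng.gamma(2.0, 0.25, size=52)

# The "base" layer folds every non-media component into one positive series.
base = intercept + control + seasonality

fig, ax = plt.subplots(figsize=(10, 4))
layers = ax.stackplot(weeks, base, x1, x2, labels=["base", "x1", "x2"], alpha=0.7)
ax.set(xlabel="week", ylabel="contribution")
ax.legend(loc="upper left")
```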

Next, we look into the absolute historical contributions of each component as a waterfall plot. This type of visualization is very useful for presenting results to non-technical audiences and decision makers.

mmm.plot.waterfall_components_decomposition();

../../_images/75d7ed75266f2d043d28e3e04ae95de413f64a841a15a34a460fdf70764fe3b9.png
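The numbers behind a waterfall plot are simple: total each component over the whole period, then stack the totals cumulatively. A minimal pandas sketch with hypothetical weekly contributions (not the notebook's values, and not necessarily how `waterfall_components_decomposition` computes them internally):

```python
import numpy as np
import pandas as pd

# Hypothetical mean weekly contributions per component (toy numbers).
contributions = pd.DataFrame(
    {
        "intercept": [3.0, 3.0, 3.0],
        "t": [0.0, 0.1, 0.2],
        "yearly_seasonality": [0.2, -0.1, -0.1],
        "x1": [1.0, 0.8, 1.2],
        "x2": [0.0, 0.5, 0.3],
    }
)

totals = contributions.sum()  # absolute historical contribution per component
bar_end = totals.cumsum()     # where each waterfall bar ends
bar_start = bar_end - totals  # where each waterfall bar starts
share = totals / totals.sum()  # fraction of the overall total
```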

Note that we have recovered the true values for all the parameters! Strictly speaking, the individual contributions of the intercept and the trend t do not exactly match those of the data generating process, but their sum does match the true intercept + trend. The reason is that the true latent trend is not completely linear. One could use the time-varying intercept feature to capture this effect.
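The "components differ, aggregate matches" effect can be illustrated with an analogy: ordinary least squares with an intercept, a much simpler estimator than the Bayesian model used here. Fitting a straight line to a slightly curved toy trend gives fitted values that differ from the true curve week by week, yet their totals agree, because OLS residuals sum to zero whenever an intercept is included. This only illustrates the mechanism; it is not what the MMM does internally.

```python
import numpy as np

# Toy, slightly non-linear latent trend over 179 weeks (hypothetical values).
n = 179
t = np.arange(n) / n
true_trend = 2.9 + 0.9 * t**1.3

# Fit intercept + linear trend by ordinary least squares.
design = np.column_stack([np.ones(n), t])
coef, *_ = np.linalg.lstsq(design, true_trend, rcond=None)
fitted = design @ coef

pointwise_gap = np.abs(fitted - true_trend).max()  # non-zero: shapes differ
total_gap = fitted.sum() - true_trend.sum()  # ~0: totals agree
```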

We can extract the mean contributions over time directly from the model:

	date	x1	x2	event_1	event_2	t	yearly_seasonality	intercept
0	2018-04-02	1.079970	0.000000	0.0	0.0	0.000000	0.021160	2.950151
1	2018-04-09	0.830757	0.000000	0.0	0.0	0.005126	0.073151	2.950151
2	2018-04-16	1.290704	0.000000	0.0	0.0	0.010251	0.118963	2.950151
3	2018-04-23	0.790082	0.000000	0.0	0.0	0.015377	0.153282	2.950151
4	2018-04-30	1.536806	0.000000	0.0	0.0	0.020502	0.171528	2.950151
...	...	...	...	...	...	...	...	...
174	2021-08-02	0.335762	0.003322	0.0	0.0	0.891853	-0.875931	2.950151
175	2021-08-09	0.710576	1.603175	0.0	0.0	0.896979	-0.886478	2.950151
176	2021-08-16	0.875334	0.407119	0.0	0.0	0.902105	-0.864161	2.950151
177	2021-08-23	1.270923	0.077905	0.0	0.0	0.907230	-0.808582	2.950151
178	2021-08-30	1.812030	0.015387	0.0	0.0	0.912356	-0.721022	2.950151

179 rows × 8 columns
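One way to build such a table is to average a posterior over `chain` and `draw` and convert the result to a pandas DataFrame. A minimal sketch with a synthetic xarray `DataArray`; the dimension names mirror the table above, while the values are made up:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for a channel contribution posterior (hypothetical values).
rng = np.random.default_rng(7)
dates = pd.date_range("2018-04-02", periods=5, freq="W-MON")
channel_contribution = xr.DataArray(
    rng.gamma(2.0, 0.5, size=(4, 100, 5, 2)),
    dims=("chain", "draw", "date", "channel"),
    coords={"date": dates, "channel": ["x1", "x2"]},
)

# Posterior mean per (date, channel), reshaped into a wide table.
mean_contributions = (
    channel_contribution.mean(dim=("chain", "draw"))
    .to_pandas()  # 2-D DataArray -> DataFrame: date index, channel columns
    .reset_index()
)
```

Other components (trend, seasonality, intercept) can be averaged the same way and joined on `date`.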

5. Media Parameters#

We can dig deeper into the media transformation parameters. We want to compare the posterior distributions against the true values.

fig, ax = plt.subplots(
    nrows=2,
    ncols=1,
    sharex=True,
    sharey=True,
    figsize=(12, 7),
    layout="constrained",
)
az.plot_posterior(
    mmm.idata["posterior"],
    var_names=["adstock_alpha"],
    ref_val={
        "adstock_alpha": [
            {"channel": "x1", "ref_val": alpha1},
            {"channel": "x2", "ref_val": alpha2},
        ],
    },
    ax=ax,
)

fig.suptitle("Adstock Alpha Posterior", fontsize=18, fontweight="bold");

../../_images/16ef1e060d49bb8bb9746813593f55398a770ea50a7fe2ecad5addd9dbbed490.png

fig, ax = plt.subplots(
    nrows=2,
    ncols=1,
    sharex=True,
    sharey=True,
    figsize=(12, 7),
    layout="constrained",
)
az.plot_posterior(
    mmm.idata["posterior"],
    var_names=["saturation_lam"],
    ref_val={
        "saturation_lam": [
            {"channel": "x1", "ref_val": lam1},
            {"channel": "x2", "ref_val": lam2},
        ],
    },
    ax=ax,
)

fig.suptitle("Saturation Lambda Posterior", fontsize=18, fontweight="bold");

../../_images/0cf26dcb6492489c151dee85db60dc7c4d042b3aa728cfde6d9d9f5cfc443e5c.png

We indeed see that our media parameters were successfully recovered!
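The visual check can be made quantitative by asking whether each true value falls inside the 94% highest density interval (HDI) of its posterior. A minimal NumPy sketch of the usual sorted-samples HDI computation for a unimodal posterior, applied to toy draws (the true value 0.4 and the Beta posterior are hypothetical):

```python
import numpy as np


def hdi(samples: np.ndarray, prob: float = 0.94) -> tuple[float, float]:
    """Narrowest interval containing a fraction `prob` of the 1-D samples."""
    sorted_samples = np.sort(samples)
    n = sorted_samples.size
    window = int(np.floor(prob * n))
    # Width of every candidate interval spanning `window` sorted positions.
    widths = sorted_samples[window:] - sorted_samples[: n - window]
    start = int(np.argmin(widths))
    return float(sorted_samples[start]), float(sorted_samples[start + window])


rng = np.random.default_rng(3)
true_alpha = 0.4  # hypothetical true value
posterior_draws = rng.beta(40, 60, size=4000)  # toy posterior centred near 0.4
low, high = hdi(posterior_draws)
recovered = low <= true_alpha <= high
```

With a fitted model, the draws would come from something like `mmm.idata["posterior"]["adstock_alpha"].sel(channel="x1").values.ravel()`; ArviZ's `az.hdi` provides the same kind of computation.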

6. Media Deep-Dive#

First, we can compute the relative contribution of each channel to the target variable. Note that we recover the true values!

fig, ax = mmm.plot.channel_contribution_share_hdi(figsize=(10, 6))
ax.axvline(
    x=contribution_share_x1,
    color="C1",
    linestyle="--",
    label="true contribution share ($x_1$)",
)
ax.axvline(
    x=contribution_share_x2,
    color="C2",
    linestyle="--",
    label="true contribution share ($x_2$)",
)
ax.legend(loc="upper center", bbox_to_anchor=(0.5, -0.05), ncol=1);

../../_images/fbb48937c111490e866d361c6d4b241c08f2fa53daf44684790c70ff1061acee.png
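The contribution share is a per-draw ratio, so its posterior can be computed directly from draws of each channel's total contribution. A minimal sketch, assuming that shares are taken relative to the combined contribution of all channels; the gamma draws are made up:

```python
import numpy as np

rng = np.random.default_rng(5)
# Hypothetical draws of each channel's total contribution (rows: draws, columns: x1, x2).
total_contribution = rng.gamma(shape=[50.0, 20.0], scale=1.0, size=(1000, 2))

# Share of the combined channel contribution, computed draw by draw.
share = total_contribution / total_contribution.sum(axis=1, keepdims=True)
share_mean = share.mean(axis=0)
share_interval = np.quantile(share, [0.03, 0.97], axis=0)  # 94% equal-tailed interval
```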

Next, we look at each channel's contribution to the target variable.

We first plot the direct contribution of each channel against its spend. Again, we obtain values very close to those from Part I.

fig, axes = mmm.plot.saturation_scatterplot(original_scale=True)
for ax in axes.flatten():
    ax.set(xlabel="x")

../../_images/cf6f9755315d3fb3be4d0cc01605fe8627175f61f66acc00896c183e5e970af1.png

Note that obtaining the delayed cumulative contribution is not straightforward, as contributions from the past leak into the future. Specifically, the saturation function is applied to the adstock aggregation. Because the saturation function is non-linear, this is not the same as summing the saturated contributions of each period. Hence, it is very hard to reverse-engineer the contribution after composing carryover and saturation this way.
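The non-additivity can be checked numerically. The sketch below assumes a normalised geometric adstock followed by the logistic saturation \((1 - e^{-\lambda x})/(1 + e^{-\lambda x})\); treat these exact forms and the parameter values as assumptions. Adstock is linear, so the adstocked series of two spend pulses equals the sum of their individual adstocked series, but saturating the combined series yields less than saturating each pulse separately and adding the results.

```python
import numpy as np


def geometric_adstock(x: np.ndarray, alpha: float, l_max: int = 8) -> np.ndarray:
    """Carryover: y[t] = sum_l w[l] * x[t - l], with weights w[l] proportional to alpha**l."""
    weights = alpha ** np.arange(l_max)
    weights = weights / weights.sum()  # normalised weights (an assumption)
    return np.convolve(x, weights)[: len(x)]


def logistic_saturation(x: np.ndarray, lam: float) -> np.ndarray:
    """Concave, increasing saturation with logistic_saturation(0) == 0."""
    return (1 - np.exp(-lam * x)) / (1 + np.exp(-lam * x))


# Two spend pulses, two weeks apart, and each pulse on its own.
pulse_a = np.zeros(20)
pulse_a[2] = 1.0
pulse_b = np.zeros(20)
pulse_b[4] = 1.0
spend = pulse_a + pulse_b

alpha, lam = 0.5, 4.0  # hypothetical parameters

# Adstock is linear: the carryover of the sum is the sum of the carryovers.
adstock_joint = geometric_adstock(spend, alpha)
adstock_parts = geometric_adstock(pulse_a, alpha) + geometric_adstock(pulse_b, alpha)

# Saturation is not: where the carryovers overlap, the joint response is smaller.
joint = logistic_saturation(adstock_joint, lam).sum()
separate = (
    logistic_saturation(geometric_adstock(pulse_a, alpha), lam)
    + logistic_saturation(geometric_adstock(pulse_b, alpha), lam)
).sum()
```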

A more transparent alternative is to evaluate the channel contribution over the complete training period at different spend levels, expressed relative to the historical spend. Concretely, let \(\delta\) (which we call the sweep factor) be the multiplicative level applied to the input channel data, so that \(\delta = 1\) corresponds to the observed spend and \(\delta = 1.5\) to a \(50\%\) increase in spend. We then compute the channel contribution on a grid of \(\delta\)-values and plot the results:

# Run sensitivity analysis sweep
sweeps = np.linspace(0, 1.5, 12)
mmm.sensitivity.run_sweep(
    sweep_values=sweeps,
    var_input="channel_data",
    var_names="channel_contribution_original_scale",
    extend_idata=True,
)

# Plot sensitivity analysis
ax = mmm.plot.sensitivity_analysis(
    xlabel="Sweep multiplicative",
    ylabel="Total contribution over training period",
    hue_dim="channel",
    x_sweep_axis="relative",
)
ax.axvline(1.0, color="black", linestyle="--", linewidth=1);

../../_images/18fe8f7f5b1e10b4e9d9e31f441e7670342f67a46fd56f76eeb8bdc0b5811a70.png

Here the black dashed line represents the case where the spend is at the historical level.

This plot accounts for both the carryover (adstock) and the saturation effects.
We see that when there is no spend, the contribution is zero (assuming there was no spend in the past; otherwise the carryover effect would be non-zero).

Note that these grid values serve as inputs for an optimization step.
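Conceptually, each point of the sweep rescales the spend by \(\delta\), passes it through the same adstock and saturation transformations and sums the resulting contribution over the training period. A minimal NumPy sketch for a single channel, assuming a normalised geometric adstock and a logistic saturation with hypothetical parameters and toy spend; it is not a reimplementation of `mmm.sensitivity.run_sweep`:

```python
import numpy as np


def geometric_adstock(x, alpha, l_max=8):
    """Normalised geometric carryover (assumed form)."""
    weights = alpha ** np.arange(l_max)
    weights = weights / weights.sum()
    return np.convolve(x, weights)[: len(x)]


def logistic_saturation(x, lam):
    """Logistic saturation (assumed form), zero at zero spend."""
    return (1 - np.exp(-lam * x)) / (1 + np.exp(-lam * x))


rng = np.random.default_rng(11)
x = rng.uniform(0.0, 1.0, size=104)  # toy weekly spend for one channel
beta, alpha, lam = 2.0, 0.4, 3.0  # hypothetical channel parameters

sweeps = np.linspace(0, 1.5, 12)  # same grid as the sweep above
total_contribution = np.array(
    [
        (beta * logistic_saturation(geometric_adstock(delta * x, alpha), lam)).sum()
        for delta in sweeps
    ]
)
```

The zero at \(\delta = 0\) and the flattening slope reproduce the two features discussed above: no spend (and no past spend) means no contribution, and each additional unit of spend adds less than the previous one.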

We can also plot the same contribution using the total channel input (e.g., the total spend in EUR) as the x-axis.

# Plot sensitivity analysis with absolute x-axis
ax = mmm.plot.sensitivity_analysis(
    xlabel="Sweep absolute spend",
    ylabel="Total contribution over training period",
    hue_dim="channel",
    x_sweep_axis="absolute",
)

for i, channel in enumerate(["x1", "x2"]):
    ax.axvline(
        X[channel].sum(),
        color=f"C{i}",
        linestyle="--",
        label=f"historical total spend ({channel})",
    )

ax.legend(loc="upper left");

../../_images/49285e9cf9a51623e677700bc51245a7bc9e88cb23c1ee1cdc13077dfdb8afcf.png
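The absolute axis is presumably the relative sweep multiplied by each channel's historical total spend, which is why the dashed lines at the historical totals correspond to \(\delta = 1\). A minimal pandas sketch of that conversion with toy spend values (the `spend` frame is hypothetical and stands in for the notebook's `X`):

```python
import numpy as np
import pandas as pd

# Hypothetical historical spend for two channels (toy values).
spend = pd.DataFrame({"x1": [0.5, 0.8, 0.3], "x2": [0.0, 0.6, 0.4]})
sweeps = np.linspace(0, 1.5, 4)  # relative multipliers delta: 0, 0.5, 1.0, 1.5

historical_total = spend[["x1", "x2"]].sum()  # total spend per channel
absolute_spend = pd.DataFrame(
    np.outer(sweeps, historical_total.to_numpy()),  # delta * total, per channel
    index=sweeps,
    columns=historical_total.index,
)
```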

All of these visualizations are very useful for understanding the contribution of each channel to the target variable, as well as the saturation and adstock effects. For more details on how to interpret these plots, please refer to the tutorial Understanding Media Saturation in Marketing Mix Models.

7. Contribution Recovery#

Next, we can plot the direct contribution of each channel to the target variable over time.

# Component contributions (original scale)
fig, axes = mmm.plot.contributions_over_time(
    var=["channel_contribution_original_scale"],
    hdi_prob=0.94,
)

axes = axes.flatten()

for i, x in enumerate(["x1", "x2"]):
    # Estimate true contribution in the original scale from the data generating process
    sns.lineplot(
        x=df["date_week"],
        y=amplitude * betas[i] * df[f"{x}_adstock_saturated"],
        color="black",
        label=f"{x} true contribution",
        linestyle="--",
        alpha=0.5,
        ax=axes[i],
    )

for ax in axes:
    ax.legend(loc="upper left")

fig.suptitle("Contribution Recovery", fontsize=18, fontweight="bold");

../../_images/dedcc1f7a19fcfc131bb85b3323fce337d3b1af3986deafa42ef015b7e3f14e4.png

The results look great! We have successfully recovered the true contributions from the data generation process. We have also seen how easy it is to fit media mix models with the MMM class: it handles the model specification and the media transformations while retaining all the flexibility of pymc!

8. ROAS#

Finally, we can compute the (approximate) posterior distribution of the ROAS for each channel.

roas = mmm.incrementality.contribution_over_spend(frequency="all_time").rename("roas")

fig, axes = plt.subplots(
    nrows=2, ncols=1, figsize=(12, 7), sharex=True, sharey=False, layout="constrained"
)
az.plot_posterior(roas, ref_val=[roas_1, roas_2], ax=axes)
axes[0].set(title="Channel $x_{1}$")
axes[1].set(title="Channel $x_{2}$", xlabel="ROAS")
fig.suptitle("ROAS Posterior Distributions", fontsize=18, fontweight="bold", y=1.06);

../../_images/045b7d0f89c6ad80dd5255273396ef8321df7e45350fe27161fd577e4b4b324a.png

We see that the ROAS posterior distributions are centered around the true values. We also see that, even accounting for uncertainty, channel \(x_{1}\) is more efficient than channel \(x_{2}\).
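The claim that \(x_{1}\) is more efficient "even accounting for uncertainty" can be summarised as the posterior probability that its ROAS exceeds that of \(x_{2}\). A minimal NumPy sketch, defining ROAS per draw as total contribution divided by total spend (as the name `contribution_over_spend` suggests); the spend totals and contribution draws are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(21)
n_draws = 2000
spend_total = np.array([120.0, 80.0])  # hypothetical total spend per channel
# Hypothetical posterior draws of the total contribution per channel.
contribution_total = np.column_stack(
    [rng.normal(300.0, 20.0, n_draws), rng.normal(120.0, 15.0, n_draws)]
)

roas = contribution_total / spend_total  # ROAS per draw and channel
roas_mean = roas.mean(axis=0)
prob_x1_more_efficient = (roas[:, 0] > roas[:, 1]).mean()
```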

It is also useful to compare the ROAS and the contribution share. The next plot shows these two inferred estimates for each channel.

../../_images/2c13adcbfd432ad8c111ec050723af35a32246c8f9f2dce377b2188e44461b20.png
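A chart like this can be drawn from the posterior draws of both quantities. A minimal matplotlib sketch that places each channel at its posterior-median contribution share (x-axis) and ROAS (y-axis), with 94% equal-tailed intervals as error bars; all draws are synthetic:

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(8)
n_draws = 1000
# Hypothetical posterior draws for two channels.
roas = np.column_stack([rng.normal(2.5, 0.2, n_draws), rng.normal(1.5, 0.2, n_draws)])
share_x1 = rng.normal(0.7, 0.03, n_draws)
share = np.column_stack([share_x1, 1.0 - share_x1])  # shares sum to one per draw

fig, ax = plt.subplots(figsize=(6, 5))
for i, channel in enumerate(["x1", "x2"]):
    s_low, s_mid, s_high = np.quantile(share[:, i], [0.03, 0.5, 0.97])
    r_low, r_mid, r_high = np.quantile(roas[:, i], [0.03, 0.5, 0.97])
    ax.errorbar(
        s_mid,
        r_mid,
        xerr=[[s_mid - s_low], [s_high - s_mid]],
        yerr=[[r_mid - r_low], [r_high - r_mid]],
        fmt="o",
        color=f"C{i}",
        label=channel,
    )
ax.set(xlabel="contribution share", ylabel="ROAS")
ax.legend()
```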

This plot is very effective at summarizing channel efficiency. In this example, it turns out that the most efficient channel \(x_1\) has a higher contribution share than the less efficient channel \(x_2\).

9. Out of Sample Predictions#

Out-of-sample predictions are made with the sample_posterior_predictive and predict methods. These are:

sample_posterior_predictive: get the full posterior predictive distribution
predict: get the mean of the posterior predictive distribution

These methods take new data, X, and some additional kwargs for the new predictions. Specifically,

include_last_observations: boolean to carry over adstock effects from the last observations in the training dataset

The new data needs to contain all the features specified in the model. You do not need to worry about:

scaling the input channel spends
creating the Fourier transformations from the date_column
inverse scaling back to the target domain

This is done automatically! However, please note that control variables are NOT scaled automatically; if needed, you must scale them before passing the data to the model.

last_date = X["date_week"].max()

# New dates starting from last in dataset
n_new = 5
new_dates = pd.date_range(start=last_date, periods=1 + n_new, freq="W-MON")[1:]

X_out_of_sample = pd.DataFrame(
    {
        "date_week": new_dates,
    }
)

# Same channel spends as last day
X_out_of_sample["x1"] = X["x1"].iloc[-1]
X_out_of_sample["x2"] = X["x2"].iloc[-1]

# Other features
X_out_of_sample["event_1"] = 0
X_out_of_sample["event_2"] = 0

X_out_of_sample["t"] = range(len(X), len(X) + n_new)

X_out_of_sample

	date_week	x1	t
0	2021-09-06	0.438857	179
1	2021-09-13	0.438857	180
2	2021-09-20	0.438857	181
3	2021-09-27	0.438857	182
4	2021-10-04	0.438857	183

Call the desired method to get the new samples! The new coordinates will be the new dates.

X_out_of_sample.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 6 columns):
 #   Column     Non-Null Count  Dtype         
---  ------     --------------  -----         
 0   date_week  5 non-null      datetime64[ns]
 1   x1         5 non-null      float64       
 2   x2         5 non-null      float64       
 3   event_1    5 non-null      int64         
 4   event_2    5 non-null      int64         
 5   t          5 non-null      int64         
dtypes: datetime64[ns](1), float64(2), int64(3)
memory usage: 372.0 bytes

Note

If the method is called multiple times, set the extend_idata argument to False so that the observed_data in the InferenceData is not overwritten.

By default, the new predictions are transformed back to the original scale of the target variable. This can be seen below:

def plot_in_sample(X, y, ax, n_points: int = 15):
    sns.lineplot(
        x=X["date_week"][-n_points:],
        y=y[-n_points:],
        marker="o",
        markersize=7,
        color="black",
        label="actuals",
        ax=ax,
    )
    return ax


def plot_out_of_sample(X_out_of_sample, y_out_of_sample, ax, color, label):
    y_out_original_scale = (
        y_out_of_sample["y_original_scale"].unstack().transpose(..., "date")
    )
    az.plot_hdi(
        X_out_of_sample["date_week"].dt.to_pydatetime(),
        y_out_original_scale,
        smooth=False,
        fill_kwargs={"alpha": 0.25, "color": color},
        ax=ax,
    )

    mean = y_out_original_scale.mean(dim=("chain", "draw"))
    mean.plot(ax=ax, marker="o", markersize=7, label=label, color=color, linestyle="--")
    ax.set(ylabel="Original Target Scale")
    ax.set_title("Out of sample predictions for MMM", fontsize=18, fontweight="bold")
    return ax


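# Sample the posterior predictive for the new dates, without carrying over
# adstock from the training data
y_out_of_sample = mmm.sample_posterior_predictive(
    X_out_of_sample, extend_idata=False, include_last_observations=False
)

_, ax = plt.subplots()
plot_in_sample(X, y, ax=ax)
plot_out_of_sample(
    X_out_of_sample, y_out_of_sample, ax=ax, label="out of sample", color="C0"
)
ax.legend(loc="upper left");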

../../_images/c554e6575aed2a256f0b7ab405b63fc2c59d04cc5e83a932c3d98189d2dcc6f4.png

If the out-of-sample data directly continues the training period, consider setting include_last_observations to True to carry over the effects of the last channel spends in the training set.

The predictions are then higher, since the channel contributions from the final spends still have an impact that eventually subsides.

y_out_of_sample_with_adstock = mmm.sample_posterior_predictive(
    X_out_of_sample, extend_idata=False, include_last_observations=True
)

Sampling: [y]

_, ax = plt.subplots()
plot_in_sample(X, y, ax=ax)
plot_out_of_sample(
    X_out_of_sample, y_out_of_sample, ax=ax, label="out of sample", color="C0"
)
plot_out_of_sample(
    X_out_of_sample,
    y_out_of_sample_with_adstock,
    ax=ax,
    label="adstock out of sample",
    color="C1",
)
ax.legend();

../../_images/3f6048826f2e3311ea6407088981f619a50b6ccda22d18abaf38733f2511c4de.png
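The gap between the two forecasts comes from carryover. A minimal NumPy sketch of the idea, assuming a normalised geometric adstock and that the last `l_max - 1` training weeks are prepended before transforming the new spend; how many observations the library actually prepends is an assumption here, and all values are toy numbers:

```python
import numpy as np


def geometric_adstock(x, alpha, l_max=4):
    """Normalised geometric carryover (assumed form)."""
    weights = alpha ** np.arange(l_max)
    weights = weights / weights.sum()
    return np.convolve(x, weights)[: len(x)]


alpha, l_max = 0.6, 4
train_spend = np.array([0.2, 0.9, 1.0, 0.8])  # last training weeks (toy values)
new_spend = np.full(5, 0.44)  # constant out-of-sample spend

# Without history: the carryover starts from zero in the first new week.
without_history = geometric_adstock(new_spend, alpha, l_max)

# With history: prepend the last l_max - 1 training weeks, transform, then drop them.
history = train_spend[-(l_max - 1):]
combined = np.concatenate([history, new_spend])
with_history = geometric_adstock(combined, alpha, l_max)[l_max - 1:]
```

The two series coincide once the new weeks fill the whole adstock window, which matches the gap between the forecasts eventually closing.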

10. Save Model#

After your model is trained, you can quickly save it using the save method. For more information about model deployment, see Model Deployment.

mmm.save("model.nc", engine="h5netcdf")

%load_ext watermark
%watermark -n -u -v -iv -w -p pymc_marketing,pytensor

Last updated: Wed, 18 Mar 2026

Python implementation: CPython
Python version       : 3.13.12
IPython version      : 9.11.0

pymc_marketing: 0.18.2
pytensor      : 2.38.2

arviz         : 0.23.4
graphviz      : 0.21
matplotlib    : 3.10.8
numpy         : 2.4.2
pandas        : 2.3.3
pymc          : 5.28.1
pymc_extras   : 0.9.3
pymc_marketing: 0.18.2
seaborn       : 0.13.2

Watermark: 2.6.0
