MMM

class pymc_marketing.mmm.mmm.MMM(date_column, channel_columns, adstock, saturation, time_varying_intercept=False, time_varying_media=False, model_config=None, sampler_config=None, validate_data=True, control_columns=None, yearly_seasonality=None, adstock_first=True, dag=None, treatment_nodes=None, outcome_node=None)

Parameters

• date_column (str): Column name of the date variable.
• channel_columns (list[str]): Column names of the media channel variables; at least one is required.
• adstock (AdstockTransformation): Type of adstock transformation to apply.
• saturation (SaturationTransformation): Type of saturation transformation to apply.
• time_varying_intercept (bool, default False): Whether to consider a time-varying intercept.
• time_varying_media (bool, default False): Whether to consider time-varying media contributions.
• model_config (dict or None, default None): Model configuration.
• sampler_config (dict or None, default None): Sampler configuration.
• validate_data (bool, default True): Whether to validate the data before fitting the model.
• control_columns (default None): Column names of the control covariates.
• yearly_seasonality (default None): Number of Fourier modes used to model yearly seasonality.
• adstock_first (bool, default True): Whether to apply adstock before saturation.
• dag (str or None, default None): Optional DAG, given as a string in Dot format, for causal identification.
• treatment_nodes (list[str], tuple[str] or None, default None): Column names of the variables of interest whose causal effects on the outcome should be identified.
• outcome_node (str or None, default None): Name of the outcome variable.

Media Mix Model class. The adstock and saturation transformations are passed explicitly through the adstock and saturation parameters (see [1] for the carryover and shape effects they model).

Given a time series target variable \(y_{t}\) (e.g. sales or conversions), media variables \(x_{m, t}\) (e.g. impressions, clicks or costs) and a set of control covariates \(z_{c, t}\) (e.g. holidays, special events), we consider a Bayesian linear model of the form:

\[y_{t} = \alpha + \sum_{m=1}^{M}\beta_{m}f(x_{m, t}) + \sum_{c=1}^{C}\gamma_{c}z_{c, t} + \varepsilon_{t},\]

where \(\alpha\) is the intercept, \(f\) is a media transformation function and \(\varepsilon_{t}\) is the error term, which we assume is normally distributed. The function \(f\) encodes the contribution of media to the target variable. It combines two types of transformation: adstock (carry-over) and saturation effects. The adstock_first parameter controls which of the two is applied first.
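The equation above can be simulated with a short NumPy sketch. This is not the library's implementation. It assumes a geometric adstock whose lag weights decay as \(\rho^{l}\) and are normalized to sum to one, followed by the logistic saturation \((1 - e^{-\lambda x}) / (1 + e^{-\lambda x})\). The helper functions, the decay rate \(\rho\) and all parameter values are hypothetical.

```python
import numpy as np


def geometric_adstock(x: np.ndarray, decay: float, l_max: int) -> np.ndarray:
    """Hypothetical helper: carry-over with lag weights decay**l, normalized to sum to 1."""
    weights = decay ** np.arange(l_max)
    weights = weights / weights.sum()
    out = np.zeros(len(x), dtype=float)
    for lag, w in enumerate(weights):
        if lag >= len(x):
            break
        # Spend at time t - lag contributes w * x_{t-lag} to time t.
        out[lag:] += w * x[: len(x) - lag]
    return out


def logistic_saturation(x: np.ndarray, lam: float) -> np.ndarray:
    """Hypothetical helper: (1 - exp(-lam * x)) / (1 + exp(-lam * x)), in [0, 1) for x >= 0."""
    return (1 - np.exp(-lam * x)) / (1 + np.exp(-lam * x))


rng = np.random.default_rng(seed=42)
n_weeks, n_channels = 52, 2

x = rng.uniform(0, 1, size=(n_weeks, n_channels))  # scaled media x_{m,t}
z = rng.normal(size=(n_weeks, 1))                   # one control covariate z_{c,t}

alpha = 0.2                   # intercept
beta = np.array([0.5, 0.3])   # beta_m, one per channel
gamma = np.array([0.1])       # gamma_c, one per control
decay = np.array([0.6, 0.3])  # assumed adstock decay rate per channel
lam = np.array([2.0, 4.0])    # assumed saturation rate per channel

# f(x_{m,t}): adstock first, then saturation (the adstock_first=True order).
media = np.column_stack([
    logistic_saturation(geometric_adstock(x[:, m], decay[m], l_max=8), lam[m])
    for m in range(n_channels)
])

mu = alpha + media @ beta + z @ gamma           # expected value of y_t
y = mu + rng.normal(scale=0.05, size=n_weeks)   # add Gaussian error epsilon_t
```

With adstock_first=False the order would be swapped, so each week's spend is saturated before it is carried over.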

Notes

Here are some important notes about the model:

1. Before fitting the model, we scale the target variable and the media channels by the maximum absolute value of each variable. This makes the model more stable and improves convergence. Control variables are not scaled; if needed, scale them before passing the data to the model.

2. Yearly seasonality can be added as Fourier modes. Use the yearly_seasonality parameter to specify the number of Fourier modes to include.

3. This class also allows us to calibrate the model using:

  • Custom priors for the parameters via the model_config parameter, which can also set the likelihood distribution.

  • Lift test measurements added to the likelihood via the add_lift_test_measurements method.
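Notes 1 and 2 can be illustrated with a short NumPy/pandas sketch. It is an illustration under stated assumptions, not the library's code. Max-abs scaling divides each column by its largest absolute value. The Fourier features assume a yearly period of 365.25 days, measured from the day of year, with one sine and one cosine per mode. The helper names are hypothetical.

```python
import numpy as np
import pandas as pd


def max_abs_scale(values: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Hypothetical helper: scale each column by its max absolute value; return data and scales."""
    scale = np.abs(values).max(axis=0)
    scale = np.where(scale == 0, 1.0, scale)  # avoid division by zero for all-zero columns
    return values / scale, scale


def yearly_fourier_features(dates: pd.Series, n_order: int) -> pd.DataFrame:
    """Hypothetical helper: sine/cosine pairs of the day of year for orders 1..n_order."""
    day_of_year = pd.to_datetime(dates).dt.dayofyear.to_numpy()
    features = {}
    for k in range(1, n_order + 1):
        angle = 2 * np.pi * k * day_of_year / 365.25  # assumed yearly period
        features[f"sin_order_{k}"] = np.sin(angle)
        features[f"cos_order_{k}"] = np.cos(angle)
    return pd.DataFrame(features, index=dates.index)


dates = pd.Series(pd.date_range("2023-01-02", periods=104, freq="W-MON"))
spend = np.column_stack([np.linspace(0, 500, 104), np.linspace(200, 50, 104)])

spend_scaled, spend_scale = max_abs_scale(spend)
fourier = yearly_fourier_features(dates, n_order=2)  # analogous to yearly_seasonality=2

# Keeping the scales allows scaled quantities to be mapped back to the original units.
spend_restored = spend_scaled * spend_scale
```

Storing the per-column scales is what makes the inverse mapping possible. Methods such as compute_channel_contribution_original_scale imply this kind of mapping.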

For details on a vanilla implementation in PyMC, see [2].

References

[1] Jin, Yuxue, et al. “Bayesian methods for media mix modeling with carryover and shape effects.” (2017).

Examples

Here is an example of how to instantiate the model with the default configuration:

import numpy as np
import pandas as pd

from pymc_marketing.mmm import (
    GeometricAdstock,
    LogisticSaturation,
    MMM,
)

data_url = "https://raw.githubusercontent.com/pymc-labs/pymc-marketing/main/data/mmm_example.csv"
data = pd.read_csv(data_url, parse_dates=["date_week"])

mmm = MMM(
    date_column="date_week",
    channel_columns=["x1", "x2"],
    adstock=GeometricAdstock(l_max=8),
    saturation=LogisticSaturation(),
    control_columns=[
        "event_1",
        "event_2",
        "t",
    ],
    yearly_seasonality=2,
)

Now we can fit the model with the data:

# Set features and target
X = data.drop("y", axis=1)
y = data["y"]

# Fit the model
idata = mmm.fit(X, y)

We can also define custom priors for the model:

import numpy as np

from pymc_marketing.mmm import (
    GeometricAdstock,
    LogisticSaturation,
    MMM,
)
from pymc_marketing.prior import Prior

my_model_config = {
    "saturation_beta": Prior("LogNormal", mu=np.array([2, 1]), sigma=1),
    "likelihood": Prior("Normal", sigma=Prior("HalfNormal", sigma=2)),
}

mmm = MMM(
    date_column="date_week",
    channel_columns=["x1", "x2"],
    adstock=GeometricAdstock(l_max=8),
    saturation=LogisticSaturation(),
    control_columns=[
        "event_1",
        "event_2",
        "t",
    ],
    yearly_seasonality=2,
    model_config=my_model_config,
)

As you can see, we can configure all prior and likelihood distributions via the model_config.

The fit method passes keyword arguments through to the PyMC sampler (pm.sample). For example, to change the number of draws and chains and to use the JAX-based NUTS implementation from NumPyro (which requires the numpyro package), we can do:

sampler_kwargs = {
    "draws": 2_000,
    "target_accept": 0.9,
    "chains": 5,
    "random_seed": 42,
}

idata = mmm.fit(X, y, nuts_sampler="numpyro", **sampler_kwargs)

Methods

MMM.__init__([date_column, channel_columns, ...])

Define the constructor method.

MMM.add_lift_test_measurements(df_lift_test)

Add lift tests to the model.

MMM.attrs_to_init_kwargs(attrs)

Convert attributes to initialization kwargs.

MMM.build_from_idata(idata)

Build model from the InferenceData object.

MMM.build_model(X, y, **kwargs)

Build a probabilistic model using PyMC for marketing mix modeling.

MMM.channel_contributions_forward_pass(...)

Evaluate the channel contribution for given channel data and a fitted model, i.e., a forward pass.

MMM.compute_channel_contribution_original_scale([prior])

Compute the channel contributions in the original scale of the target variable.

MMM.compute_mean_contributions_over_time([...])

Get the contributions of each channel over time.

MMM.create_fit_data(X, y)

Create the fit_data group based on the input data.

MMM.create_idata_attrs()

Create attributes for the inference data.

MMM.fit(X[, y, progressbar, random_seed])

Fit a model using the data passed as a parameter.

MMM.format_recovered_transformation_parameters([...])

Format the recovered transformation parameters for each channel.

MMM.forward_pass(x)

Transform channel input into target contributions of each channel.

MMM.get_channel_contributions_forward_pass_grid(...)

Generate a grid of scaled channel contributions for a given grid of share values.

MMM.get_channel_contributions_share_samples([prior])

Get the share of channel contributions in the original scale of the target variable.

MMM.get_errors([original_scale])

Get model errors posterior distribution.

MMM.get_target_transformer()

Return the target transformer pipeline used for preprocessing the target variable.

MMM.get_ts_contribution_posterior(...[, ...])

Get the posterior distribution of the time series contributions of a given variable.

MMM.graphviz(**kwargs)

Get the graphviz representation of the model.

MMM.load(fname)

Create a ModelBuilder instance from a file.

MMM.load_from_idata(idata)

Create a ModelBuilder instance from an InferenceData object.

MMM.max_abs_scale_channel_data(data)

MaxAbsScaler for the channel data.

MMM.max_abs_scale_target_data(data)

MaxAbsScaler for the target data.

MMM.new_spend_contributions([spend, ...])

Return the upcoming contributions for a given spend.

MMM.optimize_budget(budget, num_periods[, ...])

Optimize the given budget based on the specified utility function over a specified time period.

MMM.plot_allocated_contribution_by_channel(samples)

Plot the allocated contribution by channel with uncertainty intervals.

MMM.plot_budget_allocation(samples[, ...])

Plot the budget allocation and channel contributions.

MMM.plot_channel_contribution_share_hdi([...])

Plot the share of channel contributions in a forest plot.

MMM.plot_channel_contributions_grid(start, ...)

Plot a grid of scaled channel contributions for a given grid of share values.

MMM.plot_channel_parameter(param_name, ...)

Plot the posterior distribution of a specific parameter for each channel.

MMM.plot_components_contributions([...])

Plot the target variable and the posterior predictive model components.

MMM.plot_direct_contribution_curves([...])

Plot the direct contribution curves for each marketing channel.

MMM.plot_errors([original_scale, ax])

Plot model errors by taking the difference between true and predicted values.

MMM.plot_grouped_contribution_breakdown_over_time([...])

Plot a time series area chart for all channel contributions.

MMM.plot_new_spend_contributions(spend_amount)

Plot the upcoming sales for a given spend amount.

MMM.plot_posterior_predictive([...])

Plot the posterior predictive distribution from the model fit.

MMM.plot_prior_predictive([original_scale, ...])

Plot the prior predictive distribution from the model fit.

MMM.plot_prior_vs_posterior(var_name[, ...])

Plot the prior vs posterior distribution for a specified variable in a 3-column grid layout.

MMM.plot_waterfall_components_decomposition([...])

Create a waterfall plot.

MMM.post_sample_model_transformation()

Post-sample model transformation in order to store the HSGP state from fit.

MMM.predict([X, extend_idata])

Use a model to predict on unseen data and return a point prediction aggregated over all posterior samples.

MMM.predict_posterior([X, extend_idata, ...])

Generate posterior predictive samples on unseen data.

MMM.predict_proba([X, extend_idata, combined])

Alias for predict_posterior, for consistency with scikit-learn probabilistic estimators.

MMM.preprocess(target, data)

Preprocess the provided data according to the specified target.

MMM.sample_posterior_predictive([X, ...])

Sample from the model's posterior predictive distribution.

MMM.sample_prior_predictive([X, y, samples, ...])

Sample from the model's prior predictive distribution.

MMM.sample_response_distribution(...)

Generate synthetic dataset and sample posterior predictive based on allocation.

MMM.save(fname)

Save the model's inference data to a file.

MMM.set_idata_attrs([idata])

Set attributes on an InferenceData object.

MMM.validate(target, data)

Validate the input data based on the specified target type.

MMM.validate_channel_columns(data)

Validate the channel columns.

MMM.validate_control_columns(data)

Validate the control columns.

MMM.validate_date_col(data)

Validate the date column.

MMM.validate_target(data)

Validate the target column.
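Several methods above report the share of each channel in the total contribution (get_channel_contributions_share_samples, plot_channel_contribution_share_hdi). The following minimal NumPy sketch shows the arithmetic per posterior sample. Several parts are assumptions for illustration, not the methods' documented behaviour: the contribution samples are random stand-ins, the array layout is (chain, draw, date, channel), and share is defined as each channel's total over the period divided by the total across channels. The interval is a simple equal-tailed interval, not the HDI used by the plotting method.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n_chains, n_draws, n_dates, n_channels = 2, 500, 52, 3

# Hypothetical stand-in for posterior samples of channel contributions,
# shape (chain, draw, date, channel).
contributions = rng.gamma(
    shape=2.0, scale=[1.0, 2.0, 3.0], size=(n_chains, n_draws, n_dates, n_channels)
)

# Total contribution of each channel over the whole period, per sample.
channel_totals = contributions.sum(axis=2)  # (chain, draw, channel)

# Share of each channel in the total media contribution, per sample.
shares = channel_totals / channel_totals.sum(axis=-1, keepdims=True)

# Posterior mean share and a 94% equal-tailed interval per channel.
mean_share = shares.mean(axis=(0, 1))
lower, upper = np.quantile(shares, [0.03, 0.97], axis=(0, 1))
```

Computing the share inside each sample, before averaging, keeps the uncertainty of the ratio itself rather than dividing two posterior means.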

Attributes

X

default_model_config

Define the default model configuration.

default_sampler_config

Default sampler configuration for the model.

fit_result

Get the posterior fit_result.

id

Generate a unique hash value for the model.

methods

Get all methods of the object.

output_var

Define target variable for the model.

posterior

posterior_predictive

predictions

preprocessing_methods

A property that provides preprocessing methods for features ("X") and the target variable ("y").

prior

prior_predictive

validation_methods

A property that provides validation methods for features ("X") and the target variable ("y").

version

y

target_transformer

channel_columns

control_columns

model

date_column