ModifiedBetaGeoModel

class pymc_marketing.clv.models.modified_beta_geo.ModifiedBetaGeoModel(data=None, *, model_config=None, sampler_config=None)

Modified Beta-Geometric Negative Binomial Distribution (MBG/NBD) model for a non-contractual customer population across continuous time.

Based on modifications to the BG/NBD model proposed by Batislam et al. in [1] and Wagner & Hoppe in [2], which remove the BG/NBD assumption that all non-repeat customers are still active.

The MBG/NBD model assumes dropout probabilities for the customer population are Beta distributed, and time between transactions follows a Gamma distribution while the customer is still active.
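The generative story behind these assumptions can be sketched with NumPy. This is only an illustration, not the library's implementation: it assumes the usual BG/NBD-family reading, in which each customer's purchase rate is Gamma distributed (shape r, rate alpha), inter-purchase times are exponential given that rate, and a dropout "coin flip" happens right after the first purchase as well as after every repeat purchase. The function name and parameter values are hypothetical.

```python
import numpy as np


def simulate_mbg_nbd_customer(r, alpha, a, b, T, rng):
    """Simulate one customer's repeat purchases over a window of length T.

    Returns (frequency, recency), both measured from the first purchase.
    Hypothetical helper for illustration only.
    """
    lam = rng.gamma(shape=r, scale=1.0 / alpha)  # purchase rate (assumed rate parameterization)
    p = rng.beta(a, b)                           # per-customer dropout probability

    frequency, recency, t = 0, 0.0, 0.0
    # Unlike BG/NBD, the customer may drop out immediately after the first purchase.
    active = rng.random() >= p
    while active:
        t += rng.exponential(scale=1.0 / lam)    # waiting time until the next purchase
        if t > T:
            break                                # next purchase falls outside the window
        frequency += 1
        recency = t
        active = rng.random() >= p               # dropout opportunity after each purchase
    return frequency, recency


rng = np.random.default_rng(0)
customers = [
    simulate_mbg_nbd_customer(r=2.0, alpha=5.0, a=1.0, b=3.0, T=40.0, rng=rng)
    for _ in range(1000)
]
frequency = np.array([c[0] for c in customers])
recency = np.array([c[1] for c in customers])
```

Customers who drop out at time zero end up with frequency 0 and recency 0, which is exactly the group the BG/NBD model would treat as still active.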

This model requires data to be summarized by recency, frequency, and T for each customer, using clv.utils.rfm_summary() or equivalent. Modeling assumptions require T >= recency.
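A quick pre-flight check of the summarized data can catch violations of these requirements before fitting. The helper below is a hypothetical sketch using only pandas; it is not part of pymc_marketing, and the column names follow the Parameters description.

```python
import pandas as pd

REQUIRED_COLUMNS = ["customer_id", "frequency", "recency", "T"]


def check_rfm_data(df: pd.DataFrame) -> pd.DataFrame:
    """Raise ValueError if required columns are missing or T < recency anywhere."""
    missing = [col for col in REQUIRED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")

    violations = df[df["T"] < df["recency"]]
    if not violations.empty:
        raise ValueError(
            "T must be >= recency; offending customer_ids: "
            f"{violations['customer_id'].tolist()}"
        )
    return df


rfm_ok = pd.DataFrame(
    {
        "customer_id": [1, 2],
        "frequency": [1, 0],
        "recency": [36.0, 0.0],
        "T": [36.0, 36.0],
    }
)
check_rfm_data(rfm_ok)  # passes silently for valid data
```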

Predictive methods have been adapted from the ModifiedBetaGeoFitter class in the legacy lifetimes library (see CamDavidsonPilon/lifetimes).

Parameters:
data : DataFrame

DataFrame containing the following columns:

  • customer_id: Unique customer identifier

  • frequency: Number of repeat purchases

  • recency: Time between the first and the last purchase

  • T: Time between the first purchase and the end of the observation period
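For intuition, the column definitions above can be reproduced by hand with pandas. This is a simplified sketch, not what rfm_summary() does internally: it assumes daily time units, takes the last transaction date in the data as the end of the observation period, and counts repeat purchases as distinct purchase dates minus one.

```python
import pandas as pd

transactions = pd.DataFrame(
    {
        "id": [1, 1, 2, 3, 3],
        "date": pd.to_datetime(
            ["2024-01-01", "2024-02-06", "2024-01-01", "2024-01-02", "2024-01-05"]
        ),
    }
)

# Assumption: the observation period ends at the last recorded transaction.
observation_end = transactions["date"].max()

dates_by_customer = transactions.groupby("id")["date"]
first_purchase = dates_by_customer.min()
last_purchase = dates_by_customer.max()

summary = (
    pd.DataFrame(
        {
            "frequency": dates_by_customer.nunique() - 1,        # repeat purchases only
            "recency": (last_purchase - first_purchase).dt.days,  # first to last purchase
            "T": (observation_end - first_purchase).dt.days,      # first purchase to period end
        }
    )
    .rename_axis("customer_id")
    .reset_index()
)
```

By construction every row satisfies T >= recency, since the last purchase can never fall after the end of the observation period.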

model_config : dict, optional

Dictionary of model prior parameters:

  • alpha: Scale parameter for time between purchases; defaults to Prior("HalfFlat")

  • r: Shape parameter for time between purchases; defaults to Prior("HalfFlat")

  • a: Shape parameter of dropout process; defaults to phi_dropout * kappa_dropout

  • b: Shape parameter of dropout process; defaults to (1 - phi_dropout) * kappa_dropout

  • phi_dropout: Nested prior for a and b priors; defaults to Prior("Uniform", lower=0, upper=1)

  • kappa_dropout: Nested prior for a and b priors; defaults to Prior("Pareto", alpha=1, m=1)

  • purchase_covariates: Coefficients for purchase rate covariates; defaults to Normal(0, 1)

  • dropout_covariates: Coefficients for dropout covariates; defaults to Normal(0, 1)

  • purchase_covariate_cols: List containing column names of covariates for customer purchase rates.

  • dropout_covariate_cols: List containing column names of covariates for customer dropouts.
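The nested phi_dropout and kappa_dropout priors amount to a mean/concentration reparameterization of the Beta shape parameters. The NumPy sketch below illustrates this outside of PyMC, assuming a = phi_dropout * kappa_dropout (mirroring the documented default for b); the sample size and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(42)
n_draws = 5

# phi_dropout ~ Uniform(0, 1): the mean dropout probability
phi_dropout = rng.uniform(0.0, 1.0, size=n_draws)

# kappa_dropout ~ Pareto(alpha=1, m=1): the concentration.
# NumPy's pareto() samples the Lomax form, so shift by 1 and scale by m.
m = 1.0
kappa_dropout = (rng.pareto(1.0, size=n_draws) + 1.0) * m

# Implied Beta shape parameters for the dropout process
a = phi_dropout * kappa_dropout
b = (1.0 - phi_dropout) * kappa_dropout

# The Beta(a, b) mean recovers phi_dropout, and a + b recovers kappa_dropout
implied_mean = a / (a + b)
implied_concentration = a + b
```

Under this reading, phi_dropout controls where the dropout probabilities are centered and kappa_dropout controls how tightly they concentrate around that center.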

sampler_config : dict, optional

Dictionary of sampler parameters. Defaults to None.

References

[1]

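Batislam, E.P., M. Denizel, A. Filiztekin (2007), “Empirical validation and comparison of models for customer base analysis.” International Journal of Research in Marketing, 24 (3), 201-209. https://works.bepress.com/meltem-denizel/2/download/

[2]

Wagner, U. and Hoppe D. (2008), “Erratum on the MBG/NBD Model.” International Journal of Research in Marketing, 25 (3), 225-226.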

Examples

import pandas as pd
from pymc_extras.prior import Prior
from pymc_marketing.clv import ModifiedBetaGeoModel, rfm_summary

# customer identifiers and purchase datetimes
# are all that's needed to start modeling
data = [
    [1, "2024-01-01"],
    [1, "2024-02-06"],
    [2, "2024-01-01"],
    [3, "2024-01-02"],
    [3, "2024-01-05"],
    [4, "2024-01-16"],
    [4, "2024-02-05"],
    [5, "2024-01-17"],
    [5, "2024-01-18"],
    [5, "2024-01-19"],
]
raw_data = pd.DataFrame(data, columns=["id", "date"])

# preprocess data
rfm_df = rfm_summary(raw_data, "id", "date")

# model_config and sampler_configs are optional
model = ModifiedBetaGeoModel(
    model_config={
        "r": Prior("HalfFlat"),
        "alpha": Prior("HalfFlat"),
        "a": Prior("HalfFlat"),
        "b": Prior("HalfFlat"),
    },
    sampler_config={
        "draws": 1000,
        "tune": 1000,
        "chains": 2,
        "cores": 2,
    },
)

# The default 'mcmc' method provides informative predictions
# and reliable performance on small datasets
model.fit(data=rfm_df)
print(model.fit_summary())

# Maximum a Posteriori can quickly fit a model to large datasets,
# but will give limited insights into predictive uncertainty.
model.fit(data=rfm_df, method="map")
print(model.fit_summary())

# Predict number of purchases for current customers
# over the next 10 time periods
expected_purchases = model.expected_purchases(future_t=10)

# Predict probability customers are still active
probability_alive = model.expected_probability_alive()

# Predict number of purchases for a new customer over 't' time periods
expected_purchases_new_customer = model.expected_purchases_new_customer(t=10)

Methods

ModifiedBetaGeoModel.__init__([data, ...])

Initialize model configuration and sampler configuration for the model.

ModifiedBetaGeoModel.attrs_to_init_kwargs(attrs)

Convert the model configuration and sampler configuration from the attributes to keyword arguments.

ModifiedBetaGeoModel.build_from_idata(idata)

Build the model from the InferenceData object.

ModifiedBetaGeoModel.build_model([data])

Build the model.

ModifiedBetaGeoModel.create_idata_attrs()

Create attributes for the inference data.

ModifiedBetaGeoModel.distribution_new_customer([...])

Compute posterior predictive samples of dropout, purchase rate and frequency/recency of new customers.

ModifiedBetaGeoModel.distribution_new_customer_dropout([...])

Sample the Beta distribution for the population-level dropout rate.

ModifiedBetaGeoModel.distribution_new_customer_purchase_rate([...])

Sample the Gamma distribution for the population-level purchase rate.

ModifiedBetaGeoModel.distribution_new_customer_recency_frequency([...])

MBG/NBD process representing purchases across the customer population.

ModifiedBetaGeoModel.expected_probability_alive([data])

Compute the probability a customer with history frequency, recency, and T is currently active.

ModifiedBetaGeoModel.expected_probability_no_purchase(t)

Probability a customer with frequency, recency, and T will have 0 purchases in the period (T, T+t].

ModifiedBetaGeoModel.expected_purchases([...])

Compute the expected number of future purchases across future_t time periods given recency, frequency, and T for each customer.

ModifiedBetaGeoModel.expected_purchases_new_customer([...])

Compute the expected number of purchases for a new customer across t time periods.

ModifiedBetaGeoModel.fit([data, method, ...])

Infer model posterior.

ModifiedBetaGeoModel.fit_summary(**kwargs)

Compute the summary of the fit result.

ModifiedBetaGeoModel.graphviz(**kwargs)

Get the graphviz representation of the model.

ModifiedBetaGeoModel.idata_to_init_kwargs(idata)

Create the initialization kwargs from an InferenceData object.

ModifiedBetaGeoModel.load(fname[, check])

Create a ModelBuilder instance from a file.

ModifiedBetaGeoModel.load_from_idata(idata)

Create a ModelBuilder instance from an InferenceData object.

ModifiedBetaGeoModel.save(fname, **kwargs)

Save the model's inference data to a file.

ModifiedBetaGeoModel.set_idata_attrs([idata])

Set attributes on an InferenceData object.

ModifiedBetaGeoModel.table(**model_table_kwargs)

Get the summary table of the model.

ModifiedBetaGeoModel.thin_fit_result(keep_every)

Return a copy of the model with a thinned fit result.

Attributes

covariate_cols

All covariate column names.

default_model_config

Default model configuration.

default_sampler_config

Default sampler configuration.

dropout_covariate_cols

Dropout covariate column names from model_config.

fit_result

Get the posterior fit_result.

id

Generate a unique hash value for the model.

posterior

Access the 'posterior' attribute of the InferenceData object.

posterior_predictive

Access the 'posterior_predictive' attribute of the InferenceData object.

predictions

Access the 'predictions' attribute of the InferenceData object.

prior

Access the 'prior' attribute of the InferenceData object.

prior_predictive

Access the 'prior_predictive' attribute of the InferenceData object.

purchase_covariate_cols

Purchase covariate column names from model_config.

version

idata

sampler_config

model_config