ModifiedBetaGeoModel

class pymc_marketing.clv.models.modified_beta_geo.ModifiedBetaGeoModel(data, model_config=None, sampler_config=None)

Modified Beta-Geometric Negative Binomial Distribution (MBG/NBD) model for a non-contractual customer population across continuous time.

Based on modifications to the BG/NBD model proposed by Batislam et al. in [1] and by Wagner & Hoppe, which remove the BG/NBD assumption that all non-repeat customers are still active.

The MBG/NBD model assumes dropout probabilities for the customer population are Beta distributed, and time between transactions follows a Gamma distribution while the customer is still active.
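The assumptions above can be illustrated with a small simulation. This is a minimal sketch and not the library's implementation: the function name `simulate_mbg_nbd_customer` is hypothetical, `alpha` is assumed to act as the rate of the Gamma distribution over purchase rates, and the dropout check at time zero reflects the modification described above (a customer may become inactive immediately after the first purchase).

```python
import numpy as np


def simulate_mbg_nbd_customer(r, alpha, a, b, T, rng):
    """Simulate (frequency, recency, T) for one customer (illustrative only)."""
    # Purchase rate while active; alpha is treated as a Gamma rate (assumption).
    lam = rng.gamma(shape=r, scale=1.0 / alpha)
    # Customer-level dropout probability drawn from a Beta distribution.
    p = rng.beta(a, b)

    frequency, recency, t = 0, 0.0, 0.0
    # Dropout can already happen at time zero, right after the first purchase.
    alive = rng.random() >= p
    while alive:
        t += rng.exponential(1.0 / lam)  # waiting time until the next purchase
        if t > T:
            break  # next purchase falls outside the observation window
        frequency += 1
        recency = t
        alive = rng.random() >= p  # dropout opportunity after each purchase
    return frequency, recency, T


rng = np.random.default_rng(0)
customers = [
    simulate_mbg_nbd_customer(r=2.0, alpha=10.0, a=1.0, b=3.0, T=52.0, rng=rng)
    for _ in range(1000)
]
```

Each simulated tuple has the same shape as one row of the recency/frequency/T summary the model expects, with recency equal to 0 for customers who made no repeat purchase.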

This model requires data to be summarized by recency, frequency, and T for each customer, using clv.utils.rfm_summary() or equivalent. Modeling assumptions require T >= recency.
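The constraint on the summarized data can be checked before fitting. The helper below is a hypothetical sketch using only pandas; `check_rfm_summary` is not part of pymc_marketing, and the sample values are made up for illustration.

```python
import pandas as pd


def check_rfm_summary(rfm: pd.DataFrame) -> pd.DataFrame:
    """Return the rows that violate the assumptions on recency, frequency, and T."""
    required = ["customer_id", "frequency", "recency", "T"]
    missing = [col for col in required if col not in rfm.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    # T must be at least recency; frequency and recency cannot be negative.
    invalid = (
        (rfm["T"] < rfm["recency"])
        | (rfm["frequency"] < 0)
        | (rfm["recency"] < 0)
    )
    return rfm[invalid]


rfm = pd.DataFrame(
    {
        "customer_id": [1, 2, 3],
        "frequency": [1, 0, 2],
        "recency": [36.0, 0.0, 45.0],
        "T": [40.0, 40.0, 40.0],  # customer 3 has recency > T
    }
)
violations = check_rfm_summary(rfm)
```

An empty result means the summary satisfies the constraints stated above; any returned rows should be corrected or dropped before the data is passed to the model.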

Predictive methods have been adapted from the ModifiedBetaGeoFitter class in the legacy lifetimes library (see CamDavidsonPilon/lifetimes).

Parameters:
data : DataFrame
DataFrame containing the following columns:
  • customer_id: Unique customer identifier

  • frequency: Number of repeat purchases

  • recency: Time between the first and the last purchase

  • T: Time between the first purchase and the end of the observation period

model_config : dict, optional
Dictionary of model prior parameters:
  • alpha: Scale parameter for time between purchases; defaults to Prior("HalfFlat")

  • r: Shape parameter for time between purchases; defaults to Prior("HalfFlat")

  • a: Shape parameter of dropout process; defaults to phi_dropout * kappa_dropout

  • b: Shape parameter of dropout process; defaults to (1 - phi_dropout) * kappa_dropout

  • phi_dropout: Nested prior for a and b priors; defaults to Prior("Uniform", lower=0, upper=1)

  • kappa_dropout: Nested prior for a and b priors; defaults to Prior("Pareto", alpha=1, m=1)

sampler_config : dict, optional

Dictionary of sampler parameters. Defaults to None.
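The nested priors for a and b can be read as a mean/concentration parameterization of the Beta dropout distribution. The sketch below assumes the defaults are a = phi_dropout * kappa_dropout and b = (1 - phi_dropout) * kappa_dropout; the helper name `dropout_shapes` is hypothetical and the numbers are arbitrary.

```python
def dropout_shapes(phi, kappa):
    """Map a mean (phi) and a concentration (kappa) to Beta shape parameters."""
    a = phi * kappa
    b = (1.0 - phi) * kappa
    return a, b


a, b = dropout_shapes(phi=0.25, kappa=8.0)

# Under this mapping the Beta mean equals phi and a + b equals kappa.
beta_mean = a / (a + b)
concentration = a + b
```

Under this reading, phi_dropout controls the average dropout probability and kappa_dropout controls how tightly individual customers cluster around that average.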

References

[1]

Batislam, E.P., M. Denizel, A. Filiztekin (2007), “Empirical validation and comparison of models for customer base analysis.” International Journal of Research in Marketing, 24 (3), 201-209. https://works.bepress.com/meltem-denizel/2/download/

Examples

import pandas as pd

from pymc_marketing.prior import Prior
from pymc_marketing.clv import ModifiedBetaGeoModel, rfm_summary

# customer identifiers and purchase datetimes
# are all that's needed to start modeling
data = [
    [1, "2024-01-01"],
    [1, "2024-02-06"],
    [2, "2024-01-01"],
    [3, "2024-01-02"],
    [3, "2024-01-05"],
    [4, "2024-01-16"],
    [4, "2024-02-05"],
    [5, "2024-01-17"],
    [5, "2024-01-18"],
    [5, "2024-01-19"],
]
raw_data = pd.DataFrame(data, columns=["id", "date"])

# preprocess data
rfm_df = rfm_summary(raw_data,'id','date')

# model_config and sampler_config are optional
model = ModifiedBetaGeoModel(
    data=rfm_df,
    model_config={
        "r": Prior("HalfFlat"),
        "alpha": Prior("HalfFlat"),
        "a": Prior("HalfFlat"),
        "b": Prior("HalfFlat"),
    },
    sampler_config={
        "draws": 1000,
        "tune": 1000,
        "chains": 2,
        "cores": 2,
    },
)

# The default 'mcmc' fit_method provides informative predictions
# and reliable performance on small datasets
model.fit()
print(model.fit_summary())

# Maximum a Posteriori can quickly fit a model to large datasets,
# but will give limited insights into predictive uncertainty.
model.fit(fit_method='map')
print(model.fit_summary())

# Predict number of purchases for current customers
# over the next 10 time periods
expected_purchases = model.expected_purchases(future_t=10)

# Predict probability customers are still active
probability_alive = model.expected_probability_alive()

# Predict number of purchases for a new customer over 't' time periods
expected_purchases_new_customer = model.expected_purchases_new_customer(t=10)
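To give a feel for what the probability-alive prediction computes, the sketch below evaluates a closed-form expression at fixed point estimates of r, alpha, a, and b. The formula is written from memory of the ModifiedBetaGeoFitter class in lifetimes and should be treated as an assumption to verify against that source; `mbg_probability_alive` is a hypothetical helper, not a pymc_marketing function, and the parameter values are arbitrary.

```python
import numpy as np


def mbg_probability_alive(frequency, recency, T, r, alpha, a, b):
    """P(alive | frequency, recency, T) at point estimates (assumed closed form)."""
    frequency = np.asarray(frequency, dtype=float)
    recency = np.asarray(recency, dtype=float)
    T = np.asarray(T, dtype=float)
    # Larger gaps between recency and T push the ratio up and the probability down.
    ratio = (alpha + T) / (alpha + recency)
    return 1.0 / (1.0 + (a / (b + frequency)) * ratio ** (r + frequency))


# Two customers observed for 40 periods: one with no repeat purchases,
# one with three repeat purchases and a recent last purchase.
p_alive = mbg_probability_alive(
    frequency=[0, 3],
    recency=[0.0, 30.0],
    T=[40.0, 40.0],
    r=1.0,
    alpha=10.0,
    a=1.0,
    b=3.0,
)
```

Note that under this expression a customer with no repeat purchases gets a probability strictly below 1, which matches the model's removal of the BG/NBD assumption that all non-repeat customers are still active. The fitted model applies its prediction across posterior draws rather than a single set of point estimates.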

Methods

ModifiedBetaGeoModel.__init__(data[, ...])

Initialize model configuration and sampler configuration for the model.

ModifiedBetaGeoModel.attrs_to_init_kwargs(attrs)

Convert the model configuration and sampler configuration from the attributes to keyword arguments.

ModifiedBetaGeoModel.build_from_idata(idata)

Build model from the InferenceData object.

ModifiedBetaGeoModel.build_model()

Build the model.

ModifiedBetaGeoModel.create_fit_data(X, y)

Create the fit_data group based on the input data.

ModifiedBetaGeoModel.create_idata_attrs()

Create attributes for the inference data.

ModifiedBetaGeoModel.distribution_new_customer([...])

Compute posterior predictive samples of dropout, purchase rate and frequency/recency of new customers.

ModifiedBetaGeoModel.distribution_new_customer_dropout([...])

Sample the Beta distribution for the population-level dropout rate.

ModifiedBetaGeoModel.distribution_new_customer_purchase_rate([...])

Sample the Gamma distribution for the population-level purchase rate.

ModifiedBetaGeoModel.distribution_new_customer_recency_frequency([...])

BG/NBD process representing purchases across the customer population.

ModifiedBetaGeoModel.expected_num_purchases(...)

Compute the expected number of purchases for a customer.

ModifiedBetaGeoModel.expected_num_purchases_new_customer(...)

Compute the expected number of purchases for a new customer.

ModifiedBetaGeoModel.expected_probability_alive([data])

Compute the probability a customer with history frequency, recency, and T is currently active.

ModifiedBetaGeoModel.expected_probability_no_purchase(t)

Probability a customer with frequency, recency, and T will have 0 purchases in the period (T, T+t].

ModifiedBetaGeoModel.expected_purchases([...])

Compute the expected number of future purchases across future_t time periods given recency, frequency, and T for each customer.

ModifiedBetaGeoModel.expected_purchases_new_customer([...])

Compute the expected number of purchases for a new customer across t time periods.

ModifiedBetaGeoModel.fit([method, fit_method])

Infer model posterior.

ModifiedBetaGeoModel.fit_summary(**kwargs)

Compute the summary of the fit result.

ModifiedBetaGeoModel.graphviz(**kwargs)

Get the graphviz representation of the model.

ModifiedBetaGeoModel.load(fname)

Create a ModelBuilder instance from a file.

ModifiedBetaGeoModel.load_from_idata(idata)

Create a ModelBuilder instance from an InferenceData object.

ModifiedBetaGeoModel.post_sample_model_transformation()

Perform transformation on the model after sampling.

ModifiedBetaGeoModel.predict([X, extend_idata])

Use a model to predict on unseen data and return point prediction of all the samples.

ModifiedBetaGeoModel.predict_posterior([X, ...])

Generate posterior predictive samples on unseen data.

ModifiedBetaGeoModel.predict_proba([X, ...])

Alias for predict_posterior, for consistency with scikit-learn probabilistic estimators.

ModifiedBetaGeoModel.sample_posterior_predictive([...])

Sample from the model's posterior predictive distribution.

ModifiedBetaGeoModel.sample_prior_predictive([...])

Sample from the model's prior predictive distribution.

ModifiedBetaGeoModel.save(fname)

Save the model's inference data to a file.

ModifiedBetaGeoModel.set_idata_attrs([idata])

Set attributes on an InferenceData object.

ModifiedBetaGeoModel.thin_fit_result(keep_every)

Return a copy of the model with a thinned fit result.

Attributes

X

default_model_config

Default model configuration.

default_sampler_config

Default sampler configuration.

fit_result

Get the posterior fit_result.

id

Generate a unique hash value for the model.

output_var

Output variable of the model.

posterior

posterior_predictive

predictions

prior

prior_predictive

version

y