BetaGeoModel

class pymc_marketing.clv.models.beta_geo.BetaGeoModel(data, model_config=None, sampler_config=None)

Beta-Geometric Negative Binomial Distribution (BG/NBD) model for a non-contractual customer population across continuous time.

First introduced by Fader, Hardie & Lee [1], with additional predictive methods and enhancements in [2], [3], [4] and [5].

The BG/NBD model assumes that dropout probabilities are Beta distributed across the customer population and that, while a customer is still active, purchases occur at a customer-specific rate that is Gamma distributed across the population.
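The assumptions above can be made concrete with a small simulation. This is a minimal sketch using only the Python standard library, following the BG/NBD story in [1]: each customer draws a purchase rate from a Gamma distribution and a dropout probability from a Beta distribution, waits an exponential time between purchases while active, and may drop out after each repeat purchase. The function name simulate_customer and the parameter values are illustrative assumptions, not part of pymc_marketing.

```python
import random


def simulate_customer(r, alpha, a, b, T, rng):
    """Simulate one customer's (frequency, recency) over an observation window of length T.

    Hypothetical helper for illustration only; not a pymc_marketing function.
    """
    # Purchase rate: Gamma with shape r and rate alpha (gammavariate takes a scale, so use 1/alpha).
    purchase_rate = rng.gammavariate(r, 1.0 / alpha)
    # Probability of dropping out after any repeat purchase.
    dropout_prob = rng.betavariate(a, b)

    frequency, recency, clock = 0, 0.0, 0.0
    while True:
        # Exponential waiting time until the next purchase while the customer is active.
        clock += rng.expovariate(purchase_rate)
        if clock > T:
            break  # next purchase would fall outside the observation window
        frequency += 1
        recency = clock
        if rng.random() < dropout_prob:
            break  # customer becomes inactive right after this purchase
    return frequency, recency


rng = random.Random(0)
# Illustrative parameter values (assumptions, not fitted estimates).
customers = [
    simulate_customer(r=0.5, alpha=5.0, a=1.0, b=3.0, T=30.0, rng=rng)
    for _ in range(1000)
]
```

Each simulated pair satisfies recency <= T by construction, which mirrors the T >= recency requirement placed on the input data.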

This model requires data to be summarized by recency, frequency, and T for each customer, using clv.utils.rfm_summary() or equivalent. Modeling assumptions require T >= recency.
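To illustrate what the summarized format contains, the sketch below derives frequency, recency, and T from raw transactions using only the standard library. The helper summarize_rfm is hypothetical and is not the library's rfm_summary(); it assumes daily time units, counts repeat purchases as distinct purchase dates minus one, and takes the end of the observation period as an explicit argument.

```python
from datetime import date


def summarize_rfm(transactions, observation_end):
    """Hypothetical stand-in for an RFM summary; not pymc_marketing's rfm_summary()."""
    dates_by_customer = {}
    for customer_id, iso_date in transactions:
        dates_by_customer.setdefault(customer_id, set()).add(date.fromisoformat(iso_date))

    rows = []
    for customer_id, dates in sorted(dates_by_customer.items()):
        first, last = min(dates), max(dates)
        rows.append(
            {
                "customer_id": customer_id,
                "frequency": len(dates) - 1,  # repeat purchases only
                "recency": (last - first).days,  # first to last purchase
                "T": (observation_end - first).days,  # first purchase to end of observation
            }
        )
    return rows


transactions = [
    (1, "2024-01-01"), (1, "2024-02-06"),
    (2, "2024-01-01"),
    (3, "2024-01-02"), (3, "2024-01-05"),
    (4, "2024-01-16"), (4, "2024-02-05"),
    (5, "2024-01-17"), (5, "2024-01-18"), (5, "2024-01-19"),
]
rfm_rows = summarize_rfm(transactions, observation_end=date(2024, 2, 6))

# The modeling assumption T >= recency should hold for every customer.
violations = [row for row in rfm_rows if row["T"] < row["recency"]]
```

A customer with a single purchase (customer 2 here) has frequency 0 and recency 0, while T still measures how long that customer has been under observation.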

Predictive methods have been adapted from the BetaGeoFitter class in the legacy lifetimes library (see CamDavidsonPilon/lifetimes).

Parameters:
data : DataFrame
DataFrame containing the following columns:
  • customer_id: Unique customer identifier

  • frequency: Number of repeat purchases

  • recency: Time between the first and the last purchase

  • T: Time between the first purchase and the end of the observation period

model_config : dict, optional
Dictionary of model prior parameters:
  • alpha: Scale parameter for time between purchases; defaults to Prior("Weibull", alpha=2, beta=10)

  • r: Shape parameter for time between purchases; defaults to Prior("Weibull", alpha=2, beta=1)

  • a: Shape parameter of dropout process; defaults to phi_dropout * kappa_dropout

  • b: Shape parameter of dropout process; defaults to (1 - phi_dropout) * kappa_dropout

  • phi_dropout: Nested prior for a and b priors; defaults to Prior("Uniform", lower=0, upper=1)

  • kappa_dropout: Nested prior for a and b priors; defaults to Prior("Pareto", alpha=1, m=1)

  • purchase_covariates: Coefficients for purchase rate covariates; defaults to Normal(0, 3)

  • dropout_covariates: Coefficients for dropout covariates; defaults to Normal(0, 3)

  • purchase_covariate_cols: List containing column names of covariates for customer purchase rates.

  • dropout_covariate_cols: List containing column names of covariates for customer dropouts.

sampler_config : dict, optional

Dictionary of sampler parameters. Defaults to None.
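The nested phi_dropout and kappa_dropout priors read most naturally as a mean/concentration reparameterization of the Beta dropout distribution. The sketch below assumes a = phi_dropout * kappa_dropout and b = (1 - phi_dropout) * kappa_dropout, so that phi_dropout is the mean dropout probability and kappa_dropout is the total concentration a + b. The helper dropout_shapes is hypothetical and only illustrates the arithmetic.

```python
def dropout_shapes(phi_dropout, kappa_dropout):
    """Map a (mean, concentration) pair to Beta shape parameters (a, b).

    Hypothetical helper; assumes a = phi * kappa and b = (1 - phi) * kappa.
    """
    if not 0.0 < phi_dropout < 1.0:
        raise ValueError("phi_dropout must lie strictly between 0 and 1")
    if kappa_dropout <= 0.0:
        raise ValueError("kappa_dropout must be positive")
    a = phi_dropout * kappa_dropout
    b = (1.0 - phi_dropout) * kappa_dropout
    return a, b


a, b = dropout_shapes(phi_dropout=0.25, kappa_dropout=4.0)
mean_dropout = a / (a + b)  # recovers phi_dropout
concentration = a + b  # recovers kappa_dropout
```

Under this reading, a Uniform(0, 1) prior on phi_dropout is agnostic about the average dropout probability, while the prior on kappa_dropout controls how much that probability varies between customers.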

References

[1]

Fader, P. S., Hardie, B. G., & Lee, K. L. (2005). “Counting your customers the easy way: An alternative to the Pareto/NBD model.” Marketing Science, 24(2), 275-284. http://brucehardie.com/papers/018/fader_et_al_mksc_05.pdf

[2]

Fader, P. S., Hardie, B. G., & Lee, K. L. (2008). “Computing P(alive) using the BG/NBD model.” http://www.brucehardie.com/notes/021/palive_for_BGNBD.pdf.

[3]

Fader, P. S. & Hardie, B. G. (2013) “Overcoming the BG/NBD Model’s #NUM! Error Problem.” http://brucehardie.com/notes/027/bgnbd_num_error.pdf.

[4]

Fader, P. S. & Hardie, B. G. (2019) “A Step-by-Step Derivation of the BG/NBD Model.” https://www.brucehardie.com/notes/039/bgnbd_derivation__2019-11-06.pdf

[5]

Fader, P. S. & Hardie, B. G. (2007) “Incorporating Time-Invariant Covariates into the Pareto/NBD and BG/NBD Models.” https://www.brucehardie.com/notes/019/time_invariant_covariates.pdf

Examples

import pandas as pd

from pymc_marketing.prior import Prior
from pymc_marketing.clv import BetaGeoModel, rfm_summary

# customer identifiers and purchase datetimes
# are all that's needed to start modeling
data = [
    [1, "2024-01-01"],
    [1, "2024-02-06"],
    [2, "2024-01-01"],
    [3, "2024-01-02"],
    [3, "2024-01-05"],
    [4, "2024-01-16"],
    [4, "2024-02-05"],
    [5, "2024-01-17"],
    [5, "2024-01-18"],
    [5, "2024-01-19"],
]
raw_data = pd.DataFrame(data, columns=["id", "date"])

# preprocess data
rfm_df = rfm_summary(raw_data, "id", "date")

# model_config and sampler_configs are optional
model = BetaGeoModel(
    data=rfm_df,
    model_config={
        "r": Prior("Weibull", alpha=2, beta=1),
        "alpha": Prior("HalfFlat"),
        "a": Prior("Beta", alpha=2, beta=3),
        "b": Prior("Beta", alpha=3, beta=2),
    },
    sampler_config={
        "draws": 1000,
        "tune": 1000,
        "chains": 2,
        "cores": 2,
    },
)

# The default 'mcmc' fit_method provides informative predictions
# and reliable performance on small datasets
model.fit()
print(model.fit_summary())

# Maximum a Posteriori can quickly fit a model to large datasets,
# but will give limited insights into predictive uncertainty.
model.fit(fit_method='map')
print(model.fit_summary())

# Predict number of purchases for current customers
# over the next 10 time periods
expected_purchases = model.expected_purchases(future_t=10)

# Predict probability customers are still active
probability_alive = model.expected_probability_alive()

# Predict number of purchases for a new customer over 't' time periods
expected_purchases_new_customer = model.expected_purchases_new_customer(t=10)
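For intuition about what the probability-alive prediction measures, the closed-form expression from [2] can be evaluated for fixed parameter values. This is a minimal sketch under stated assumptions: the function probability_alive is hypothetical, it takes single point values for r, alpha, a, and b rather than posterior draws, and the numbers used are illustrative. It is not a description of how the library computes expected_probability_alive().

```python
def probability_alive(frequency, recency, T, r, alpha, a, b):
    """P(alive | frequency, recency, T) for fixed BG/NBD parameters, per reference [2].

    Hypothetical helper for illustration; not a pymc_marketing function.
    """
    if frequency == 0:
        # In the BG/NBD model a customer can only drop out after a repeat purchase.
        return 1.0
    ratio = (alpha + T) / (alpha + recency)
    odds_inactive = (a / (b + frequency - 1)) * ratio ** (r + frequency)
    return 1.0 / (1.0 + odds_inactive)


# Illustrative parameter values (assumptions, not fitted estimates).
params = {"r": 0.5, "alpha": 5.0, "a": 1.0, "b": 3.0}

p_no_repeat = probability_alive(frequency=0, recency=0.0, T=36.0, **params)
p_just_bought = probability_alive(frequency=1, recency=36.0, T=36.0, **params)
p_long_silent = probability_alive(frequency=1, recency=3.0, T=35.0, **params)
```

With these values, a customer whose last purchase coincides with the end of the observation period is judged more likely to be active than one with the same frequency and a long gap since the last purchase.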

Methods

BetaGeoModel.__init__(data[, model_config, ...])

Initialize model configuration and sampler configuration for the model.

BetaGeoModel.attrs_to_init_kwargs(attrs)

Convert the model configuration and sampler configuration from the attributes to keyword arguments.

BetaGeoModel.build_from_idata(idata)

Build model from the InferenceData object.

BetaGeoModel.build_model()

Build the model.

BetaGeoModel.create_fit_data(X, y)

Create the fit_data group based on the input data.

BetaGeoModel.create_idata_attrs()

Create attributes for the inference data.

BetaGeoModel.distribution_new_customer([...])

Compute posterior predictive samples of dropout, purchase rate and frequency/recency of new customers.

BetaGeoModel.distribution_new_customer_dropout([...])

Sample the Beta distribution for the population-level dropout rate.

BetaGeoModel.distribution_new_customer_purchase_rate([...])

Sample the Gamma distribution for the population-level purchase rate.

BetaGeoModel.distribution_new_customer_recency_frequency([...])

BG/NBD process representing purchases across the customer population.

BetaGeoModel.expected_probability_alive([data])

Compute the probability a customer with history frequency, recency, and T is currently active.

BetaGeoModel.expected_probability_no_purchase(t)

Compute the probability a customer with history frequency, recency, and T will have 0 purchases in the period (T, T+t].

BetaGeoModel.expected_purchases([data, future_t])

Compute the expected number of future purchases across future_t time periods given recency, frequency, and T for each customer.

BetaGeoModel.expected_purchases_new_customer([...])

Compute the expected number of purchases for a new customer across t time periods.

BetaGeoModel.fit([method, fit_method])

Infer model posterior.

BetaGeoModel.fit_summary(**kwargs)

Compute the summary of the fit result.

BetaGeoModel.graphviz(**kwargs)

Get the graphviz representation of the model.

BetaGeoModel.load(fname)

Create a ModelBuilder instance from a file.

BetaGeoModel.load_from_idata(idata)

Create a ModelBuilder instance from an InferenceData object.

BetaGeoModel.post_sample_model_transformation()

Perform transformation on the model after sampling.

BetaGeoModel.predict([X, extend_idata])

Use a model to predict on unseen data and return point prediction of all the samples.

BetaGeoModel.predict_posterior([X, ...])

Generate posterior predictive samples on unseen data.

BetaGeoModel.predict_proba([X, ...])

Alias for predict_posterior, for consistency with scikit-learn probabilistic estimators.

BetaGeoModel.sample_posterior_predictive([...])

Sample from the model's posterior predictive distribution.

BetaGeoModel.sample_prior_predictive([X, y, ...])

Sample from the model's prior predictive distribution.

BetaGeoModel.save(fname)

Save the model's inference data to a file.

BetaGeoModel.set_idata_attrs([idata])

Set attributes on an InferenceData object.

BetaGeoModel.thin_fit_result(keep_every)

Return a copy of the model with a thinned fit result.

Attributes

X

default_model_config

Default model configuration.

default_sampler_config

Default sampler configuration.

fit_result

Get the posterior fit_result.

id

Generate a unique hash value for the model.

output_var

Output variable of the model.

posterior

posterior_predictive

predictions

prior

prior_predictive

version

y