BetaGeoModel
class pymc_marketing.clv.models.beta_geo.BetaGeoModel(data, model_config=None, sampler_config=None)
Beta-Geometric Negative Binomial Distribution (BG/NBD) model for a non-contractual customer population across continuous time.
First introduced by Fader, Hardie & Lee [1], with additional predictive methods and enhancements in [2], [3], [4] and [5].
The BG/NBD model assumes dropout probabilities for the customer population are Beta distributed, and that purchase rates across customers are Gamma distributed, with exponentially distributed times between transactions while a customer is still active.
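The assumptions above can be made concrete with a small simulation. The following is a minimal sketch of the generative process as described in [1], using only the Python standard library; it is not the library's implementation. The function name `simulate_customer` and all parameter values are hypothetical, and the Gamma distribution is assumed to be parameterized by shape `r` and rate `alpha`.

```python
import random


def simulate_customer(r, alpha, a, b, T, rng):
    """Simulate one customer's repeat purchases over (0, T].

    Assumed process: purchase rate ~ Gamma(shape=r, rate=alpha),
    dropout probability ~ Beta(a, b), exponential waiting times while
    active, and a dropout check after every repeat purchase.
    """
    # random.gammavariate takes (shape, scale), so the rate is inverted.
    purchase_rate = rng.gammavariate(r, 1.0 / alpha)
    dropout_prob = rng.betavariate(a, b)

    t, frequency, recency = 0.0, 0, 0.0
    while True:
        t += rng.expovariate(purchase_rate)  # time until the next purchase
        if t > T:
            break  # next purchase would fall outside the observation window
        frequency += 1
        recency = t
        if rng.random() < dropout_prob:
            break  # customer becomes inactive after this purchase
    return frequency, recency, T


rng = random.Random(42)
customers = [
    simulate_customer(r=2.0, alpha=4.0, a=0.8, b=2.5, T=40.0, rng=rng)
    for _ in range(1000)
]
```

Each simulated tuple has the same shape as the summary the model expects: a repeat-purchase count, the time of the last purchase, and the length of the observation window, with recency never exceeding T.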
This model requires data to be summarized by recency, frequency, and T for each customer, using clv.utils.rfm_summary() or equivalent. Modeling assumptions require T >= recency.
Predictive methods have been adapted from the BetaGeoFitter class in the legacy lifetimes library (see CamDavidsonPilon/lifetimes).
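To illustrate what "summarized by recency, frequency, and T" means, here is a hand-rolled sketch that builds the three quantities from a transaction log using only the standard library. It is an illustration, not a substitute for `rfm_summary`, which may differ in options and edge cases. The helper name `summarize` is hypothetical, time is assumed to be measured in days, and the observation period is assumed to end on the last transaction date in the data.

```python
from collections import defaultdict
from datetime import date

# (customer id, purchase date) pairs
transactions = [
    (1, "2024-01-01"), (1, "2024-02-06"),
    (2, "2024-01-01"),
    (3, "2024-01-02"), (3, "2024-01-05"),
    (4, "2024-01-16"), (4, "2024-02-05"),
    (5, "2024-01-17"), (5, "2024-01-18"), (5, "2024-01-19"),
]


def summarize(transactions, observation_end=None):
    """Return {customer_id: {"frequency", "recency", "T"}} in days."""
    dates_by_customer = defaultdict(set)
    for customer_id, day in transactions:
        dates_by_customer[customer_id].add(date.fromisoformat(day))

    if observation_end is None:
        # Assumption: the observation period ends at the last transaction.
        observation_end = max(max(d) for d in dates_by_customer.values())

    summary = {}
    for customer_id, dates in dates_by_customer.items():
        first, last = min(dates), max(dates)
        summary[customer_id] = {
            "frequency": len(dates) - 1,         # repeat purchase days only
            "recency": (last - first).days,      # first to last purchase
            "T": (observation_end - first).days, # first purchase to period end
        }
    return summary


summary = summarize(transactions)
```

Customer 2 bought only once, so frequency and recency are both 0 while T is still the full 36 days since that first purchase. Because recency is measured from the same starting point as T, the requirement T >= recency holds by construction.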
- Parameters:
- data : DataFrame
  DataFrame containing the following columns:
  - customer_id: Unique customer identifier
  - frequency: Number of repeat purchases
  - recency: Time between the first and the last purchase
  - T: Time between the first purchase and the end of the observation period
- model_config : dict, optional
  Dictionary of model prior parameters:
  - alpha: Scale parameter for time between purchases; defaults to Prior("Weibull", alpha=2, beta=10)
  - r: Shape parameter for time between purchases; defaults to Prior("Weibull", alpha=2, beta=1)
  - a: Shape parameter of dropout process; defaults to phi_dropout * kappa_dropout
  - b: Shape parameter of dropout process; defaults to (1 - phi_dropout) * kappa_dropout
  - phi_dropout: Nested prior for a and b priors; defaults to Prior("Uniform", lower=0, upper=1)
  - kappa_dropout: Nested prior for a and b priors; defaults to Prior("Pareto", alpha=1, m=1)
  - purchase_covariates: Coefficients for purchase rate covariates; defaults to Normal(0, 3)
  - dropout_covariates: Coefficients for dropout covariates; defaults to Normal.dist(0, 3)
  - purchase_covariate_cols: List containing column names of covariates for customer purchase rates.
  - dropout_covariate_cols: List containing column names of covariates for customer dropouts.
- sampler_config : dict, optional
  Dictionary of sampler parameters. Defaults to None.
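The nested dropout priors are easier to read with a worked example. The sketch below assumes the default relationship is a = phi_dropout * kappa_dropout and b = (1 - phi_dropout) * kappa_dropout, so that phi_dropout acts as the mean of the Beta dropout distribution and kappa_dropout as its concentration. The function name `dropout_shapes` and the input values are hypothetical.

```python
def dropout_shapes(phi_dropout, kappa_dropout):
    """Map (mean, concentration) to Beta shape parameters (a, b).

    Assumed relationship (not taken from the library source):
        a = phi_dropout * kappa_dropout
        b = (1 - phi_dropout) * kappa_dropout
    """
    if not 0.0 < phi_dropout < 1.0:
        raise ValueError("phi_dropout must lie strictly between 0 and 1")
    if kappa_dropout <= 0.0:
        raise ValueError("kappa_dropout must be positive")
    a = phi_dropout * kappa_dropout
    b = (1.0 - phi_dropout) * kappa_dropout
    return a, b


a, b = dropout_shapes(phi_dropout=0.25, kappa_dropout=8.0)
mean_dropout = a / (a + b)  # the mean of Beta(a, b) recovers phi_dropout
```

Under this parameterization, a + b equals kappa_dropout, so a larger kappa_dropout concentrates the dropout probability more tightly around phi_dropout.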
References
[1] Fader, P. S., Hardie, B. G., & Lee, K. L. (2005). “Counting your customers the easy way: An alternative to the Pareto/NBD model.” Marketing Science, 24(2), 275-284. http://brucehardie.com/papers/018/fader_et_al_mksc_05.pdf
[2] Fader, P. S., Hardie, B. G., & Lee, K. L. (2008). “Computing P(alive) using the BG/NBD model.” http://www.brucehardie.com/notes/021/palive_for_BGNBD.pdf
[3] Fader, P. S. & Hardie, B. G. (2013). “Overcoming the BG/NBD Model’s #NUM! Error Problem.” http://brucehardie.com/notes/027/bgnbd_num_error.pdf
[4] Fader, P. S. & Hardie, B. G. (2019). “A Step-by-Step Derivation of the BG/NBD Model.” https://www.brucehardie.com/notes/039/bgnbd_derivation__2019-11-06.pdf
[5] Fader, P. S. & Hardie, B. G. (2007). “Incorporating Time-Invariant Covariates into the Pareto/NBD and BG/NBD Models.” https://www.brucehardie.com/notes/019/time_invariant_covariates.pdf
Examples
    import pandas as pd

    from pymc_marketing.prior import Prior
    from pymc_marketing.clv import BetaGeoModel, rfm_summary

    # customer identifiers and purchase datetimes
    # are all that's needed to start modeling
    data = [
        [1, "2024-01-01"],
        [1, "2024-02-06"],
        [2, "2024-01-01"],
        [3, "2024-01-02"],
        [3, "2024-01-05"],
        [4, "2024-01-16"],
        [4, "2024-02-05"],
        [5, "2024-01-17"],
        [5, "2024-01-18"],
        [5, "2024-01-19"],
    ]
    raw_data = pd.DataFrame(data, columns=["id", "date"])

    # preprocess data into one row per customer (frequency, recency, T)
    rfm_df = rfm_summary(raw_data, "id", "date")

    # model_config and sampler_config are optional
    model = BetaGeoModel(
        data=rfm_df,  # the summarized frame, not the raw transaction list
        model_config={
            "r": Prior("Weibull", alpha=2, beta=1),
            "alpha": Prior("HalfFlat"),
            "a": Prior("Beta", alpha=2, beta=3),
            "b": Prior("Beta", alpha=3, beta=2),
        },
        sampler_config={
            "draws": 1000,
            "tune": 1000,
            "chains": 2,
            "cores": 2,
        },
    )

    # The default 'mcmc' fit_method provides informative predictions
    # and reliable performance on small datasets
    model.fit()
    print(model.fit_summary())

    # Maximum a Posteriori can quickly fit a model to large datasets,
    # but will give limited insights into predictive uncertainty.
    model.fit(fit_method="map")
    print(model.fit_summary())

    # Predict number of purchases for current customers
    # over the next 10 time periods
    expected_purchases = model.expected_purchases(future_t=10)

    # Predict probability customers are still active
    probability_alive = model.expected_probability_alive()

    # Predict number of purchases for a new customer over 't' time periods
    expected_purchases_new_customer = model.expected_purchases_new_customer(t=10)
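For intuition about what a probability-alive prediction represents, the closed-form expression from reference [2] can be evaluated directly for fixed parameter values. This is a minimal sketch of that formula, not the library's implementation, which works with posterior samples rather than point values. The function name `probability_alive` and the parameter values used here are hypothetical.

```python
def probability_alive(frequency, recency, T, r, alpha, a, b):
    """P(alive | frequency, recency, T) for fixed BG/NBD parameters.

    Uses the expression from reference [2]:
        1 / (1 + a / (b + x - 1) * ((alpha + T) / (alpha + t_x)) ** (r + x))
    for x > 0 repeat purchases. A customer with no repeat purchases has had
    no opportunity to drop out under the model, so the result is 1.
    """
    if frequency == 0:
        return 1.0
    dropout_odds = a / (b + frequency - 1.0)
    inactivity = ((alpha + T) / (alpha + recency)) ** (r + frequency)
    return 1.0 / (1.0 + dropout_odds * inactivity)


# Same purchase history, observed for increasingly long silent periods.
params = dict(r=1.0, alpha=1.0, a=1.0, b=1.0)
just_purchased = probability_alive(frequency=1, recency=10, T=10, **params)
long_silence = probability_alive(frequency=1, recency=10, T=40, **params)
```

With these values, the customer who purchased at the very end of the observation window has probability 0.5 of being alive, and the estimate falls as the gap between recency and T grows.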
Methods
BetaGeoModel.__init__(data[, model_config, ...])
Initialize model configuration and sampler configuration for the model.
Convert the model configuration and sampler configuration from the attributes to keyword arguments.
Build model from the InferenceData object.
Build the model.
Create the fit_data group based on the input data.
Create attributes for the inference data.
Compute posterior predictive samples of dropout, purchase rate and frequency/recency of new customers.
Sample the Beta distribution for the population-level dropout rate.
Sample the Gamma distribution for the population-level purchase rate.
BetaGeoModel.distribution_new_customer_recency_frequency([...])
BG/NBD process representing purchases across the customer population.
Compute the probability a customer with history frequency, recency, and T is currently active.
Compute the probability a customer with history frequency, recency, and T will have 0 purchases in the period (T, T+t].
BetaGeoModel.expected_purchases([data, future_t])
Compute the expected number of future purchases across future_t time periods given recency, frequency, and T for each customer.
Compute the expected number of purchases for a new customer across t time periods.
BetaGeoModel.fit([method, fit_method])
Infer model posterior.
BetaGeoModel.fit_summary(**kwargs)
Compute the summary of the fit result.
BetaGeoModel.graphviz(**kwargs)
Get the graphviz representation of the model.
BetaGeoModel.load(fname)
Create a ModelBuilder instance from a file.
BetaGeoModel.load_from_idata(idata)
Create a ModelBuilder instance from an InferenceData object.
Perform transformation on the model after sampling.
BetaGeoModel.predict([X, extend_idata])
Use a model to predict on unseen data and return point prediction of all the samples.
BetaGeoModel.predict_posterior([X, ...])
Generate posterior predictive samples on unseen data.
BetaGeoModel.predict_proba([X, ...])
Alias for predict_posterior, for consistency with scikit-learn probabilistic estimators.
Sample from the model's posterior predictive distribution.
BetaGeoModel.sample_prior_predictive([X, y, ...])
Sample from the model's prior predictive distribution.
BetaGeoModel.save(fname)
Save the model's inference data to a file.
BetaGeoModel.set_idata_attrs([idata])
Set attributes on an InferenceData object.
BetaGeoModel.thin_fit_result(keep_every)
Return a copy of the model with a thinned fit result.
Attributes
X
default_model_config
Default model configuration.
default_sampler_config
Default sampler configuration.
fit_result
Get the posterior fit_result.
id
Generate a unique hash value for the model.
output_var
Output variable of the model.
posterior
posterior_predictive
predictions
prior
prior_predictive
version
y