BetaGeoModel
- class pymc_marketing.clv.models.beta_geo.BetaGeoModel(data=None, *, model_config=None, sampler_config=None)
Beta-Geometric Negative Binomial Distribution (BG/NBD) model for a non-contractual customer population across continuous time.
First introduced by Fader, Hardie & Lee [1], with additional predictive methods and enhancements in [2], [3], [4] and [5].
The BG/NBD model assumes that dropout probabilities across the customer population are Beta distributed and that purchase rates are Gamma distributed, so that, while a customer is still active, the time between transactions is exponential given that customer's purchase rate.
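The assumptions above can be made concrete with a small simulation. This is a minimal sketch of the generative process as formulated in [1], not code from pymc_marketing: the function name `simulate_customer` and the parameter values are illustrative assumptions. Each customer draws a purchase rate from a Gamma distribution and a dropout probability from a Beta distribution, waits an exponential time between purchases while active, and may drop out immediately after each repeat purchase.

```python
import random


def simulate_customer(r, alpha, a, b, T, rng):
    """Simulate one customer's repeat purchases on (0, T] (hypothetical helper).

    Returns (frequency, recency, T), with the first purchase placed at time 0.
    """
    # random.gammavariate takes (shape, scale), so rate alpha becomes scale 1/alpha.
    purchase_rate = rng.gammavariate(r, 1.0 / alpha)
    dropout_prob = rng.betavariate(a, b)

    t, frequency, recency = 0.0, 0, 0.0
    while True:
        t += rng.expovariate(purchase_rate)  # exponential wait while active
        if t > T:
            break  # next purchase would fall outside the observation period
        frequency += 1
        recency = t
        if rng.random() < dropout_prob:
            break  # customer becomes inactive right after this purchase
    return frequency, recency, T


# Illustrative parameter values only; they are not fitted to any data.
rng = random.Random(42)
customers = [simulate_customer(2.0, 10.0, 1.0, 3.0, 52.0, rng) for _ in range(5)]
for frequency, recency, T in customers:
    print(frequency, round(recency, 2), T)
```

Every simulated row satisfies T >= recency by construction, which is the same constraint the model places on its input data.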
This model requires data to be summarized by recency, frequency, and T for each customer, using clv.utils.rfm_summary() or equivalent. Modeling assumptions require T >= recency.
Predictive methods have been adapted from the BetaGeoFitter class in the legacy lifetimes library (see CamDavidsonPilon/lifetimes).
- Parameters:
- data
DataFrame DataFrame containing the following columns:
  - customer_id: Unique customer identifier
  - frequency: Number of repeat purchases
  - recency: Time between the first and the last purchase
  - T: Time between the first purchase and the end of the observation period
- model_config
dict, optional Dictionary of model prior parameters:
  - alpha: Scale parameter for time between purchases; defaults to Prior("Weibull", alpha=2, beta=10)
  - r: Shape parameter for time between purchases; defaults to Prior("Weibull", alpha=2, beta=1)
  - a: Shape parameter of dropout process; defaults to phi_dropout * kappa_dropout
  - b: Shape parameter of dropout process; defaults to (1 - phi_dropout) * kappa_dropout
  - phi_dropout: Nested prior for a and b priors; defaults to Prior("Uniform", lower=0, upper=1)
  - kappa_dropout: Nested prior for a and b priors; defaults to Prior("Pareto", alpha=1, m=1)
  - purchase_covariates: Coefficients for purchase rate covariates; defaults to Normal(0, 1)
  - dropout_covariates: Coefficients for dropout covariates; defaults to Normal(0, 1)
  - purchase_covariate_cols: List containing column names of covariates for customer purchase rates.
  - dropout_covariate_cols: List containing column names of covariates for customer dropouts.
- sampler_config
dict, optional Dictionary of sampler parameters. Defaults to None.
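The nested defaults for a and b can be read as a mean/concentration reparameterization of the Beta distribution. The following is a minimal arithmetic sketch, assuming a is built from the same phi_dropout and kappa_dropout pair as b; the helper name `beta_shapes` and the numeric values are illustrative and not part of the library.

```python
def beta_shapes(phi, kappa):
    """Map phi in (0, 1) and kappa > 0 to Beta shape parameters (hypothetical helper)."""
    a = phi * kappa
    b = (1.0 - phi) * kappa
    return a, b


a, b = beta_shapes(phi=0.25, kappa=8.0)

# The mean of Beta(a, b) is a / (a + b), which recovers phi,
# and a + b recovers kappa, which controls how concentrated the prior is.
mean_dropout = a / (a + b)
concentration = a + b
print(a, b, mean_dropout, concentration)  # prints: 2.0 6.0 0.25 8.0
```

Under this reading, phi_dropout sets the expected dropout probability and kappa_dropout sets how tightly customers cluster around it.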
References
[1] Fader, P. S., Hardie, B. G., & Lee, K. L. (2005). “Counting your customers the easy way: An alternative to the Pareto/NBD model.” Marketing Science, 24(2), 275-284. http://brucehardie.com/papers/018/fader_et_al_mksc_05.pdf
[2] Fader, P. S., Hardie, B. G., & Lee, K. L. (2008). “Computing P(alive) using the BG/NBD model.” http://www.brucehardie.com/notes/021/palive_for_BGNBD.pdf
[3] Fader, P. S. & Hardie, B. G. (2013). “Overcoming the BG/NBD Model’s #NUM! Error Problem.” http://brucehardie.com/notes/027/bgnbd_num_error.pdf
[4] Fader, P. S. & Hardie, B. G. (2019). “A Step-by-Step Derivation of the BG/NBD Model.” https://www.brucehardie.com/notes/039/bgnbd_derivation__2019-11-06.pdf
[5] Fader, P. S. & Hardie, B. G. (2007). “Incorporating Time-Invariant Covariates into the Pareto/NBD and BG/NBD Models.” https://www.brucehardie.com/notes/019/time_invariant_covariates.pdf
Examples
    import pandas as pd
    from pymc_extras.prior import Prior

    from pymc_marketing.clv import BetaGeoModel, rfm_summary

    # customer identifiers and purchase datetimes
    # are all that's needed to start modeling
    data = [
        [1, "2024-01-01"],
        [1, "2024-02-06"],
        [2, "2024-01-01"],
        [3, "2024-01-02"],
        [3, "2024-01-05"],
        [4, "2024-01-16"],
        [4, "2024-02-05"],
        [5, "2024-01-17"],
        [5, "2024-01-18"],
        [5, "2024-01-19"],
    ]
    raw_data = pd.DataFrame(data, columns=["id", "date"])

    # preprocess data
    rfm_df = rfm_summary(raw_data, "id", "date")

    # model_config and sampler_config are optional
    model = BetaGeoModel(
        model_config={
            "r": Prior("Weibull", alpha=2, beta=1),
            "alpha": Prior("HalfFlat"),
            "a": Prior("Beta", alpha=2, beta=3),
            "b": Prior("Beta", alpha=3, beta=2),
        },
        sampler_config={
            "draws": 1000,
            "tune": 1000,
            "chains": 2,
            "cores": 2,
        },
    )

    # The default 'mcmc' fit_method provides informative predictions
    # and reliable performance on small datasets
    model.fit(data=rfm_df)
    print(model.fit_summary())

    # Maximum a Posteriori can quickly fit a model to large datasets,
    # but will give limited insights into predictive uncertainty.
    model.fit(fit_method="map")
    print(model.fit_summary())

    # Predict number of purchases for current customers
    # over the next 10 time periods
    expected_purchases = model.expected_purchases(future_t=10)

    # Predict probability customers are still active
    probability_alive = model.expected_probability_alive()

    # Predict number of purchases for a new customer over 't' time periods
    expected_purchases_new_customer = model.expected_purchases_new_customer(t=10)
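To see what the summarized input looks like, the same ten transactions can be reduced to frequency, recency, and T by hand. This is a standard-library sketch of an "equivalent" summary, not the implementation of rfm_summary; it assumes daily time units and an observation period that ends on the last purchase date in the data, and the function name `summarize` is hypothetical.

```python
from datetime import date

transactions = [
    (1, "2024-01-01"), (1, "2024-02-06"),
    (2, "2024-01-01"),
    (3, "2024-01-02"), (3, "2024-01-05"),
    (4, "2024-01-16"), (4, "2024-02-05"),
    (5, "2024-01-17"), (5, "2024-01-18"), (5, "2024-01-19"),
]


def summarize(transactions):
    """Return {customer_id: (frequency, recency, T)} in days (hypothetical helper)."""
    purchase_days = {}
    for customer_id, day in transactions:
        purchase_days.setdefault(customer_id, set()).add(date.fromisoformat(day))

    # Assumption: the observation period ends at the latest purchase in the data.
    period_end = max(d for days in purchase_days.values() for d in days)

    summary = {}
    for customer_id, days in purchase_days.items():
        first, last = min(days), max(days)
        frequency = len(days) - 1           # repeat purchases only
        recency = (last - first).days       # first purchase to last purchase
        T = (period_end - first).days       # first purchase to end of observation
        assert T >= recency                 # the constraint the model requires
        summary[customer_id] = (frequency, recency, T)
    return summary


rfm = summarize(transactions)
for customer_id, row in sorted(rfm.items()):
    print(customer_id, row)
# 1 (1, 36, 36)
# 2 (0, 0, 36)
# 3 (1, 3, 35)
# 4 (1, 20, 21)
# 5 (2, 2, 20)
```

Customer 2 purchased only once, so both frequency and recency are zero while T still measures the time since that first purchase.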
Methods
BetaGeoModel.__init__([data, model_config, ...]): Initialize model configuration and sampler configuration for the model.
Convert the model configuration and sampler configuration from the attributes to keyword arguments.
Build the model from the InferenceData object.
BetaGeoModel.build_model([data]): Build the model.
Create attributes for the inference data.
Compute posterior predictive samples of dropout, purchase rate and frequency/recency of new customers.
Sample the Beta distribution for the population-level dropout rate.
Sample the Gamma distribution for the population-level purchase rate.
BetaGeoModel.distribution_new_customer_recency_frequency([...]): BG/NBD process representing purchases across the customer population.
Compute the probability a customer with history frequency, recency, and T is currently active.
Compute the probability a customer with history frequency, recency, and T will have 0 purchases in the period (T, T+t].
BetaGeoModel.expected_purchases([data, future_t]): Compute the expected number of future purchases across future_t time periods given recency, frequency, and T for each customer.
Compute the expected number of purchases for a new customer across t time periods.
BetaGeoModel.fit([data, method, fit_method]): Infer model posterior.
BetaGeoModel.fit_summary(**kwargs): Compute the summary of the fit result.
BetaGeoModel.graphviz(**kwargs): Get the graphviz representation of the model.
Create the initialization kwargs from an InferenceData object.
BetaGeoModel.load(fname[, check]): Create a ModelBuilder instance from a file.
BetaGeoModel.load_from_idata(idata[, check]): Create a ModelBuilder instance from an InferenceData object.
BetaGeoModel.save(fname, **kwargs): Save the model's inference data to a file.
BetaGeoModel.set_idata_attrs([idata]): Set attributes on an InferenceData object.
BetaGeoModel.table(**model_table_kwargs): Get the summary table of the model.
BetaGeoModel.thin_fit_result(keep_every): Return a copy of the model with a thinned fit result.
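The probability that a customer with history frequency, recency, and T is currently active has a closed form in [2]. The sketch below evaluates that expression at a single set of parameter values; it is not the library's method, which works with the fitted posterior, and the function name and the numbers used are illustrative assumptions.

```python
def probability_alive(frequency, recency, T, r, alpha, a, b):
    """Closed-form P(alive) for the BG/NBD model at fixed parameters (sketch)."""
    if frequency == 0:
        # Dropout can only happen after a repeat purchase, so a customer
        # with no repeat purchases is still active under the model.
        return 1.0
    ratio = (alpha + T) / (alpha + recency)
    odds_inactive = (a / (b + frequency - 1.0)) * ratio ** (r + frequency)
    return 1.0 / (1.0 + odds_inactive)


# Illustrative parameter values only; they are not fitted to any data.
params = dict(r=0.25, alpha=4.0, a=0.8, b=2.5)

p_recent = probability_alive(frequency=2, recency=40.0, T=40.0, **params)
p_stale = probability_alive(frequency=2, recency=5.0, T=40.0, **params)
print(p_recent, p_stale)
```

With frequency and T held fixed, a longer gap since the last purchase (smaller recency relative to T) lowers the probability that the customer is still active.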
Attributes
covariate_cols: All covariate column names.
default_model_config: Default model configuration.
default_sampler_config: Default sampler configuration.
dropout_covariate_cols: Dropout covariate column names from model_config.
fit_result: Get the posterior fit_result.
id: Generate a unique hash value for the model.
posterior: Access the 'posterior' attribute of the InferenceData object.
posterior_predictive: Access the 'posterior_predictive' attribute of the InferenceData object.
predictions: Access the 'predictions' attribute of the InferenceData object.
prior: Access the 'prior' attribute of the InferenceData object.
prior_predictive: Access the 'prior_predictive' attribute of the InferenceData object.
purchase_covariate_cols: Purchase covariate column names from model_config.
version
idata
sampler_config
model_config