ParetoNBDModel
- class pymc_marketing.clv.models.pareto_nbd.ParetoNBDModel(data, *, model_config=None, sampler_config=None)
Pareto Negative Binomial Model (Pareto/NBD).
Model for continuous, non-contractual customers, first introduced by Schmittlein et al. [1], with additional derivations and predictive methods by Hardie & Fader [2] [3] [4] [5].
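The Pareto/NBD model assumes that each customer stays active for an exponentially distributed time, with a dropout rate that varies across customers according to a Gamma distribution. While a customer is still active, purchases follow a Poisson process whose rate is also Gamma-distributed across customers.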
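The assumptions above can be illustrated with a small simulation. This is a minimal sketch using only the Python standard library, not the library's own sampler: the function name `simulate_customer` and the parameter values are hypothetical, and it assumes the usual Pareto/NBD formulation in which `r`, `alpha` are the shape and rate of the purchase-rate Gamma and `s`, `beta` are the shape and rate of the dropout-rate Gamma.

```python
import random


def simulate_customer(r, alpha, s, beta, T, rng):
    """Simulate (frequency, recency, T) for one customer observed for T periods."""
    # Heterogeneity across customers: Gamma-distributed purchase and dropout rates.
    # random.gammavariate takes (shape, scale), so each rate parameter is inverted.
    purchase_rate = rng.gammavariate(r, 1.0 / alpha)
    dropout_rate = rng.gammavariate(s, 1.0 / beta)

    # Unobserved lifetime, measured from the first purchase at time 0.
    lifetime = rng.expovariate(dropout_rate)
    active_until = min(lifetime, T)

    # Poisson purchasing while active: exponential gaps between repeat purchases.
    t, frequency, recency = 0.0, 0, 0.0
    while True:
        t += rng.expovariate(purchase_rate)
        if t > active_until:
            break
        frequency += 1
        recency = t
    return frequency, recency, T


# Hypothetical parameter values, chosen only for illustration.
rng = random.Random(42)
customers = [
    simulate_customer(r=1.0, alpha=5.0, s=1.0, beta=10.0, T=52.0, rng=rng)
    for _ in range(1000)
]
```

Each simulated tuple has the same shape as one row of the summarized data the model expects: a repeat-purchase count, the time of the last repeat purchase, and the length of the observation window.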
This model requires data to be summarized by recency, frequency, and T for each customer, using clv.rfm_summary() or equivalent. Covariates impacting customer dropouts and transaction rates are optional.
- Parameters:
- data : DataFrame
  DataFrame containing the following columns:
  - customer_id: Unique customer identifier
  - frequency: Number of repeat purchases
  - recency: Time between the first and the last purchase
  - T: Time between the first purchase and the end of the observation period. Model assumptions require T >= recency
  Along with optional covariate columns.
- model_config : dict, optional
  Dictionary containing model parameters and covariate column names:
  - r_prior: Shape parameter of time between purchases; defaults to Weibull(alpha=2, beta=1)
  - alpha_prior: Scale parameter of time between purchases; defaults to Weibull(alpha=2, beta=10)
  - s_prior: Shape parameter of time until dropout; defaults to Weibull(alpha=2, beta=1)
  - beta_prior: Scale parameter of time until dropout; defaults to Weibull(alpha=2, beta=10)
  - purchase_covariates_prior: Coefficients for purchase rate covariates; defaults to Normal(0, 3)
  - dropout_covariates_prior: Coefficients for dropout covariates; defaults to Normal.dist(0, 3)
  - purchase_covariate_cols: List containing column names of covariates for customer purchase rates.
  - dropout_covariate_cols: List containing column names of covariates for customer dropouts.
  If not provided, the model will use default priors specified in the default_model_config class attribute.
- sampler_config : dict, optional
  Dictionary of sampler parameters. Defaults to None.
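To make the column definitions for `data` concrete, the following sketch derives frequency, recency, and T from a raw transaction log using only the standard library. It is not a replacement for clv.rfm_summary() and may differ from it in detail: the helper name `summarize_rfm` is hypothetical, time is measured in days, and several purchases by one customer on the same day are assumed to count once.

```python
from datetime import date


def summarize_rfm(transactions, observation_end):
    """Summarize (customer_id, purchase_date) pairs into one row per customer."""
    purchase_days = {}
    for customer_id, day in transactions:
        # A set collapses multiple purchases on the same day into one.
        purchase_days.setdefault(customer_id, set()).add(day)

    rows = []
    for customer_id, days in sorted(purchase_days.items()):
        first, last = min(days), max(days)
        rows.append(
            {
                "customer_id": customer_id,
                "frequency": len(days) - 1,  # repeat purchases only
                "recency": (last - first).days,  # first to last purchase
                "T": (observation_end - first).days,  # first purchase to period end
            }
        )
    return rows


# Hypothetical transaction log, for illustration only.
transactions = [
    ("a", date(2024, 1, 1)),
    ("a", date(2024, 1, 8)),
    ("a", date(2024, 1, 20)),
    ("b", date(2024, 1, 5)),
]
rows = summarize_rfm(transactions, observation_end=date(2024, 1, 31))
```

A customer with a single purchase gets frequency 0 and recency 0, and every row satisfies T >= recency by construction, which is the condition the model assumptions require.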
References
[1] Schmittlein, David C., Morrison, Donald G. & Colombo, Richard (1987). “Counting Your Customers: Who Are They and What Will They Do Next?”. Management Science, Vol. 33, No. 1, pp. 1-24.
[2] Fader, Peter S. & Hardie, Bruce G. S. (2005). “A Note on Deriving the Pareto/NBD Model and Related Expressions”. http://brucehardie.com/notes/009/pareto_nbd_derivations_2005-11-05.pdf
[3] Fader, Peter S. & Hardie, Bruce G. S. (2014). “Additional Results for the Pareto/NBD Model”. https://www.brucehardie.com/notes/015/additional_pareto_nbd_results.pdf
[4] Fader, Peter S. & Hardie, Bruce G. S. (2014). “Deriving the Conditional PMF of the Pareto/NBD Model”. https://www.brucehardie.com/notes/028/pareto_nbd_conditional_pmf.pdf
[5] Fader, Peter S. & Hardie, Bruce G. S. (2007). “Incorporating Time-Invariant Covariates into the Pareto/NBD and BG/NBD Models”. https://www.brucehardie.com/notes/019/time_invariant_covariates.pdf
Examples
import pandas as pd
import pymc as pm
from pymc_marketing.prior import Prior
from pymc_marketing.clv import ParetoNBDModel, rfm_summary

# raw_data is a transaction-level DataFrame with one row per purchase
rfm_df = rfm_summary(raw_data, "id_col_name", "date_col_name")

# Initialize model with customer data; `model_config` parameter is optional
model = ParetoNBDModel(
    data=rfm_df,
    model_config={
        "r_prior": Prior("Weibull", alpha=2, beta=1),
        "alpha_prior": Prior("Weibull", alpha=2, beta=10),
        "s_prior": Prior("Weibull", alpha=2, beta=1),
        "beta_prior": Prior("Weibull", alpha=2, beta=10),
    },
)

# Fit model quickly to large datasets via the default Maximum a Posteriori method
model.fit(fit_method="map")
print(model.fit_summary())

# Use 'demz' for more informative predictions and reliable performance on smaller datasets
model.fit(fit_method="demz")
print(model.fit_summary())

# Predict number of purchases for customers over the next 10 time periods
expected_purchases = model.expected_purchases(
    data=rfm_df,
    future_t=10,
)

# Predict probability of customer making 'n' purchases over 't' time periods
# Data parameter is omitted here because predictions are run on the original dataset
expected_num_purchases = model.expected_purchase_probability(
    n=[0, 1, 2, 3],
    future_t=[10, 20, 30, 40],
)

new_data = pd.DataFrame(
    data={
        "customer_id": [0, 1, 2, 3],
        "frequency": [5, 2, 1, 8],
        "recency": [7, 4, 2.5, 11],
        "T": [10, 8, 10, 22],
    }
)

# Predict probability customers will still be active in 'future_t' time periods
probability_alive = model.expected_probability_alive(
    data=new_data,
    future_t=[0, 3, 6, 9],
)

# Predict number of purchases for a new customer over 't' time periods
expected_purchases_new_customer = model.expected_purchases_new_customer(
    t=[2, 5, 7, 10],
)
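The example above passes Weibull priors with alpha=2 and beta of either 1 or 10. To get a feel for what those choices imply, the sketch below computes the prior mean and a few quantiles with the standard library. It assumes the common convention in which alpha is the Weibull shape and beta is the scale; the helper names `weibull_mean` and `weibull_quantile` are hypothetical and not part of the library.

```python
import math


def weibull_mean(alpha, beta):
    """Mean of a Weibull distribution with shape `alpha` and scale `beta`."""
    return beta * math.gamma(1.0 + 1.0 / alpha)


def weibull_quantile(alpha, beta, q):
    """Inverse CDF: the value below which a fraction `q` of the mass lies."""
    return beta * (-math.log(1.0 - q)) ** (1.0 / alpha)


priors = {
    "r_prior / s_prior": (2, 1),
    "alpha_prior / beta_prior": (2, 10),
}
for name, (alpha, beta) in priors.items():
    mean = weibull_mean(alpha, beta)
    low = weibull_quantile(alpha, beta, 0.05)
    high = weibull_quantile(alpha, beta, 0.95)
    print(f"{name}: mean={mean:.3f}, 5%={low:.3f}, 95%={high:.3f}")
```

Because beta acts as a scale under this convention, moving from beta=1 to beta=10 stretches the whole prior by a factor of ten without changing its shape.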
Methods
ParetoNBDModel.__init__(data, *[, ...]): Initialize model configuration and sampler configuration for the model.
Convert the model configuration and sampler configuration from the attributes to keyword arguments.
Build model from the InferenceData object.
Build the model.
Create the fit_data group based on the input data.
Create attributes for the inference data.
Compute posterior predictive samples of dropout, purchase rate and frequency/recency of new customers.
Sample from the Gamma distribution representing dropout times for new customers.
ParetoNBDModel.distribution_new_customer_purchase_rate([...]): Sample from the Gamma distribution representing purchase rates for new customers.
ParetoNBDModel.distribution_new_customer_recency_frequency([...]): Pareto/NBD process representing purchases across the customer population.
Compute expected probability of being alive.
Compute expected probability of n_purchases over future_t time periods.
ParetoNBDModel.expected_purchases([data, ...]): Compute expected number of future purchases.
Compute the expected number of purchases for a new customer across t time periods.
ParetoNBDModel.fit([fit_method]): Infer posteriors of model parameters to run predictions.
ParetoNBDModel.fit_summary(**kwargs): Compute the summary of the fit result.
ParetoNBDModel.graphviz(**kwargs): Get the graphviz representation of the model.
ParetoNBDModel.load(fname): Create a ModelBuilder instance from a file.
Create a ModelBuilder instance from an InferenceData object.
ParetoNBDModel.predict([X, extend_idata]): Use a model to predict on unseen data and return point prediction of all the samples.
ParetoNBDModel.predict_posterior([X, ...]): Generate posterior predictive samples on unseen data.
ParetoNBDModel.predict_proba([X, ...]): Alias for predict_posterior, for consistency with scikit-learn probabilistic estimators.
Sample from the model's posterior predictive distribution.
ParetoNBDModel.sample_prior_predictive([X, ...]): Sample from the model's prior predictive distribution.
ParetoNBDModel.save(fname): Save the model's inference data to a file.
ParetoNBDModel.set_idata_attrs([idata]): Set attributes on an InferenceData object.
ParetoNBDModel.thin_fit_result(keep_every): Return a copy of the model with a thinned fit result.
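For the method that computes the expected number of purchases for a new customer across t time periods, the Pareto/NBD literature (see [2]) gives a closed form for fixed parameter values: E[X(t)] = (r * beta) / (alpha * (s - 1)) * (1 - (beta / (beta + t))^(s - 1)). The sketch below evaluates that textbook expression. It is not necessarily how the library computes its result, which works with posterior draws rather than a single parameter set; the function name `closed_form_expected_purchases` and the parameter values are hypothetical.

```python
import math


def closed_form_expected_purchases(r, alpha, s, beta, t):
    """Expected repeat purchases by time t for a randomly chosen new customer."""
    mean_purchase_rate = r / alpha  # mean of the Gamma(r, alpha) purchase rate
    if s == 1.0:
        # Limit of the general expression as s approaches 1.
        return mean_purchase_rate * beta * math.log((beta + t) / beta)
    survival_term = 1.0 - (beta / (beta + t)) ** (s - 1.0)
    return mean_purchase_rate * beta / (s - 1.0) * survival_term


# Hypothetical parameter values, chosen only for illustration.
params = {"r": 0.55, "alpha": 10.6, "s": 0.6, "beta": 11.7}
curve = [closed_form_expected_purchases(t=t, **params) for t in (0, 2, 5, 7, 10)]
```

The curve starts at zero and grows more slowly than the no-dropout line (r / alpha) * t, because some customers have already become inactive by any given time.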
Attributes
X
default_model_config
Default model configuration.
default_sampler_config
Default sampler configuration.
fit_result
Get the posterior fit_result.
id
Generate a unique hash value for the model.
output_var
Output variable of the model.
posterior
posterior_predictive
predictions
prior
prior_predictive
version
y