ParetoNBDModel
- class pymc_marketing.clv.models.pareto_nbd.ParetoNBDModel(data, *, model_config=None, sampler_config=None)
Pareto Negative Binomial Model (Pareto/NBD).
Model for customers in continuous, non-contractual settings, first introduced by Schmittlein et al. [1], with additional derivations and predictive methods by Hardie & Fader [2] [3] [4].
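The Pareto/NBD model assumes that, while active, each customer makes purchases according to a Poisson process (so times between purchases are exponentially distributed), and that each customer's lifetime before dropping out is exponentially distributed. Purchase rates and dropout rates vary across customers according to Gamma distributions; mixing over these rates yields the Pareto (lifetime) and negative binomial (purchase count) components that give the model its name.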
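The assumptions above can be illustrated by simulating customers one at a time. This is a minimal standard-library sketch, not the library's own sampler: it assumes Gamma distributions with shape r (or s) and rate alpha (or beta), assumes the first purchase happens at time 0, and uses a hypothetical helper named simulate_pareto_nbd_customer.

```python
import random


def simulate_pareto_nbd_customer(r, alpha, s, beta, T, rng):
    """Simulate (frequency, recency, T) for one customer under the Pareto/NBD assumptions.

    Hypothetical helper for illustration. Rates are drawn from Gamma distributions with
    shape r (or s) and *rate* alpha (or beta); random.gammavariate expects a *scale*,
    hence the 1 / alpha and 1 / beta arguments.
    """
    lam = rng.gammavariate(r, 1.0 / alpha)  # purchase rate while the customer is active
    mu = rng.gammavariate(s, 1.0 / beta)    # dropout rate
    lifetime = rng.expovariate(mu)          # exponentially distributed time until dropout
    horizon = min(lifetime, T)              # purchases happen only while active and observed

    # Poisson process: exponentially distributed gaps between repeat purchases.
    # The first purchase is assumed to happen at time 0 and is not counted.
    purchase_times = []
    t = rng.expovariate(lam)
    while t < horizon:
        purchase_times.append(t)
        t += rng.expovariate(lam)

    frequency = len(purchase_times)                          # number of repeat purchases
    recency = purchase_times[-1] if purchase_times else 0.0  # time of the last repeat purchase
    return frequency, recency, T


# Usage with illustrative (not estimated) population parameters.
rng = random.Random(42)
sample = [simulate_pareto_nbd_customer(2.0, 5.0, 2.0, 20.0, 30.0, rng) for _ in range(1000)]
mean_frequency = sum(f for f, _, _ in sample) / len(sample)
print(f"mean repeat purchases over T=30: {mean_frequency:.2f}")
```

Because the gaps are exponential and the horizon is capped at T, every simulated customer automatically satisfies T >= recency, matching the data requirement stated below.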
This model requires data to be summarized by recency, frequency, and T for each customer, using clv.rfm_summary() or an equivalent function. Covariates affecting customer dropout and transaction rates are optional.
- Parameters:
data (pd.DataFrame) – DataFrame containing the following columns:
- frequency: number of repeat purchases
- recency: time between the first and the last purchase
- T: time between the first purchase and the end of the observation period. Model assumptions require T >= recency.
- customer_id: unique customer identifier
It may also include optional covariate columns.
model_config (dict, optional) – Dictionary containing model parameters and covariate column names:
- r_prior: shape parameter of time between purchases; defaults to Weibull(alpha=2, beta=1)
- alpha_prior: scale parameter of time between purchases; defaults to Weibull(alpha=2, beta=10)
- s_prior: shape parameter of time until dropout; defaults to Weibull(alpha=2, beta=1)
- beta_prior: scale parameter of time until dropout; defaults to Weibull(alpha=2, beta=10)
- purchase_covariates_prior: coefficients for purchase rate covariates; defaults to Normal(0, 3)
- dropout_covariates_prior: coefficients for dropout covariates; defaults to Normal(0, 3)
- purchase_covariate_cols: list of column names of covariates for customer purchase rates
- dropout_covariate_cols: list of column names of covariates for customer dropouts
If not provided, the model uses the default priors specified in the default_model_config class attribute.
sampler_config (dict, optional) – Dictionary of sampler parameters. Defaults to None.
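Covariates are supplied through purchase_covariate_cols and dropout_covariate_cols. In Fader and Hardie's note on adding time-invariant covariates to the Pareto/NBD, covariates rescale the Gamma parameters as alpha_i = alpha_0 * exp(-gamma_1 . x_i) and beta_i = beta_0 * exp(-gamma_2 . z_i). The sketch below assumes that formulation; whether ParetoNBDModel links covariates in exactly this way is an assumption, and covariate_adjusted_scale is a hypothetical helper.

```python
import math


def covariate_adjusted_scale(base_scale, coefficients, covariates):
    """Return base_scale * exp(-sum(coef * x)) for one customer (hypothetical helper).

    Under the assumed formulation, a positive coefficient lowers alpha (or beta),
    which raises the customer's mean purchase (or dropout) rate r / alpha (or s / beta).
    """
    if len(coefficients) != len(covariates):
        raise ValueError("coefficients and covariates must have the same length")
    linear_predictor = sum(c * x for c, x in zip(coefficients, covariates))
    return base_scale * math.exp(-linear_predictor)


# Usage: a hypothetical customer with two purchase covariates.
alpha_i = covariate_adjusted_scale(10.0, [0.5, -0.2], [1.0, 2.0])
print(alpha_i)
```

Using an exponential link keeps each customer's parameter strictly positive, as the Gamma distribution requires, whatever the sign of the covariates.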
Examples
```python
import pandas as pd

from pymc_marketing.clv import ParetoNBDModel, rfm_summary

# raw_data is a placeholder for a transaction-level DataFrame
# with a customer ID column and a purchase date column
rfm_df = rfm_summary(raw_data, "id_col_name", "date_col_name")

# Initialize model with customer data; `model_config` parameter is optional
model = ParetoNBDModel(
    data=rfm_df,
    model_config={
        "r_prior": {"dist": "Weibull", "kwargs": {"alpha": 2, "beta": 1}},
        "alpha_prior": {"dist": "Weibull", "kwargs": {"alpha": 2, "beta": 10}},
        "s_prior": {"dist": "Weibull", "kwargs": {"alpha": 2, "beta": 1}},
        "beta_prior": {"dist": "Weibull", "kwargs": {"alpha": 2, "beta": 10}},
    },
)

# Fit model quickly to large datasets via Maximum a Posteriori (MAP) estimation
model.fit(fit_method="map")
print(model.fit_summary())

# Use 'mcmc' for more informative predictions and reliable performance on smaller datasets
model.fit(fit_method="mcmc")
print(model.fit_summary())

# Predict number of purchases for customers over the next 10 time periods
expected_purchases = model.expected_purchases(
    data=rfm_df,
    future_t=10,
)

# Predict probability of customers making 'n' purchases over 'future_t' time periods.
# The data parameter is omitted here because predictions are run on the original dataset.
purchase_probability = model.expected_purchase_probability(
    n=[0, 1, 2, 3],
    future_t=[10, 20, 30, 40],
)

new_data = pd.DataFrame(
    data={
        "customer_id": [0, 1, 2, 3],
        "frequency": [5, 2, 1, 8],
        "recency": [7, 4, 2.5, 11],
        "T": [10, 8, 10, 22],
    }
)

# Predict probability customers will still be active in 'future_t' time periods
probability_alive = model.expected_probability_alive(
    data=new_data,
    future_t=[0, 3, 6, 9],
)

# Predict number of purchases for a new customer over 't' time periods
expected_purchases_new_customer = model.expected_purchases_new_customer(
    t=[2, 5, 7, 10],
)
```
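The rfm_summary call in the example condenses raw transactions into the frequency, recency, and T columns described under data. The sketch below reproduces those definitions for a single customer with plain dates; rfm_from_dates is a hypothetical helper, and rfm_summary's own choices (time units, how same-day purchases are aggregated) may differ from the ones assumed here.

```python
from datetime import date


def rfm_from_dates(purchase_dates, observation_end):
    """Return (frequency, recency, T), measured in days, for one customer.

    Hypothetical helper: purchases on the same calendar day are merged, and the
    first purchase is not counted as a repeat purchase.
    """
    days = sorted(set(purchase_dates))
    first, last = days[0], days[-1]
    frequency = len(days) - 1           # repeat purchases only
    recency = (last - first).days       # first purchase -> last purchase
    T = (observation_end - first).days  # first purchase -> end of observation period
    if T < recency:
        raise ValueError("observation_end is before the last purchase, so T >= recency fails")
    return frequency, recency, T


# Usage: four transactions on three distinct days, observed until 31 January.
purchases = [date(2024, 1, 1), date(2024, 1, 5), date(2024, 1, 5), date(2024, 1, 20)]
print(rfm_from_dates(purchases, date(2024, 1, 31)))  # (2, 19, 30)
```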
References
Methods
ParetoNBDModel.__init__(data, *[, ...]) – Initializes model configuration and sampler configuration for the model.
Creates an instance of pm.Model based on provided data and model_config, and attaches it to self.
Utility function for posterior predictive sampling of dropout, purchase rate and frequency/recency of new customers.
Sample from the Gamma distribution representing dropout rates for new customers.
ParetoNBDModel.distribution_new_customer_purchase_rate([...]) – Sample from the Gamma distribution representing purchase rates for new customers.
ParetoNBDModel.distribution_new_customer_recency_frequency([...]) – Pareto/NBD process representing purchases across the customer population.
Compute the probability that a customer with history frequency, recency, and T is currently active.
Estimate the probability of n_purchases over future_t time periods, given an individual customer's current frequency, recency, and T.
ParetoNBDModel.expected_purchases([data, ...]) – Given recency, frequency, and T for an individual customer, this method predicts the expected number of future purchases across future_t time periods.
Expected number of purchases for a new customer across t time periods.
ParetoNBDModel.fit([fit_method]) – Infer posteriors of model parameters to run predictions.
ParetoNBDModel.fit_summary(**kwargs)
ParetoNBDModel.get_params([deep]) – Get all the model parameters needed to instantiate a copy of the model, not including training data.
ParetoNBDModel.load(fname) – Creates a ModelBuilder instance from a file and loads inference data for the model.
ParetoNBDModel.predict(X_pred[, extend_idata]) – Uses the model to predict on unseen data and returns a point prediction computed from all the samples.
ParetoNBDModel.predict_posterior(X_pred[, ...]) – Generate posterior predictive samples on unseen data.
ParetoNBDModel.predict_proba(X_pred[, ...]) – Alias for predict_posterior, for consistency with scikit-learn probabilistic estimators.
Sample from the model's posterior predictive distribution.
Sample from the model's prior predictive distribution.
ParetoNBDModel.save(fname) – Save the model's inference data to a file.
ParetoNBDModel.set_idata_attrs([idata]) – Set attributes on an InferenceData object.
ParetoNBDModel.set_params(**params) – Set all the model parameters needed to instantiate the model, not including training data.
ParetoNBDModel.thin_fit_result(keep_every) – Return a copy of the model with a thinned fit result.
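The probability-alive and expected-purchase methods listed above rest on closed-form expressions that become simple when a single customer's rates are known. The sketch below conditions on fixed individual rates lam and mu, with hypothetical helper names; ParetoNBDModel instead works with the population-level Gamma parameters and their posterior draws, so its outputs will differ. p_alive_individual is the individual-level probability of still being active at T, expected_purchases_individual multiplies it by the expected purchases of an active customer over the next future_t periods, and expected_purchases_new_customer is the population-level expectation E[X(t)] = r * beta / (alpha * (s - 1)) * [1 - (beta / (beta + t)) ** (s - 1)] from Fader and Hardie's derivations.

```python
import math


def p_alive_individual(lam, mu, recency, T):
    """P(active at T | lam, mu, recency, T) for fixed individual rates (hypothetical helper)."""
    total = lam + mu
    # 1 / (1 + mu / (lam + mu) * (exp((lam + mu) * (T - recency)) - 1))
    return 1.0 / (1.0 + (mu / total) * math.expm1(total * (T - recency)))


def expected_purchases_individual(lam, mu, recency, T, future_t):
    """Expected purchases in (T, T + future_t] for fixed individual rates (hypothetical helper)."""
    # If active at T, the remaining lifetime is Exp(mu) by memorylessness, so the
    # expected active time within the window is (1 - exp(-mu * future_t)) / mu.
    expected_active_time = -math.expm1(-mu * future_t) / mu
    return p_alive_individual(lam, mu, recency, T) * lam * expected_active_time


def expected_purchases_new_customer(r, alpha, s, beta, t):
    """Population-level E[X(t)] for a customer acquired at time 0; requires s != 1."""
    return (r * beta) / (alpha * (s - 1.0)) * (1.0 - (beta / (beta + t)) ** (s - 1.0))


# Usage with illustrative (not estimated) values.
print(p_alive_individual(lam=0.5, mu=0.05, recency=20.0, T=30.0))
print(expected_purchases_new_customer(r=0.5, alpha=10.0, s=0.6, beta=12.0, t=10.0))
```

A customer whose last purchase coincides with the end of the observation period (recency == T) is certainly alive under this individual-level formula, and the probability falls as the gap T - recency grows.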
Attributes
X
default_model_config
Returns a class default config dict for model builder if no model_config is provided on class initialization. Useful for understanding the structure of the required model_config so that users can customize it.
default_sampler_config
Returns a class default sampler dict for model builder if no sampler_config is provided on class initialization. Useful for understanding the structure of the required sampler_config so that users can customize it.
fit_result
id
Generate a unique hash value for the model.
output_var
Returns the name of the output variable of the model.
version
y
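The id attribute is described as a unique hash value for the model. One plausible way to build such an identifier is to hash the model configuration together with the version and model type; the sketch below assumes exactly that, and model_id, its inputs, and the 16-character truncation are illustrative assumptions rather than a description of how ParetoNBDModel computes id.

```python
import hashlib


def model_id(model_config, version, model_type):
    """Hypothetical helper: hash config, version, and model type into a short hex id."""
    hasher = hashlib.sha256()
    # Sorting by key makes the id independent of dict insertion order.
    hasher.update(repr(sorted(model_config.items())).encode())
    hasher.update(version.encode())
    hasher.update(model_type.encode())
    return hasher.hexdigest()[:16]


# Usage with a hypothetical configuration and version string.
config = {"r_prior": "Weibull(alpha=2, beta=1)", "s_prior": "Weibull(alpha=2, beta=1)"}
print(model_id(config, "0.1.0", "ParetoNBDModel"))
```

Hashing the version alongside the configuration means that saved results from an older model version would not be mistaken for the current one, even if the priors are unchanged.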