ParetoNBDModel

class pymc_marketing.clv.models.pareto_nbd.ParetoNBDModel(data, *, model_config=None, sampler_config=None)

Pareto Negative Binomial Model (Pareto/NBD).

Model for customers in continuous, non-contractual settings, first introduced by Schmittlein et al. [1], with additional derivations and predictive methods by Hardie & Fader [2] [3] [4].

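The Pareto/NBD model assumes that each customer stays active for an exponentially distributed lifetime whose dropout rate varies across customers according to a Gamma distribution. While a customer is still active, purchases arrive as a Poisson process whose rate is also Gamma-distributed across customers.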
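A small simulation can make these assumptions concrete. The sketch below follows the standard Pareto/NBD generative story from the literature: each customer draws a purchase rate and a dropout rate from Gamma distributions, stays active for an exponentially distributed lifetime, and makes purchases as a Poisson process while active. The function name `simulate_customer` and the parameter values are illustrative assumptions, not part of the library.

```python
import random


def simulate_customer(r, alpha, s, beta, T, rng):
    """Simulate one customer's repeat purchases over an observation window of length T.

    Hypothetical helper for illustration only; it is not part of pymc_marketing.
    """
    # Rates vary across customers following Gamma distributions.
    # random.gammavariate takes (shape, scale), so scale = 1 / rate parameter.
    purchase_rate = rng.gammavariate(r, 1.0 / alpha)
    dropout_rate = rng.gammavariate(s, 1.0 / beta)

    # The customer silently drops out after an exponentially distributed lifetime.
    lifetime = rng.expovariate(dropout_rate)
    active_until = min(lifetime, T)

    # While active, repeat purchases arrive as a Poisson process,
    # i.e. with exponentially distributed gaps between purchases.
    frequency, recency, t = 0, 0.0, 0.0
    while True:
        t += rng.expovariate(purchase_rate)
        if t > active_until:
            break
        frequency += 1
        recency = t  # time of the most recent repeat purchase

    return {"frequency": frequency, "recency": recency, "T": T}


rng = random.Random(42)
customers = [simulate_customer(1.0, 10.0, 1.0, 10.0, 52.0, rng) for _ in range(5)]
for row in customers:
    print(row)
```

Each simulated row has the same frequency, recency, and T structure that the model expects as input, with time measured from the customer's first purchase.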

This model requires data to be summarized by recency, frequency, and T for each customer, using clv.rfm_summary() or equivalent. Covariates impacting customer dropouts and transaction rates are optional.
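To illustrate what such a summary contains, the following pandas sketch derives frequency, recency, and T from a toy transaction log. It is not a reimplementation of `clv.rfm_summary()`; the helper name `summarize_rfm`, the column name `date`, and the weekly time unit are assumptions made for this example.

```python
import pandas as pd

# Toy transaction log: one row per purchase (illustrative data).
transactions = pd.DataFrame(
    {
        "customer_id": [1, 1, 1, 2, 3, 3],
        "date": pd.to_datetime(
            [
                "2024-01-01", "2024-01-08", "2024-01-15",
                "2024-01-03",
                "2024-01-05", "2024-01-26",
            ]
        ),
    }
)
observation_end = pd.Timestamp("2024-01-29")


def summarize_rfm(transactions, observation_end, days_per_period=7):
    """Hypothetical helper: summarize transactions per customer, in weeks by default."""
    dates = transactions.groupby("customer_id")["date"]
    first, last = dates.min(), dates.max()
    summary = pd.DataFrame(
        {
            # Repeat purchases: distinct purchase dates minus the first one.
            "frequency": dates.nunique() - 1,
            # Time between the first and the last purchase.
            "recency": (last - first).dt.days / days_per_period,
            # Time between the first purchase and the end of the observation period.
            "T": (observation_end - first).dt.days / days_per_period,
        }
    )
    return summary.reset_index()


rfm = summarize_rfm(transactions, observation_end)
print(rfm)
```

A customer with a single purchase ends up with frequency 0 and recency 0, which is valid input as long as T >= recency.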

Parameters:

data (pd.DataFrame) –
DataFrame containing the following columns:
- frequency: number of repeat purchases
- recency: time between the first and the last purchase
- T: time between the first purchase and the end of the observation period.
  Model assumptions require T >= recency
- customer_id: unique customer identifier
Along with optional covariate columns.
model_config (dict, optional) –
Dictionary containing model parameters and covariate column names:
- r_prior: Shape parameter of time between purchases; defaults to Weibull(alpha=2, beta=1)
- alpha_prior: Scale parameter of time between purchases; defaults to Weibull(alpha=2, beta=10)
- s_prior: Shape parameter of time until dropout; defaults to Weibull(alpha=2, beta=1)
- beta_prior: Scale parameter of time until dropout; defaults to Weibull(alpha=2, beta=10)
- purchase_covariates_prior: Coefficients for purchase rate covariates; defaults to Normal(0, 3)
- dropout_covariates_prior: Coefficients for dropout covariates; defaults to Normal(0, 3)
- purchase_covariate_cols: List containing column names of covariates for customer purchase rates.
- dropout_covariate_cols: List containing column names of covariates for customer dropouts.
If not provided, the model will use default priors specified in the default_model_config class attribute.
sampler_config (dict, optional) – Dictionary of sampler parameters. Defaults to None.
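The column requirements for `data` can be checked before a model is constructed. The following is a minimal sketch of such a check, assuming only the column names and the T >= recency condition listed above; the helper `check_rfm_data` is hypothetical and not part of the library, which may perform its own validation.

```python
import pandas as pd

REQUIRED_COLUMNS = ["customer_id", "frequency", "recency", "T"]


def check_rfm_data(data: pd.DataFrame) -> list:
    """Hypothetical helper: return a list of problems found in an RFM summary."""
    problems = []
    missing = [col for col in REQUIRED_COLUMNS if col not in data.columns]
    if missing:
        # The remaining checks need every required column, so stop here.
        problems.append(f"missing columns: {missing}")
        return problems
    if data["customer_id"].duplicated().any():
        problems.append("customer_id values are not unique")
    if (data["frequency"] < 0).any():
        problems.append("frequency contains negative values")
    if (data["T"] < data["recency"]).any():
        problems.append("T is smaller than recency for some customers")
    return problems


good = pd.DataFrame(
    {
        "customer_id": [0, 1, 2, 3],
        "frequency": [5, 2, 1, 8],
        "recency": [7, 4, 2.5, 11],
        "T": [10, 8, 10, 22],
    }
)
bad = good.assign(T=[10, 3, 10, 22])  # second customer now has T < recency

print(check_rfm_data(good))
print(check_rfm_data(bad))
```

Returning a list of messages rather than raising on the first issue makes it easier to report every problem in a dataset at once.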

Examples

import pandas as pd
from pymc_marketing.clv import ParetoNBDModel, rfm_summary

# `raw_data` is a DataFrame of raw transactions with customer ID and date columns
rfm_df = rfm_summary(raw_data, 'id_col_name', 'date_col_name')

# Initialize model with customer data; `model_config` parameter is optional
model = ParetoNBDModel(
    data=rfm_df,
    model_config={
        "r_prior": {"dist": "Weibull", "kwargs": {"alpha": 2, "beta": 1}},
        "alpha_prior": {"dist": "Weibull", "kwargs": {"alpha": 2, "beta": 10}},
        "s_prior": {"dist": "Weibull", "kwargs": {"alpha": 2, "beta": 1}},
        "beta_prior": {"dist": "Weibull", "kwargs": {"alpha": 2, "beta": 10}},
    },
)

# Fit model quickly to large datasets via the default Maximum a Posteriori method
model.fit(fit_method='map')
print(model.fit_summary())

# Use 'mcmc' for more informative predictions and reliable performance on smaller datasets
model.fit(fit_method='mcmc')
print(model.fit_summary())

# Predict number of purchases for customers over the next 10 time periods
expected_purchases = model.expected_purchases(
    data=rfm_df,
    future_t=10,
)

# Predict probability of customer making 'n' purchases over 't' time periods
# Data parameter is omitted here because predictions are run on the original dataset
purchase_probability = model.expected_purchase_probability(
    n=[0, 1, 2, 3],
    future_t=[10, 20, 30, 40],
)

new_data = pd.DataFrame(
    data={
        "customer_id": [0, 1, 2, 3],
        "frequency": [5, 2, 1, 8],
        "recency": [7, 4, 2.5, 11],
        "T": [10, 8, 10, 22],
    }
)

# Predict probability customers will still be active in 'future_t' time periods
probability_alive = model.expected_probability_alive(
    data=new_data,
    future_t=[0, 3, 6, 9],
)

# Predict number of purchases for a new customer over 't' time periods.
expected_purchases_new_customer = model.expected_purchases_new_customer(
    t=[2, 5, 7, 10],
)
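For intuition about the new-customer prediction, the Pareto/NBD literature gives a closed form for the expected number of purchases by a randomly chosen new customer over t time periods: E[X(t)] = r * beta / (alpha * (s - 1)) * (1 - (beta / (beta + t)) ** (s - 1)). The sketch below evaluates that expression for fixed parameter values. The function name `pareto_nbd_expected_purchases` and the parameter values are illustrative assumptions, and this is not claimed to be how the library computes its result, which is derived from the fitted posterior.

```python
import math


def pareto_nbd_expected_purchases(r, alpha, s, beta, t):
    """Hypothetical helper: closed-form E[X(t)] for fixed Pareto/NBD parameters."""
    if math.isclose(s, 1.0):
        # Limit of the closed form as s -> 1, where the general expression is 0/0.
        return (r * beta / alpha) * math.log((beta + t) / beta)
    return (r * beta) / (alpha * (s - 1)) * (1 - (beta / (beta + t)) ** (s - 1))


# Illustrative parameter values, not fitted estimates.
for t in [2, 5, 7, 10]:
    value = pareto_nbd_expected_purchases(r=0.5, alpha=10.0, s=0.5, beta=10.0, t=t)
    print(t, round(value, 4))
```

The expected count grows with t but at a decreasing rate, reflecting the chance that the customer has already dropped out.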

References

Methods

`ParetoNBDModel.__init__`(data, *[, ...])	Initializes model configuration and sampler configuration for the model
`ParetoNBDModel.build_model`()	Creates an instance of pm.Model based on provided data and model_config, and attaches it to self.
`ParetoNBDModel.distribution_new_customer`([...])	Utility function for posterior predictive sampling of dropout, purchase rate and frequency/recency of new customers.
`ParetoNBDModel.distribution_new_customer_dropout`([...])	Sample from the Gamma distribution representing dropout times for new customers.
`ParetoNBDModel.distribution_new_customer_purchase_rate`([...])	Sample from the Gamma distribution representing purchase rates for new customers.
`ParetoNBDModel.distribution_new_customer_recency_frequency`([...])	Pareto/NBD process representing purchases across the customer population.
`ParetoNBDModel.expected_probability_alive`([...])	Compute the probability that a customer with history frequency, recency, and T is currently active.
`ParetoNBDModel.expected_purchase_probability`([...])	Estimate probability of n_purchases over future_t time periods, given an individual customer's current frequency, recency, and T.
`ParetoNBDModel.expected_purchases`([data, ...])	Given recency, frequency, and T for an individual customer, this method predicts the expected number of future purchases across future_t time periods.
`ParetoNBDModel.expected_purchases_new_customer`([...])	Expected number of purchases for a new customer across t time periods.
`ParetoNBDModel.fit`([fit_method])	Infer posteriors of model parameters to run predictions.
`ParetoNBDModel.fit_summary`(**kwargs)
`ParetoNBDModel.get_params`([deep])	Get all the model parameters needed to instantiate a copy of the model, not including training data.
`ParetoNBDModel.load`(fname)	Create a ModelBuilder instance from a file and load inference data for the model.
`ParetoNBDModel.predict`(X_pred[, extend_idata])	Uses model to predict on unseen data and return point prediction of all the samples.
`ParetoNBDModel.predict_posterior`(X_pred[, ...])	Generate posterior predictive samples on unseen data.
`ParetoNBDModel.predict_proba`(X_pred[, ...])	Alias for `predict_posterior`, for consistency with scikit-learn probabilistic estimators.
`ParetoNBDModel.sample_posterior_predictive`(X_pred)	Sample from the model's posterior predictive distribution.
`ParetoNBDModel.sample_prior_predictive`(X_pred)	Sample from the model's prior predictive distribution.
`ParetoNBDModel.save`(fname)	Save the model's inference data to a file.
`ParetoNBDModel.set_idata_attrs`([idata])	Set attributes on an InferenceData object.
`ParetoNBDModel.set_params`(**params)	Set all the model parameters needed to instantiate the model, not including training data.
`ParetoNBDModel.thin_fit_result`(keep_every)	Return a copy of the model with a thinned fit result.

Attributes

`X`
`default_model_config`	Returns a class default config dict for the model builder if no model_config is provided on class initialization. Useful for understanding the structure of the required model_config so that users can customize it.
`default_sampler_config`	Returns a class default sampler dict for the model builder if no sampler_config is provided on class initialization. Useful for understanding the structure of the required sampler_config so that users can customize it.
`fit_result`
`id`	Generate a unique hash value for the model.
`output_var`	Returns the name of the output variable of the model.
`version`
`y`