ParetoNBDModel

class pymc_marketing.clv.models.pareto_nbd.ParetoNBDModel(data, *, model_config=None, sampler_config=None)

Pareto Negative Binomial Model (Pareto/NBD).

Model for continuous, non-contractual customers, first introduced by Schmittlein et al. [1], with additional derivations and predictive methods by Hardie & Fader [2] [3] [4] [5].

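The Pareto/NBD model assumes that, while a customer is active, purchases occur as a Poisson process with rate λ, so times between purchases are exponentially distributed. The unobserved time until the customer drops out is also exponential, with dropout rate μ. Heterogeneity across customers is captured by Gamma distributions on λ and μ. Mixing over these Gamma distributions yields negative binomial (NBD) purchase counts and Pareto (second kind) lifetimes, which give the model its name.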
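As a rough illustration of these assumptions, the sketch below simulates customers under the standard formulation of Schmittlein et al. [1]. Each customer's purchase rate is λ ~ Gamma(r, α) and dropout rate is μ ~ Gamma(s, β) (shape/rate parameterization). Purchases follow a Poisson process with rate λ until an exponential lifetime with rate μ ends or the observation period T closes. The function name `simulate_pareto_nbd` and the parameter values are hypothetical and meant only for illustration. This is not how pymc-marketing fits or samples the model.

```python
import numpy as np


def simulate_pareto_nbd(n_customers, T, r, alpha, s, beta, seed=0):
    """Hypothetical helper: simulate (frequency, recency, T) under Pareto/NBD.

    Assumes the Gamma(shape, rate) parameterization: lambda ~ Gamma(r, alpha),
    mu ~ Gamma(s, beta). Time 0 is each customer's first purchase.
    """
    rng = np.random.default_rng(seed)
    # Customer-level heterogeneity; NumPy's gamma takes a scale, so pass 1 / rate
    lam = rng.gamma(shape=r, scale=1.0 / alpha, size=n_customers)
    mu = rng.gamma(shape=s, scale=1.0 / beta, size=n_customers)
    # Unobserved lifetime: exponential with rate mu
    lifetime = rng.exponential(scale=1.0 / mu)

    frequency = np.zeros(n_customers, dtype=int)
    recency = np.zeros(n_customers)
    for i in range(n_customers):
        # Repeat purchases arrive as a Poisson process until dropout or T
        horizon = min(lifetime[i], T)
        t = rng.exponential(scale=1.0 / lam[i])
        while t < horizon:
            frequency[i] += 1
            recency[i] = t  # time of the latest repeat purchase
            t += rng.exponential(scale=1.0 / lam[i])
    return frequency, recency, np.full(n_customers, float(T))
```

For example, `simulate_pareto_nbd(1000, 39.0, 0.55, 10.58, 0.61, 11.67)` returns arrays shaped like the frequency, recency and T columns the model expects. Customers with no repeat purchases get recency 0.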

This model requires the data to be summarized as recency, frequency, and T for each customer, using clv.rfm_summary() or an equivalent. Covariates affecting customer dropout and purchase rates are optional.

Parameters:
data : DataFrame

DataFrame containing the following columns:

  • customer_id: Unique customer identifier

  • frequency: Number of repeat purchases

  • recency: Time between the first and the last purchase

  • T: Time between the first purchase and the end of the observation period. Model assumptions require T >= recency.

The DataFrame may also contain optional covariate columns.
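To show what these columns mean, here is a minimal pandas sketch that derives them from a raw transaction log. It is a hypothetical helper, not a replacement for clv.rfm_summary(), which offers more options. The sketch assumes columns named `customer_id` and `date`, measures time in days, and collapses same-day purchases into one.

```python
import pandas as pd


def summarize_transactions(transactions, observation_end):
    """Hypothetical helper: build customer_id, frequency, recency and T columns.

    Times are in days from each customer's first purchase; same-day purchases
    are treated as a single purchase (an assumption of this sketch).
    """
    tx = transactions.copy()
    tx["date"] = pd.to_datetime(tx["date"]).dt.normalize()
    tx = tx.drop_duplicates(["customer_id", "date"])

    grouped = tx.groupby("customer_id")["date"]
    first = grouped.min()
    last = grouped.max()
    end = pd.Timestamp(observation_end)

    summary = pd.DataFrame(
        {
            # Repeat purchases only: the first purchase is not counted
            "frequency": grouped.count() - 1,
            "recency": (last - first).dt.days.astype(float),
            "T": (end - first).dt.days.astype(float),
        }
    ).reset_index()

    # Model assumptions require T >= recency
    if (summary["T"] < summary["recency"]).any():
        raise ValueError("observation_end precedes a purchase; T must be >= recency")
    return summary
```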

model_config : dict, optional

Dictionary containing model parameters and covariate column names:

  • r_prior: Shape parameter of the Gamma distribution of customer purchase rates; defaults to Weibull(alpha=2, beta=1)

  • alpha_prior: Scale parameter of the Gamma distribution of customer purchase rates; defaults to Weibull(alpha=2, beta=10)

  • s_prior: Shape parameter of the Gamma distribution of customer dropout rates; defaults to Weibull(alpha=2, beta=1)

  • beta_prior: Scale parameter of the Gamma distribution of customer dropout rates; defaults to Weibull(alpha=2, beta=10)

  • purchase_covariates_prior: Coefficients for purchase rate covariates; defaults to Normal(0, 3)

  • dropout_covariates_prior: Coefficients for dropout covariates; defaults to Normal(0, 3)

  • purchase_covariate_cols: List containing column names of covariates for customer purchase rates.

  • dropout_covariate_cols: List containing column names of covariates for customer dropouts.

If not provided, the model will use default priors specified in the default_model_config class attribute.
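To illustrate how covariate coefficients act on the model, the sketch below applies the time-invariant covariate link described by Fader & Hardie [5]: α_i = α0 · exp(−γ1ᵀx_i) and β_i = β0 · exp(−γ2ᵀx_i). Whether pymc-marketing uses exactly this sign convention internally is an assumption here. The function and argument names are hypothetical. `X_purchase` plays the role of the columns in purchase_covariate_cols, and `X_dropout` the role of those in dropout_covariate_cols.

```python
import numpy as np


def covariate_adjusted_rates(r, alpha0, s, beta0,
                             X_purchase, gamma_purchase,
                             X_dropout, gamma_dropout):
    """Hypothetical helper: per-customer mean purchase and dropout rates.

    Uses alpha_i = alpha0 * exp(-x_i @ gamma1) and beta_i = beta0 * exp(-x_i @ gamma2),
    following Fader & Hardie (2007).
    """
    alpha_i = alpha0 * np.exp(-np.asarray(X_purchase) @ np.asarray(gamma_purchase))
    beta_i = beta0 * np.exp(-np.asarray(X_dropout) @ np.asarray(gamma_dropout))
    # With lambda ~ Gamma(r, rate=alpha_i), E[lambda_i] = r / alpha_i; likewise for mu
    return r / alpha_i, s / beta_i
```

Under this convention, a positive purchase coefficient lowers α_i and therefore raises the customer's expected purchase rate r / α_i. A positive dropout coefficient raises the expected dropout rate s / β_i.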

sampler_config : dict, optional

Dictionary of sampler parameters. Defaults to None.

References

[1]

Schmittlein, David C., Donald G. Morrison and Richard Colombo (1987). “Counting Your Customers: Who Are They and What Will They Do Next?”. Management Science, Vol. 33, No. 1, pp. 1-24.

[2]

Fader, Peter S. and Bruce G. S. Hardie (2005). “A Note on Deriving the Pareto/NBD Model and Related Expressions”. http://brucehardie.com/notes/009/pareto_nbd_derivations_2005-11-05.pdf

[3]

Fader, Peter S. and Bruce G. S. Hardie (2014). “Additional Results for the Pareto/NBD Model”. https://www.brucehardie.com/notes/015/additional_pareto_nbd_results.pdf

[4]

Fader, Peter S. and Bruce G. S. Hardie (2014). “Deriving the Conditional PMF of the Pareto/NBD Model”. https://www.brucehardie.com/notes/028/pareto_nbd_conditional_pmf.pdf

[5]

Fader, Peter S. and Bruce G. S. Hardie (2007). “Incorporating Time-Invariant Covariates into the Pareto/NBD and BG/NBD Models”. https://www.brucehardie.com/notes/019/time_invariant_covariates.pdf

Examples

import pandas as pd

from pymc_marketing.prior import Prior
from pymc_marketing.clv import ParetoNBDModel, rfm_summary

# `raw_data` is a transaction-level DataFrame with one row per purchase
rfm_df = rfm_summary(raw_data, "id_col_name", "date_col_name")

# Initialize model with customer data; `model_config` parameter is optional
model = ParetoNBDModel(
    data=rfm_df,
    model_config={
        "r_prior": Prior("Weibull", alpha=2, beta=1),
        "alpha_prior": Prior("Weibull", alpha=2, beta=10),
        "s_prior": Prior("Weibull", alpha=2, beta=1),
        "beta_prior": Prior("Weibull", alpha=2, beta=10),
    },
)

# Fit the model quickly to large datasets via the default Maximum a Posteriori (MAP) method
model.fit(fit_method='map')
print(model.fit_summary())

# Use 'demz' (DEMetropolisZ sampling) to obtain full posterior distributions; slower, but more reliable on smaller datasets
model.fit(fit_method='demz')
print(model.fit_summary())

# Predict number of purchases for customers over the next 10 time periods
expected_purchases = model.expected_purchases(
    data=rfm_df,
    future_t=10,
)

# Predict the probability of each customer making exactly `n_purchases` purchases over `future_t` time periods
# The data parameter is omitted here, so predictions are run on the original dataset
purchase_probability = model.expected_purchase_probability(
    n_purchases=2,
    future_t=10,
)

new_data = pd.DataFrame(
    data={
        "customer_id": [0, 1, 2, 3],
        "frequency": [5, 2, 1, 8],
        "recency": [7, 4, 2.5, 11],
        "T": [10, 8, 10, 22],
    }
)

# Predict probability customers will still be active in 'future_t' time periods
probability_alive = model.expected_probability_alive(
    data=new_data,
    future_t=[0, 3, 6, 9],
)

# Predict the number of purchases a new customer will make over the next `t` time periods
expected_purchases_new_customer = model.expected_purchases_new_customer(
    t=10,
)
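For intuition about what expected_purchases_new_customer estimates, the Pareto/NBD literature [1] [2] gives a closed form for the expected number of purchases by a randomly chosen new customer over a period of length t: E[X(t)] = rβ / (α(s − 1)) · [1 − (β / (β + t))^(s − 1)], with the limit (rβ/α) · ln((β + t)/β) when s = 1. The sketch below evaluates that formula at point estimates of r, α, s and β. The helper name is hypothetical, and the library presumably evaluates such quantities across posterior draws rather than at a single point.

```python
import numpy as np


def expected_purchases_new_customer_point(t, r, alpha, s, beta):
    """Hypothetical helper: E[X(t)] for a new customer at fixed parameter values."""
    t = np.asarray(t, dtype=float)
    if s == 1.0:
        # Limit of the general expression as s -> 1
        return (r * beta / alpha) * np.log((beta + t) / beta)
    return (r * beta) / (alpha * (s - 1.0)) * (1.0 - (beta / (beta + t)) ** (s - 1.0))
```

As t approaches 0, the slope of this curve is r/α, the mean purchase rate of a customer who has not yet dropped out. Growth then slows as dropout accumulates.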

Methods

ParetoNBDModel.__init__(data, *[, ...])

Initialize model configuration and sampler configuration for the model.

ParetoNBDModel.attrs_to_init_kwargs(attrs)

Convert the model configuration and sampler configuration from the attributes to keyword arguments.

ParetoNBDModel.build_from_idata(idata)

Build model from the InferenceData object.

ParetoNBDModel.build_model()

Build the model.

ParetoNBDModel.create_fit_data(X, y)

Create the fit_data group based on the input data.

ParetoNBDModel.create_idata_attrs()

Create attributes for the inference data.

ParetoNBDModel.distribution_new_customer([...])

Compute posterior predictive samples of dropout, purchase rate and frequency/recency of new customers.

ParetoNBDModel.distribution_new_customer_dropout([...])

Sample from the Gamma distribution representing dropout times for new customers.

ParetoNBDModel.distribution_new_customer_purchase_rate([...])

Sample from the Gamma distribution representing purchase rates for new customers.

ParetoNBDModel.distribution_new_customer_recency_frequency([...])

Pareto/NBD process representing purchases across the customer population.

ParetoNBDModel.expected_probability_alive([...])

Compute expected probability of being alive.

ParetoNBDModel.expected_purchase_probability([...])

Compute expected probability of n_purchases over future_t time periods.

ParetoNBDModel.expected_purchases([data, ...])

Compute expected number of future purchases.

ParetoNBDModel.expected_purchases_new_customer([...])

Compute the expected number of purchases for a new customer across t time periods.

ParetoNBDModel.fit([fit_method])

Infer posteriors of model parameters to run predictions.

ParetoNBDModel.fit_summary(**kwargs)

Compute the summary of the fit result.

ParetoNBDModel.graphviz(**kwargs)

Get the graphviz representation of the model.

ParetoNBDModel.load(fname)

Create a ModelBuilder instance from a file.

ParetoNBDModel.load_from_idata(idata)

Create a ModelBuilder instance from an InferenceData object.

ParetoNBDModel.predict([X, extend_idata])

Use a model to predict on unseen data and return point prediction of all the samples.

ParetoNBDModel.predict_posterior([X, ...])

Generate posterior predictive samples on unseen data.

ParetoNBDModel.predict_proba([X, ...])

Alias for predict_posterior, for consistency with scikit-learn probabilistic estimators.

ParetoNBDModel.sample_posterior_predictive([...])

Sample from the model's posterior predictive distribution.

ParetoNBDModel.sample_prior_predictive([X, ...])

Sample from the model's prior predictive distribution.

ParetoNBDModel.save(fname)

Save the model's inference data to a file.

ParetoNBDModel.set_idata_attrs([idata])

Set attributes on an InferenceData object.

ParetoNBDModel.thin_fit_result(keep_every)

Return a copy of the model with a thinned fit result.

Attributes

X

default_model_config

Default model configuration.

default_sampler_config

Default sampler configuration.

fit_result

Get the posterior fit_result.

id

Generate a unique hash value for the model.

output_var

Output variable of the model.

posterior

posterior_predictive

predictions

prior

prior_predictive

version

y