ParetoNBDModel

class pymc_marketing.clv.models.pareto_nbd.ParetoNBDModel(data, *, model_config=None, sampler_config=None)

Pareto Negative Binomial Model (Pareto/NBD).

Model for continuous, non-contractual customers, first introduced by Schmittlein, et al. [1], with additional derivations and predictive methods by Hardie & Fader [2] [3] [4].

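The Pareto/NBD model assumes that each customer's unobserved lifetime is exponentially distributed with a dropout rate that is Gamma-distributed across customers (which yields Pareto-distributed lifetimes), and that, while a customer is still active, purchases follow a Poisson process with a purchase rate that is also Gamma-distributed across customers (which yields negative binomial purchase counts).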
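To make these assumptions concrete, here is a minimal simulation sketch of the generative process usually associated with Pareto/NBD, written with the Python standard library only. It assumes that alpha and beta act as rate parameters of the two Gamma distributions, as in the Pareto/NBD literature; the function name simulate_customer and the parameter values are illustrative and not part of pymc_marketing.

```python
import random


def simulate_customer(r, alpha, s, beta, T, rng):
    """Simulate one customer's repeat purchases over a window of length T.

    Returns (frequency, recency, T), where recency is the time of the last
    repeat purchase (0.0 if there were none).
    """
    # random.gammavariate takes (shape, scale). alpha and beta are treated
    # as rates here (an assumption), so scale = 1 / rate.
    purchase_rate = rng.gammavariate(r, 1.0 / alpha)
    dropout_rate = rng.gammavariate(s, 1.0 / beta)

    # Unobserved lifetime: exponential with the customer's own dropout rate.
    lifetime = rng.expovariate(dropout_rate)
    active_until = min(lifetime, T)

    # While active, purchases arrive as a Poisson process, i.e. with
    # exponential waiting times between consecutive purchases.
    t, frequency, recency = 0.0, 0, 0.0
    while True:
        t += rng.expovariate(purchase_rate)
        if t > active_until:
            break
        frequency += 1
        recency = t
    return frequency, recency, T


rng = random.Random(42)
customers = [
    simulate_customer(r=0.5, alpha=10.0, s=0.6, beta=12.0, T=40.0, rng=rng)
    for _ in range(1000)
]
```

Each simulated tuple has the same shape as one row of the summarized data the model expects, which makes this kind of sketch handy for sanity-checking a fitted model against data with known parameters.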

This model requires data to be summarized by recency, frequency, and T for each customer, using clv.rfm_summary() or equivalent. Covariates impacting customer dropouts and transaction rates are optional.
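As an illustration of what such a summary contains, the following standard-library sketch derives frequency, recency, and T from raw purchase dates. It is not the implementation of clv.rfm_summary(); the helper name summarize_rfm, the choice of days as the time unit, and the sample transactions are assumptions made for this example.

```python
from datetime import date


def summarize_rfm(transactions, observation_end):
    """Summarize (customer_id, purchase_date) pairs into per-customer rows."""
    dates_by_customer = {}
    for customer_id, purchase_date in transactions:
        # A set collapses same-day purchases into a single purchase period.
        dates_by_customer.setdefault(customer_id, set()).add(purchase_date)

    rows = []
    for customer_id, dates in sorted(dates_by_customer.items()):
        first, last = min(dates), max(dates)
        rows.append(
            {
                "customer_id": customer_id,
                # Repeat purchases only: the first purchase is not counted.
                "frequency": len(dates) - 1,
                # Time between the first and the last purchase.
                "recency": (last - first).days,
                # Time between the first purchase and the end of observation.
                "T": (observation_end - first).days,
            }
        )
    return rows


transactions = [
    ("a", date(2024, 1, 1)),
    ("a", date(2024, 1, 15)),
    ("a", date(2024, 2, 1)),
    ("b", date(2024, 1, 10)),
]
rfm_rows = summarize_rfm(transactions, observation_end=date(2024, 3, 1))
```

A customer with a single purchase ends up with frequency 0 and recency 0, and every row satisfies T >= recency by construction.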

Parameters:
  • data (pd.DataFrame) –

    DataFrame containing the following columns:
    • frequency: number of repeat purchases

    • recency: time between the first and the last purchase

    • T: time between the first purchase and the end of the observation period.

      Model assumptions require T >= recency

    • customer_id: unique customer identifier

    Along with optional covariate columns.

  • model_config (dict, optional) –

    Dictionary containing model parameters and covariate column names:
    • r_prior: Shape parameter of time between purchases; defaults to Weibull(alpha=2, beta=1)

    • alpha_prior: Scale parameter of time between purchases; defaults to Weibull(alpha=2, beta=10)

    • s_prior: Shape parameter of time until dropout; defaults to Weibull(alpha=2, beta=1)

    • beta_prior: Scale parameter of time until dropout; defaults to Weibull(alpha=2, beta=10)

    • purchase_covariates_prior: Coefficients for purchase rate covariates; defaults to Normal(0, 3)

    • dropout_covariates_prior: Coefficients for dropout covariates; defaults to Normal(0, 3)

    • purchase_covariate_cols: List containing column names of covariates for customer purchase rates.

    • dropout_covariate_cols: List containing column names of covariates for customer dropouts.

    If not provided, the model will use default priors specified in the default_model_config class attribute.

  • sampler_config (dict, optional) – Dictionary of sampler parameters. Defaults to None.
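The model_config keys listed above could be combined into a configuration with covariates roughly as follows. This is a sketch under assumptions: the covariate column names are hypothetical, and the layout of the two covariate priors is assumed to mirror the {"dist", "kwargs"} convention used for the other priors on this page, with mu and sigma as the Normal parameters. Check default_model_config for the structure your installed version expects.

```python
# Hypothetical covariate columns; they would need to exist in `data`.
purchase_covariate_cols = ["email_opt_in"]
dropout_covariate_cols = ["email_opt_in", "used_discount"]

model_config = {
    # Priors for the purchase and dropout processes (documented defaults).
    "r_prior": {"dist": "Weibull", "kwargs": {"alpha": 2, "beta": 1}},
    "alpha_prior": {"dist": "Weibull", "kwargs": {"alpha": 2, "beta": 10}},
    "s_prior": {"dist": "Weibull", "kwargs": {"alpha": 2, "beta": 1}},
    "beta_prior": {"dist": "Weibull", "kwargs": {"alpha": 2, "beta": 10}},
    # Assumed layout for covariate coefficient priors, i.e. Normal(0, 3).
    "purchase_covariates_prior": {"dist": "Normal", "kwargs": {"mu": 0, "sigma": 3}},
    "dropout_covariates_prior": {"dist": "Normal", "kwargs": {"mu": 0, "sigma": 3}},
    # Columns of `data` to use as covariates for each process.
    "purchase_covariate_cols": purchase_covariate_cols,
    "dropout_covariate_cols": dropout_covariate_cols,
}
```

Such a dictionary would then be passed as the model_config argument alongside a DataFrame that contains the listed covariate columns.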

Examples

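import pandas as pd
import pymc as pm
from pymc_marketing.clv import ParetoNBDModel, rfm_summary

# `raw_data` is a DataFrame of individual transactions with customer ID and date columns
rfm_df = rfm_summary(raw_data, 'id_col_name', 'date_col_name')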

# Initialize model with customer data; `model_config` parameter is optional
model = ParetoNBDModel(
    data=rfm_df,
    model_config={
        "r_prior": {"dist": "Weibull", "kwargs": {"alpha": 2, "beta": 1}},
        "alpha_prior": {"dist": "Weibull", "kwargs": {"alpha": 2, "beta": 10}},
        "s_prior": {"dist": "Weibull", "kwargs": {"alpha": 2, "beta": 1}},
        "beta_prior": {"dist": "Weibull", "kwargs": {"alpha": 2, "beta": 10}},
    },
)

# Fit model quickly to large datasets via the default Maximum a Posteriori method
model.fit(fit_method='map')
print(model.fit_summary())

# Use 'mcmc' for more informative predictions and reliable performance on smaller datasets
model.fit(fit_method='mcmc')
print(model.fit_summary())

# Predict number of purchases for customers over the next 10 time periods
expected_purchases = model.expected_purchases(
    data=rfm_df,
    future_t=10,
)

# Predict probability of customer making 'n_purchases' purchases over 'future_t' time periods
# Data parameter is omitted here because predictions are run on the original dataset
expected_num_purchases = model.expected_purchase_probability(
    n_purchases=[0, 1, 2, 3],
    future_t=[10, 20, 30, 40],
)

new_data = pd.DataFrame(
    data={
        "customer_id": [0, 1, 2, 3],
        "frequency": [5, 2, 1, 8],
        "recency": [7, 4, 2.5, 11],
        "T": [10, 8, 10, 22],
    }
)

# Predict probability customers will still be active in 'future_t' time periods
probability_alive = model.expected_probability_alive(
    data=new_data,
    future_t=[0, 3, 6, 9],
)

# Predict number of purchases for a new customer over 't' time periods.
expected_purchases_new_customer = model.expected_purchases_new_customer(
    t=[2, 5, 7, 10],
)

References

Methods

ParetoNBDModel.__init__(data, *[, ...])

Initializes the model configuration and sampler configuration for the model.

ParetoNBDModel.build_model()

Creates an instance of pm.Model based on provided data and model_config, and attaches it to self.

ParetoNBDModel.distribution_new_customer([...])

Utility function for posterior predictive sampling of dropout, purchase rate and frequency/recency of new customers.

ParetoNBDModel.distribution_new_customer_dropout([...])

Sample from the Gamma distribution representing dropout times for new customers.

ParetoNBDModel.distribution_new_customer_purchase_rate([...])

Sample from the Gamma distribution representing purchase rates for new customers.

ParetoNBDModel.distribution_new_customer_recency_frequency([...])

Pareto/NBD process representing purchases across the customer population.

ParetoNBDModel.expected_probability_alive([...])

Compute the probability that a customer with history frequency, recency, and T is currently active.

ParetoNBDModel.expected_purchase_probability([...])

Estimate probability of n_purchases over future_t time periods, given an individual customer's current frequency, recency, and T.

ParetoNBDModel.expected_purchases([data, ...])

Given recency, frequency, and T for an individual customer, this method predicts the expected number of future purchases across future_t time periods.

ParetoNBDModel.expected_purchases_new_customer([...])

Expected number of purchases for a new customer across t time periods.

ParetoNBDModel.fit([fit_method])

Infer posteriors of model parameters to run predictions.

ParetoNBDModel.fit_summary(**kwargs)

ParetoNBDModel.get_params([deep])

Get all the model parameters needed to instantiate a copy of the model, not including training data.

ParetoNBDModel.load(fname)

Creates a ModelBuilder instance from a file and loads inference data for the model.

ParetoNBDModel.predict(X_pred[, extend_idata])

Uses the model to predict on unseen data and returns point predictions for all samples.

ParetoNBDModel.predict_posterior(X_pred[, ...])

Generate posterior predictive samples on unseen data.

ParetoNBDModel.predict_proba(X_pred[, ...])

Alias for predict_posterior, for consistency with scikit-learn probabilistic estimators.

ParetoNBDModel.sample_posterior_predictive(X_pred)

Sample from the model's posterior predictive distribution.

ParetoNBDModel.sample_prior_predictive(X_pred)

Sample from the model's prior predictive distribution.

ParetoNBDModel.save(fname)

Save the model's inference data to a file.

ParetoNBDModel.set_idata_attrs([idata])

Set attributes on an InferenceData object.

ParetoNBDModel.set_params(**params)

Set all the model parameters needed to instantiate the model, not including training data.

ParetoNBDModel.thin_fit_result(keep_every)

Return a copy of the model with a thinned fit result.
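For orientation, the quantity summarized under expected_purchases_new_customer has a well-known closed form in the Pareto/NBD literature: E[X(t)] = r·beta / (alpha·(s − 1)) · [1 − (beta / (beta + t))^(s − 1)]. The sketch below evaluates that expression for point values of the parameters. It is a standalone illustration, not the library's code; the function name is hypothetical, and whether the library evaluates exactly this expression internally is an assumption.

```python
import math


def expected_purchases_new_customer_closed_form(r, alpha, s, beta, t):
    """Expected number of purchases by time t for a randomly chosen new customer.

    r, alpha: shape and rate of the Gamma-distributed purchase rate.
    s, beta:  shape and rate of the Gamma-distributed dropout rate.
    """
    if s == 1:
        # Limit of the general expression as s approaches 1.
        return (r / alpha) * beta * math.log((beta + t) / beta)
    return (r * beta) / (alpha * (s - 1)) * (1 - (beta / (beta + t)) ** (s - 1))


# Illustrative parameter values, not fitted estimates.
curve = [
    expected_purchases_new_customer_closed_form(r=0.5, alpha=10.0, s=0.6, beta=12.0, t=t)
    for t in (2, 5, 7, 10)
]
```

The curve starts at zero and grows with t at a decreasing rate, reflecting that a growing share of customers has dropped out by later time periods.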

Attributes

X

default_model_config

Returns the class default config dict for the model builder when no model_config is provided on class initialization. Useful for understanding the structure of the required model_config so that users can customize it.

default_sampler_config

Returns the class default sampler dict for the model builder when no sampler_config is provided on class initialization. Useful for understanding the structure of the required sampler_config so that users can customize it.

fit_result

id

Generate a unique hash value for the model.

output_var

Returns the name of the output variable of the model.

version

y