BetaGeoModel

class pymc_marketing.clv.models.beta_geo.BetaGeoModel(data, model_config=None, sampler_config=None)

Beta-Geometric/Negative Binomial Distribution (BG/NBD) model

In the BG/NBD model, customer purchases are modelled as a timing process: while a customer is active, each purchase has an instantaneous probability of occurrence (hazard), and after every purchase the customer has some probability of “dropout”, i.e. of no longer being a customer.
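The purchase-and-dropout process described above can be illustrated with a small simulation. This is a minimal sketch, not the library's implementation: it assumes the usual BG/NBD setup in which each customer's purchase rate is Gamma(r, alpha) distributed and each customer's dropout probability is Beta(a, b) distributed, and that a customer can only drop out after a repeat purchase. The helper name `simulate_customer` and the parameter values are hypothetical.

```python
import random


def simulate_customer(r, alpha, a, b, T, rng):
    """Simulate one customer over (0, T]; return (frequency, recency)."""
    # random.gammavariate takes (shape, scale), so a rate of alpha
    # corresponds to a scale of 1 / alpha.
    purchase_rate = rng.gammavariate(r, 1.0 / alpha)
    dropout_prob = rng.betavariate(a, b)

    t = 0.0          # time since the first purchase
    frequency = 0    # number of repeat purchases
    recency = 0.0    # time of the last repeat purchase
    while True:
        # Exponential waiting time until the next purchase
        t += rng.expovariate(purchase_rate)
        if t > T:
            break  # observation period ended before the next purchase
        frequency += 1
        recency = t
        # After each repeat purchase the customer may drop out
        if rng.random() < dropout_prob:
            break
    return frequency, recency


rng = random.Random(0)
# Illustrative parameter values only
customers = [simulate_customer(2.0, 4.0, 1.0, 2.5, 38.86, rng) for _ in range(200)]
```

Each simulated pair has the same meaning as the `frequency` and `recency` columns expected by the model, with `T` fixed at the length of the observation period.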

Customer-specific data needed for statistical inference include 1) the total number of purchases (\(x\)) and 2) the time of the last, i.e. \(x\)th, purchase (\(t_x\)). The earlier purchase times \(t_1, ..., t_{x-1}\) can be omitted because of a telescoping sum in the exponent of the joint likelihood; see Section 4.1 of [1] for more details.
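The telescoping step can be written out explicitly. As a sketch, assume (as in the standard BG/NBD derivation) that interpurchase times are exponential with rate \(\lambda\) while the customer is active, that the customer drops out with probability \(p\) after each repeat purchase, and that \(t_0 = 0\). The symbols \(\lambda\), \(p\) and \(\delta_{x>0}\) (an indicator that equals 1 when \(x > 0\)) are notation introduced here for illustration, not names used by this class.

```latex
\[
\prod_{j=1}^{x} \lambda e^{-\lambda (t_j - t_{j-1})}
  = \lambda^{x} e^{-\lambda \sum_{j=1}^{x} (t_j - t_{j-1})}
  = \lambda^{x} e^{-\lambda t_x}
\]

\[
L(\lambda, p \mid x, t_x, T)
  = (1 - p)^{x} \lambda^{x} e^{-\lambda T}
  + \delta_{x>0} \, p (1 - p)^{x-1} \lambda^{x} e^{-\lambda t_x}
\]
```

The intermediate purchase times cancel in the first identity, so the individual-level likelihood in the second expression depends on the purchase history only through \(x\), \(t_x\) and \(T\). Its first term covers a customer who is still active at \(T\), and its second covers a customer who dropped out immediately after the \(x\)th purchase.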

Methods below are adapted from the BetaGeoFitter class from the lifetimes package (see CamDavidsonPilon/lifetimes).

Parameters:

data (pd.DataFrame) –
DataFrame containing the following columns:
- frequency: number of repeat purchases (with possible values 0, 1, 2, …)
- recency: time between the first and the last purchase (with possible values 0, 1, 2, …)
- T: time between the first purchase and the end of the observation
  period (with possible values 0, 1, 2, …)
- customer_id: unique customer identifier
model_config (dict, optional) – Dictionary of model prior parameters. If not provided, the model will use default priors specified in the default_model_config class attribute.
sampler_config (dict, optional) – Dictionary of sampler parameters. Defaults to None.
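The `data` columns described above are usually derived from a raw transaction log. A minimal sketch of that aggregation using only the standard library follows. The helper name `summarize_transactions`, the choice of days as the time unit, and the rule that several purchases on the same day count once are all assumptions made for illustration, not behavior documented for this class.

```python
from collections import defaultdict
from datetime import date


def summarize_transactions(transactions, observation_end):
    """Collapse (customer_id, purchase_date) pairs into one row per customer."""
    dates_by_customer = defaultdict(set)
    for customer_id, purchase_date in transactions:
        # Ignore purchases after the end of the observation period
        if purchase_date <= observation_end:
            dates_by_customer[customer_id].add(purchase_date)

    rows = []
    for customer_id, dates in sorted(dates_by_customer.items()):
        first, last = min(dates), max(dates)
        rows.append({
            "customer_id": customer_id,
            # Repeat purchases: distinct purchase days minus the first one
            "frequency": len(dates) - 1,
            # Time between the first and the last purchase
            "recency": (last - first).days,
            # Time between the first purchase and the end of observation
            "T": (observation_end - first).days,
        })
    return rows


transactions = [
    ("a", date(2023, 1, 1)),
    ("a", date(2023, 1, 10)),
    ("a", date(2023, 1, 10)),  # same-day duplicate, counted once
    ("b", date(2023, 1, 5)),
]
rows = summarize_transactions(transactions, date(2023, 1, 31))
```

The resulting list of dicts can be passed to `pd.DataFrame(rows)` to obtain a frame with the four expected columns.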

Examples

BG/NBD model for customer

import pandas as pd

import pymc as pm
from pymc_marketing.clv import BetaGeoModel

data = pd.DataFrame({
    "frequency": [4, 0, 6, 3],
    "recency": [30.73, 1.72, 0., 0.],
    "T": [38.86, 38.86, 38.86, 38.86],
})
data["customer_id"] = data.index

prior_distribution = {"dist": "Gamma", "kwargs": {"alpha": 0.1, "beta": 0.1}}
model = BetaGeoModel(
    data=data,
    model_config={
        "r_prior": prior_distribution,
        "alpha_prior": prior_distribution,
        "a_prior": prior_distribution,
        "b_prior": prior_distribution,
    },
    sampler_config={
        "draws": 1000,
        "tune": 1000,
        "chains": 2,
        "cores": 2,
    },
)
model.build_model()
model.fit()
print(model.fit_summary())

# Estimating the expected number of purchases for a randomly chosen
# individual in a future time period of length t
expected_num_purchases_new_customer = model.expected_num_purchases_new_customer(
    t=[2, 5, 7, 10],
)

# Predicting the customer-specific number of purchases for a future
# time interval of length t given their previous frequency and recency
# (some releases may also require a customer_id argument; check the
# signature in your installed version)
expected_num_purchases = model.expected_num_purchases(
    t=[5, 5, 5, 5],
    frequency=[5, 2, 1, 8],
    recency=[7, 4, 2.5, 11],
    T=[10, 8, 10, 22],
)

References

Methods

`BetaGeoModel.__init__`(data[, model_config, ...])	Initializes model configuration and sampler configuration for the model
`BetaGeoModel.build_model`()	Creates an instance of pm.Model based on provided data and model_config, and attaches it to self.
`BetaGeoModel.distribution_new_customer_dropout`([...])	Sample the Beta distribution for the population-level dropout rate.
`BetaGeoModel.distribution_new_customer_purchase_rate`([...])	Sample the Gamma distribution for the population-level purchase rate.
`BetaGeoModel.expected_num_purchases`(...)	Given a purchase history/profile of \(x\) and \(t_x\) for an individual customer, this method returns the expected number of future purchases in the next time interval of length \(t\), i.e. \((T, T + t]\).
`BetaGeoModel.expected_num_purchases_new_customer`(t)	Posterior expected number of purchases for any interval of length \(t\).
`BetaGeoModel.expected_probability_alive`(...)	Posterior expected value of the probability of being alive at time T.
`BetaGeoModel.fit`([fit_method])	Infer model posterior
`BetaGeoModel.fit_summary`(**kwargs)
`BetaGeoModel.get_params`([deep])	Get all the model parameters needed to instantiate a copy of the model, not including training data.
`BetaGeoModel.load`(fname)	Creates a ModelBuilder instance from a file and loads inference data for the model.
`BetaGeoModel.predict`(X_pred[, extend_idata])	Uses model to predict on unseen data and return point prediction of all the samples.
`BetaGeoModel.predict_posterior`(X_pred[, ...])	Generate posterior predictive samples on unseen data.
`BetaGeoModel.predict_proba`(X_pred[, ...])	Alias for `predict_posterior`, for consistency with scikit-learn probabilistic estimators.
`BetaGeoModel.sample_posterior_predictive`(X_pred)	Sample from the model's posterior predictive distribution.
`BetaGeoModel.sample_prior_predictive`(X_pred)	Sample from the model's prior predictive distribution.
`BetaGeoModel.save`(fname)	Save the model's inference data to a file.
`BetaGeoModel.set_idata_attrs`([idata])	Set attributes on an InferenceData object.
`BetaGeoModel.set_params`(**params)	Set all the model parameters needed to instantiate the model, not including training data.
`BetaGeoModel.thin_fit_result`(keep_every)	Return a copy of the model with a thinned fit result.
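To give a sense of what `expected_probability_alive` summarizes, the sketch below evaluates the closed-form BG/NBD probability of being alive for a single, fixed set of parameter values. It is an illustration under stated assumptions, not the library's code: the function name `probability_alive` is hypothetical, the formula is the one commonly given in the BG/NBD literature, and the method listed above reports a posterior expectation rather than a value at fixed parameters.

```python
def probability_alive(frequency, recency, T, r, alpha, a, b):
    """P(alive at T) for fixed BG/NBD parameters (r, alpha, a, b)."""
    # Under BG/NBD a customer can only drop out after a repeat purchase,
    # so a customer with no repeat purchases is alive with probability 1.
    if frequency == 0:
        return 1.0
    odds_dropped = (a / (b + frequency - 1)) * (
        (alpha + T) / (alpha + recency)
    ) ** (r + frequency)
    return 1.0 / (1.0 + odds_dropped)


# Illustrative parameter values only
recent_buyer = probability_alive(2, 10.0, 10.0, r=0.25, alpha=4.0, a=1.0, b=1.0)
lapsed_buyer = probability_alive(2, 5.0, 20.0, r=0.25, alpha=4.0, a=1.0, b=1.0)
```

With the same purchase count, the customer whose last purchase lies further from the end of the observation period gets a lower probability of being alive.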

Attributes

`X`
`default_model_config`	Returns the class default config dict for the model builder, used if no model_config is provided on class initialization. Useful for understanding the structure of the required model_config so that users can customize it.
`default_sampler_config`	Returns the class default sampler dict for the model builder, used if no sampler_config is provided on class initialization. Useful for understanding the structure of the required sampler_config so that users can customize it.
`fit_result`
`id`	Generate a unique hash value for the model.
`output_var`	Returns the name of the output variable of the model.
`version`
`y`