BetaGeoModel

class pymc_marketing.clv.models.beta_geo.BetaGeoModel(data, model_config=None, sampler_config=None)

Beta-Geometric Negative Binomial Distribution (BG/NBD) model

In the BG/NBD model, customer purchase frequency is modelled by treating each purchase time as having an instantaneous probability of occurrence (hazard) and, at every purchase, a probability of “dropout”, i.e. of no longer being a customer.

Customer-specific data needed for statistical inference include 1) the total number of purchases (\(x\)) and 2) the time of the last, i.e. xth, purchase (\(t_x\)). The earlier purchase times \(t_1, ..., t_{x-1}\) are not needed because of a telescoping sum in the exponent of the joint likelihood; see Section 4.1 of [1] for more details.
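The telescoping argument can be sketched as follows. The symbols \(\lambda\) (an individual's purchase rate) and \(t_0 = 0\) (the start of the observation clock) are notation assumed here for illustration. With exponentially distributed inter-purchase times, the joint density of the purchase times while the customer is active is a product of exponential densities, and the sum in the exponent collapses:

```latex
\[
\prod_{j=1}^{x} \lambda\, e^{-\lambda (t_j - t_{j-1})}
  = \lambda^{x} \exp\!\Big(-\lambda \sum_{j=1}^{x} (t_j - t_{j-1})\Big)
  = \lambda^{x} e^{-\lambda t_x},
  \qquad t_0 = 0 .
\]
```

Only \(x\) and \(t_x\) survive on the right-hand side, which is why the intermediate purchase times carry no extra information for inference.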

Methods below are adapted from the BetaGeoFitter class from the lifetimes package (see CamDavidsonPilon/lifetimes).

Parameters:
  • data (pd.DataFrame) –

    DataFrame containing the following columns:
    • frequency: number of repeat purchases (with possible values 0, 1, 2, …)

    • recency: time between the first and the last purchase (with possible values 0, 1, 2, …)

    • T: time between the first purchase and the end of the observation period (with possible values 0, 1, 2, …)

    • customer_id: unique customer identifier

  • model_config (dict, optional) – Dictionary of model prior parameters. If not provided, the model will use default priors specified in the default_model_config class attribute.

  • sampler_config (dict, optional) – Dictionary of sampler parameters. Defaults to None.
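As a rough illustration of how the frequency, recency, and T columns relate to raw purchase records, the sketch below derives them from a small transaction log in plain Python. The log, the end-of-observation date, the use of days as the time unit, and the choice to count several purchases on one day as a single purchase are all assumptions made for this example, not requirements stated on this page.

```python
from datetime import date

# Hypothetical transaction log: (customer_id, purchase_date) pairs.
transactions = [
    (1, date(2023, 1, 1)), (1, date(2023, 1, 11)), (1, date(2023, 1, 31)),
    (2, date(2023, 1, 5)),
    (3, date(2023, 1, 10)), (3, date(2023, 1, 20)),
]
observation_end = date(2023, 2, 10)  # assumed end of the observation period

# Collect the distinct purchase dates of each customer.
purchase_dates = {}
for customer_id, day in transactions:
    purchase_dates.setdefault(customer_id, set()).add(day)

rows = []
for customer_id, days in sorted(purchase_dates.items()):
    first, last = min(days), max(days)
    rows.append({
        "customer_id": customer_id,
        "frequency": len(days) - 1,           # repeat purchases only
        "recency": (last - first).days,       # first to last purchase
        "T": (observation_end - first).days,  # first purchase to end of period
    })

for row in rows:
    print(row)
```

Passing `rows` to `pd.DataFrame` would then give a frame with the four columns listed above. Note that a customer with no repeat purchases ends up with both frequency and recency equal to 0.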

Examples

BG/NBD model for customer

import pandas as pd

import pymc as pm
from pymc_marketing.clv import BetaGeoModel

data = pd.DataFrame({
    "frequency": [4, 0, 6, 3],
    "recency": [30.73, 1.72, 0., 0.],
    "T": [38.86, 38.86, 38.86, 38.86],
})
data["customer_id"] = data.index

prior_distribution = {"dist": "Gamma", "kwargs": {"alpha": 0.1, "beta": 0.1}}
model = BetaGeoModel(
    data=data,
    model_config={
        "r_prior": prior_distribution,
        "alpha_prior": prior_distribution,
        "a_prior": prior_distribution,
        "b_prior": prior_distribution,
    },
    sampler_config={
        "draws": 1000,
        "tune": 1000,
        "chains": 2,
        "cores": 2,
    },
)
model.build_model()
model.fit()
print(model.fit_summary())

# Estimating the expected number of purchases for a randomly chosen
# individual in a future time period of length t
expected_num_purchases_new_customer = model.expected_num_purchases_new_customer(
    t=[2, 5, 7, 10],
)

# Predicting the customer-specific number of purchases for a future
# time interval of length t given their previous frequency and recency
# (check the installed version's signature; a customer_id argument
# may also be required)
expected_num_purchases = model.expected_num_purchases(
    t=[5, 5, 5, 5],
    frequency=[5, 2, 1, 8],
    recency=[7, 4, 2.5, 11],
    T=[10, 8, 10, 22],
)
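For intuition about the quantity behind expected_probability_alive, the sketch below evaluates the closed-form BG/NBD probability of being alive at a single, fixed set of parameter values. The function name and the parameter values are hypothetical and chosen only for illustration. The method on this page reports a posterior expectation, so its output will generally differ from this single-point evaluation.

```python
def probability_alive(x, t_x, T, r, alpha, a, b):
    """BG/NBD probability of being alive at time T for fixed parameters."""
    if x == 0:
        # In the BG/NBD model, dropout can only happen right after a
        # purchase, so a customer with no repeat purchases is still alive.
        return 1.0
    ratio = ((alpha + T) / (alpha + t_x)) ** (r + x)
    return 1.0 / (1.0 + (a / (b + x - 1)) * ratio)


# Hypothetical parameter values, not fitted to any data.
params = dict(r=0.25, alpha=4.0, a=0.8, b=2.5)

# Same number of repeat purchases, but a recent vs. a long-ago last purchase.
recent = probability_alive(x=4, t_x=30.0, T=38.0, **params)
lapsed = probability_alive(x=4, t_x=5.0, T=38.0, **params)
print(f"recent buyer: {recent:.3f}, lapsed buyer: {lapsed:.3f}")
```

With frequency held fixed, the customer whose last purchase is further in the past gets the lower probability, which matches the usual reading of recency in this model.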

References

Methods

BetaGeoModel.__init__(data[, model_config, ...])

Initializes the model configuration and sampler configuration for the model.

BetaGeoModel.build_model()

Creates an instance of pm.Model based on provided data and model_config, and attaches it to self.

BetaGeoModel.distribution_new_customer_dropout([...])

Sample the Beta distribution for the population-level dropout rate.

BetaGeoModel.distribution_new_customer_purchase_rate([...])

Sample the Gamma distribution for the population-level purchase rate.

BetaGeoModel.expected_num_purchases(...)

Given a purchase history/profile of \(x\) and \(t_x\) for an individual customer, this method returns the expected number of future purchases in the next time interval of length \(t\), i.e. \((T, T + t]\).

BetaGeoModel.expected_num_purchases_new_customer(t)

Posterior expected number of purchases for any interval of length \(t\).

BetaGeoModel.expected_probability_alive(...)

Posterior expected value of the probability of being alive at time T.

BetaGeoModel.fit([fit_method])

Infer the model posterior.

BetaGeoModel.fit_summary(**kwargs)

BetaGeoModel.get_params([deep])

Get all the model parameters needed to instantiate a copy of the model, not including training data.

BetaGeoModel.load(fname)

Creates a ModelBuilder instance from a file and loads inference data for the model.

BetaGeoModel.predict(X_pred[, extend_idata])

Uses the model to predict on unseen data and returns point predictions for all the samples.

BetaGeoModel.predict_posterior(X_pred[, ...])

Generate posterior predictive samples on unseen data.

BetaGeoModel.predict_proba(X_pred[, ...])

Alias for predict_posterior, for consistency with scikit-learn probabilistic estimators.

BetaGeoModel.sample_posterior_predictive(X_pred)

Sample from the model's posterior predictive distribution.

BetaGeoModel.sample_prior_predictive(X_pred)

Sample from the model's prior predictive distribution.

BetaGeoModel.save(fname)

Save the model's inference data to a file.

BetaGeoModel.set_idata_attrs([idata])

Set attributes on an InferenceData object.

BetaGeoModel.set_params(**params)

Set all the model parameters needed to instantiate the model, not including training data.

BetaGeoModel.thin_fit_result(keep_every)

Return a copy of the model with a thinned fit result.

Attributes

X

default_model_config

Returns the class default config dict for the model builder, used if no model_config is provided on class initialization. Useful for understanding the structure of the required model_config so that users can customize it.

default_sampler_config

Returns the class default sampler dict for the model builder, used if no sampler_config is provided on class initialization. Useful for understanding the structure of the required sampler_config so that users can customize it.

fit_result

id

Generate a unique hash value for the model.

output_var

Returns the name of the output variable of the model.

version

y