MBG/NBD Model#

In this notebook we show how to fit an MBG/NBD model in PyMC-Marketing and compare the results with the lifetimes package (which is no longer maintained). The model is presented in the paper: Batislam, E. P., Denizel, M., Filiztekin, A. (2007), “Empirical validation and comparison of models for customer base analysis”.

Prepare Notebook#

import arviz as az
import matplotlib.pyplot as plt
import pandas as pd
import xarray as xr
from fastprogress.fastprogress import progress_bar
from lifetimes import ModifiedBetaGeoFitter

from pymc_marketing import clv

# Plotting configuration
az.style.use("arviz-darkgrid")
plt.rcParams["figure.figsize"] = [12, 7]
plt.rcParams["figure.dpi"] = 100
plt.rcParams["figure.facecolor"] = "white"

%load_ext autoreload
%autoreload 2
%config InlineBackend.figure_format = "retina"

Read Data#

We use the CDNOW dataset (see lifetimes quick-start).

data_path = "https://raw.githubusercontent.com/pymc-labs/pymc-marketing/main/data/clv_quickstart.csv"

df = pd.read_csv(data_path)

df.head()
frequency recency T monetary_value
0 2 30.43 38.86 22.35
1 1 1.71 38.86 11.77
2 0 0.00 38.86 0.00
3 0 0.00 38.86 0.00
4 0 0.00 38.86 0.00

Recall from the lifetimes documentation the following definitions:

  • frequency represents the number of repeat purchases the customer has made, i.e. one less than the total number of purchases. Strictly speaking, it is the count of time periods in which the customer made a purchase: if the units are days, it is the number of distinct days on which the customer purchased.

  • T represents the age of the customer in whatever time units were chosen (weeks, in the above dataset). This is equal to the duration between a customer’s first purchase and the end of the period under study.

  • recency represents the age of the customer when they made their most recent purchase. This is equal to the duration between a customer’s first purchase and their latest purchase. (Thus, if they have made only one purchase, the recency is 0.)
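
The dataset above is already in this summary format. If you instead start from a raw transaction log (one row per purchase), these quantities can be derived with the summary_data_from_transaction_data helper from lifetimes; PyMC-Marketing provides a similar clv.rfm_summary utility. The following is a minimal sketch, where the file name and column names (transactions.csv, customer_id, date) are hypothetical:

from lifetimes.utils import summary_data_from_transaction_data

# Hypothetical raw transaction log: one row per purchase with a customer id and a date.
transactions = pd.read_csv("transactions.csv", parse_dates=["date"])

rfm = summary_data_from_transaction_data(
    transactions,
    customer_id_col="customer_id",
    datetime_col="date",
    freq="W",  # weekly time units, matching the CDNOW summary above
)
rfm.head()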

Tip

We rename the index column to customer_id, as this is required by the model. We also drop the monetary_value column, which this model does not use.

data = (
    df.reset_index()
    .rename(columns={"index": "customer_id"})
    .drop(columns="monetary_value")
)

Model Specification#

The MBG/NBD model is a probabilistic model that describes the buying behavior of a customer in a non-contractual setting. As in the BG/NBD model, each customer is assumed to purchase according to a Poisson process with rate \(\lambda\) while active and to become inactive with probability \(p\) after any purchase, with \(\lambda\) and \(p\) varying independently across customers according to Gamma and Beta distributions, respectively. The two models differ in when a customer can drop out.

Dropout after first purchase#

In contrast with the BG/NBD model, in the MBG/NBD model a customer may also drop out at time zero, i.e. immediately after their first purchase, with probability p. This leads to the following individual-level likelihood function:

\[ L(\lambda, p \mid X=x, t_x, T) = (1 - p)^{x+1} \lambda^x \exp(-\lambda T) + p(1-p)^x \lambda^x \exp(-\lambda t_x)\]

Compare the previous expression with the regular BG/NBD likelihood:

\[ L(\lambda, p \mid X=x, t_x, T) = (1 - p)^{x} \lambda^x \exp(-\lambda T) + \delta_{x>0} \, p(1-p)^{x-1} \lambda^x \exp(-\lambda t_x)\]
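
To make the likelihood concrete, here is a minimal sketch (for illustration only, not part of the PyMC-Marketing API) that evaluates the MBG/NBD individual-level expression for given values of \(\lambda\) and \(p\):

import numpy as np

def mbgnbd_likelihood(lam: float, p: float, x: int, t_x: float, T: float) -> float:
    # Probability of the observed history if the customer is still active at T:
    # they survived x + 1 dropout opportunities (one at time zero, one after each repeat purchase).
    active = (1 - p) ** (x + 1) * lam**x * np.exp(-lam * T)
    # Probability of the observed history if the customer dropped out right after the purchase at t_x.
    dropped = p * (1 - p) ** x * lam**x * np.exp(-lam * t_x)
    return active + dropped

# e.g. a customer with 2 repeat purchases, the last one at week 30.43, observed for 38.86 weeks
mbgnbd_likelihood(lam=0.1, p=0.3, x=2, t_x=30.43, T=38.86)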

Model Fitting#

Estimating these parameters is straightforward in PyMC-Marketing. We instantiate the model in the same way as the other CLV models:

model = clv.ModifiedBetaGeoModel(data=data)

And build the model to see the model configuration:

model.build_model()
model
MBG/NBD
            alpha ~ Weibull(2, 10)
                r ~ Weibull(2, 1)
      phi_dropout ~ Uniform(0, 1)
    kappa_dropout ~ Pareto(1, 1)
                a ~ Deterministic(f(kappa_dropout, phi_dropout))
                b ~ Deterministic(f(kappa_dropout, phi_dropout))
recency_frequency ~ ModifiedBetaGeoNBD(a, b, r, alpha, <constant>)

Notice the additional phi_dropout and kappa_dropout priors. These were added to the default configuration to improve sampling performance; they can be omitted by specifying a custom model_config that places priors directly on a and b, as in the sketch below.
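
A custom configuration might look like the following sketch. The config keys (a_prior, b_prior) and the chosen priors are illustrative assumptions; check model.default_model_config for the exact names expected by your version of PyMC-Marketing.

from pymc_marketing.prior import Prior

# Hypothetical configuration placing priors directly on a and b instead of
# the phi_dropout / kappa_dropout parametrization.
custom_model_config = {
    "a_prior": Prior("HalfNormal", sigma=10),
    "b_prior": Prior("HalfNormal", sigma=10),
}
model_custom = clv.ModifiedBetaGeoModel(data=data, model_config=custom_model_config)
model_custom.build_model()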

The specified model structure can also be visualized:

model.graphviz()
[Figure: model graph]

We can now fit the model. The default sampler in PyMC-Marketing is the No-U-Turn Sampler (NUTS). We use \(4\) chains and \(2000\) draws per chain, with a target acceptance probability of \(0.9\).

Note

It is not necessary to build the model before fitting it. We can fit the model directly.

sample_kwargs = {
    "draws": 2_000,
    "chains": 4,
    "target_accept": 0.9,
    "random_seed": 42,
}

idata_mcmc = model.fit(**sample_kwargs)
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [alpha, r, phi_dropout, kappa_dropout]

Sampling 4 chains for 1_000 tune and 2_000 draw iterations (4_000 + 8_000 draws total) took 10 seconds.
idata_mcmc
arviz.InferenceData
    • <xarray.Dataset> Size: 400kB
      Dimensions:        (chain: 4, draw: 2000)
      Coordinates:
        * chain          (chain) int64 32B 0 1 2 3
        * draw           (draw) int64 16kB 0 1 2 3 4 5 ... 1995 1996 1997 1998 1999
      Data variables:
          alpha          (chain, draw) float64 64kB 5.809 7.136 7.447 ... 6.752 6.388
          r              (chain, draw) float64 64kB 0.4597 0.6567 ... 0.6914 0.6354
          phi_dropout    (chain, draw) float64 64kB 0.3297 0.4224 ... 0.4218 0.4323
          kappa_dropout  (chain, draw) float64 64kB 2.99 2.086 1.879 ... 1.585 1.98
          a              (chain, draw) float64 64kB 0.9859 0.881 ... 0.6685 0.8557
          b              (chain, draw) float64 64kB 2.005 1.205 1.121 ... 0.9165 1.124
      Attributes:
          created_at:                 2025-02-05T16:31:32.193921+00:00
          arviz_version:              0.20.0
          inference_library:          pymc
          inference_library_version:  5.19.1
          sampling_time:              10.055245637893677
          tuning_steps:               1000

    • <xarray.Dataset> Size: 992kB
      Dimensions:                (chain: 4, draw: 2000)
      Coordinates:
        * chain                  (chain) int64 32B 0 1 2 3
        * draw                   (draw) int64 16kB 0 1 2 3 4 ... 1996 1997 1998 1999
      Data variables: (12/17)
          step_size              (chain, draw) float64 64kB 0.211 0.211 ... 0.2099
          tree_depth             (chain, draw) int64 64kB 4 5 3 4 5 3 ... 3 3 4 4 3 3
          smallest_eigval        (chain, draw) float64 64kB nan nan nan ... nan nan
          perf_counter_start     (chain, draw) float64 64kB 7.462e+05 ... 7.462e+05
          lp                     (chain, draw) float64 64kB -9.588e+03 ... -9.589e+03
          energy                 (chain, draw) float64 64kB 9.589e+03 ... 9.594e+03
          ...                     ...
          acceptance_rate        (chain, draw) float64 64kB 0.8855 0.837 ... 0.793
          perf_counter_diff      (chain, draw) float64 64kB 0.004076 ... 0.001634
          step_size_bar          (chain, draw) float64 64kB 0.2289 0.2289 ... 0.2192
          n_steps                (chain, draw) float64 64kB 15.0 31.0 7.0 ... 7.0 7.0
          index_in_trajectory    (chain, draw) int64 64kB 6 -11 -2 -5 13 ... 7 9 -6 -4
          reached_max_treedepth  (chain, draw) bool 8kB False False ... False False
      Attributes:
          created_at:                 2025-02-05T16:31:32.205562+00:00
          arviz_version:              0.20.0
          inference_library:          pymc
          inference_library_version:  5.19.1
          sampling_time:              10.055245637893677
          tuning_steps:               1000

    • <xarray.Dataset> Size: 57kB
      Dimensions:            (customer_id: 2357, obs_var: 2)
      Coordinates:
        * customer_id        (customer_id) int64 19kB 0 1 2 3 ... 2353 2354 2355 2356
        * obs_var            (obs_var) <U9 72B 'recency' 'frequency'
      Data variables:
          recency_frequency  (customer_id, obs_var) float64 38kB 30.43 2.0 ... 0.0 0.0
      Attributes:
          created_at:                 2025-02-05T16:31:32.210075+00:00
          arviz_version:              0.20.0
          inference_library:          pymc
          inference_library_version:  5.19.1

    • <xarray.Dataset> Size: 94kB
      Dimensions:      (index: 2357)
      Coordinates:
        * index        (index) int64 19kB 0 1 2 3 4 5 ... 2352 2353 2354 2355 2356
      Data variables:
          customer_id  (index) int64 19kB 0 1 2 3 4 5 ... 2352 2353 2354 2355 2356
          frequency    (index) int64 19kB 2 1 0 0 0 7 1 0 2 0 ... 7 1 2 0 0 0 5 0 4 0
          recency      (index) float64 19kB 30.43 1.71 0.0 0.0 ... 24.29 0.0 26.57 0.0
          T            (index) float64 19kB 38.86 38.86 38.86 38.86 ... 27.0 27.0 27.0

We can look into the summary table:

model.fit_summary()
mean sd hdi_3% hdi_97% mcse_mean mcse_sd ess_bulk ess_tail r_hat
alpha 6.544 0.733 5.223 7.894 0.016 0.011 2151.0 2838.0 1.0
r 0.571 0.088 0.417 0.732 0.002 0.001 1846.0 2481.0 1.0
phi_dropout 0.372 0.040 0.294 0.446 0.001 0.001 1945.0 2667.0 1.0
kappa_dropout 2.381 0.614 1.391 3.498 0.013 0.009 2173.0 2930.0 1.0
a 0.865 0.144 0.619 1.132 0.003 0.002 3159.0 3932.0 1.0
b 1.515 0.484 0.792 2.443 0.011 0.008 2024.0 2748.0 1.0

We see that the r_hat values are close to \(1\), which indicates convergence.

We can also plot posterior distributions of the parameters and the rank plots:

axes = az.plot_trace(
    data=model.idata,
    compact=True,
    kind="rank_bars",
    backend_kwargs={"figsize": (12, 9), "layout": "constrained"},
)
plt.gcf().suptitle("MBG/NBD Model Trace", fontsize=18, fontweight="bold");
[Figure: MBG/NBD Model Trace (posterior densities and rank plots)]

Using MAP fit#

CLV models in PyMC-Marketing, such as the ModifiedBetaGeoModel used here, can also provide maximum a posteriori (MAP) estimates, using a numerical optimizer (L-BFGS-B from scipy.optimize) under the hood.

model_map = clv.ModifiedBetaGeoModel(data=data)
idata_map = model_map.fit(method="map")

idata_map
arviz.InferenceData
    • <xarray.Dataset> Size: 64B
      Dimensions:        (chain: 1, draw: 1)
      Coordinates:
        * chain          (chain) int64 8B 0
        * draw           (draw) int64 8B 0
      Data variables:
          alpha          (chain, draw) float64 8B 6.454
          r              (chain, draw) float64 8B 0.5654
          phi_dropout    (chain, draw) float64 8B 0.3773
          kappa_dropout  (chain, draw) float64 8B 2.177
          a              (chain, draw) float64 8B 0.8215
          b              (chain, draw) float64 8B 1.356
      Attributes:
          created_at:                 2025-02-05T16:31:37.674865+00:00
          arviz_version:              0.20.0
          inference_library:          pymc
          inference_library_version:  5.19.1

    • <xarray.Dataset> Size: 57kB
      Dimensions:            (customer_id: 2357, obs_var: 2)
      Coordinates:
        * customer_id        (customer_id) int64 19kB 0 1 2 3 ... 2353 2354 2355 2356
        * obs_var            (obs_var) <U9 72B 'recency' 'frequency'
      Data variables:
          recency_frequency  (customer_id, obs_var) float64 38kB 30.43 2.0 ... 0.0 0.0
      Attributes:
          created_at:                 2025-02-05T16:31:37.677973+00:00
          arviz_version:              0.20.0
          inference_library:          pymc
          inference_library_version:  5.19.1

    • <xarray.Dataset> Size: 94kB
      Dimensions:      (index: 2357)
      Coordinates:
        * index        (index) int64 19kB 0 1 2 3 4 5 ... 2352 2353 2354 2355 2356
      Data variables:
          customer_id  (index) int64 19kB 0 1 2 3 4 5 ... 2352 2353 2354 2355 2356
          frequency    (index) int64 19kB 2 1 0 0 0 7 1 0 2 0 ... 7 1 2 0 0 0 5 0 4 0
          recency      (index) float64 19kB 30.43 1.71 0.0 0.0 ... 24.29 0.0 26.57 0.0
          T            (index) float64 19kB 38.86 38.86 38.86 38.86 ... 27.0 27.0 27.0

This time we get point estimates for the parameters.

map_summary = model_map.fit_summary()

map_summary
alpha            6.454
r                0.565
phi_dropout      0.377
kappa_dropout    2.177
a                0.821
b                1.356
Name: value, dtype: float64

Comparing with the lifetimes package#

For the sake of comparison, let’s fit the same model with the lifetimes package, which uses maximum-likelihood estimation.

mbgf = ModifiedBetaGeoFitter()
mbgf.fit(
    frequency=data["frequency"].values,
    recency=data["recency"].values,
    T=data["T"].values,
)

mbgf.summary
coef se(coef) lower 95% bound upper 95% bound
r 0.524838 0.083264 0.361640 0.688035
alpha 6.182988 0.695597 4.819618 7.546359
a 0.891356 0.154785 0.587977 1.194736
b 1.614000 0.537472 0.560554 2.667446

The r and alpha purchase-rate parameters are quite similar across the three fits. The a and b dropout parameters show larger differences: with the default priors, the MCMC estimates are closer to the lifetimes maximum-likelihood values than the MAP estimates are.
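
As a quick numerical check, we can put the point estimates from the three fits side by side. This is a small sketch that simply aligns the summaries computed above by parameter name:

# Point estimates from the three fits, aligned by parameter name.
params = ["r", "alpha", "a", "b"]

pd.DataFrame(
    {
        "mcmc_mean": model.fit_summary()["mean"].loc[params],
        "map": map_summary.loc[params],
        "lifetimes_mle": mbgf.summary["coef"].loc[params],
    }
)

We can also compare the estimates visually: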

fig, axes = plt.subplots(
    nrows=2, ncols=2, figsize=(12, 9), sharex=False, sharey=False, layout="constrained"
)

axes = axes.flatten()

for i, var_name in enumerate(["r", "alpha", "a", "b"]):
    ax = axes[i]
    az.plot_posterior(
        model.idata.posterior[var_name].values.flatten(),
        color="C0",
        point_estimate="mean",
        ax=ax,
        label="MCMC",
    )
    ax.axvline(x=map_summary[var_name], color="C1", linestyle="--", label="MAP")
    ax.axvline(
        x=mbgf.summary["coef"][var_name], color="C2", linestyle="--", label="lifetimes"
    )
    ax.legend(loc="upper right")
    ax.set_title(var_name)

plt.gcf().suptitle("MBG/NBD Model Parameters", fontsize=18, fontweight="bold");
[Figure: MBG/NBD Model Parameters (posteriors with MAP and lifetimes estimates)]

Prior and Posterior Predictive Checks#

Prior and posterior predictive checks (PPCs) allow us to assess the adequacy of our priors and the performance of the fitted posteriors.

Let’s see how the model performs in a prior predictive check, where we sample from the default priors before fitting the model:

# PPC histogram plot
clv.plot_expected_purchases_ppc(model, ppc="prior");
Sampling: [alpha, kappa_dropout, phi_dropout, r, recency_frequency]
[Figure: prior predictive check of expected purchases]

# PPC histogram plot
clv.plot_expected_purchases_ppc(model, ppc="posterior");
Sampling: [recency_frequency]

[Figure: posterior predictive check of expected purchases]

Some Applications#

Now that we have fitted the model, we can use it to make predictions. In the following sections we compute the expected number of future purchases and the expected probability of a customer being alive, both as functions of time (steps).

Expected Number of Purchases#

Let us take a sample of users:

example_customer_ids = [1, 6, 10, 18, 45, 1412]

data_small = data.query("customer_id.isin(@example_customer_ids)")

data_small.head(6)
customer_id frequency recency T
1 1 1 1.71 38.86
6 6 1 5.00 38.86
10 10 5 24.43 38.86
18 18 3 28.29 38.71
45 45 12 34.43 38.57
1412 1412 14 30.29 31.57

Observe that the last two customers are frequent buyers as compared to the others.

steps = 90

expected_num_purchases_steps = xr.concat(
    objs=[
        model.expected_purchases(
            data=data_small,
            future_t=t,
        )
        for t in progress_bar(range(steps))
    ],
    dim="t",
).transpose(..., "t")
100.00% [90/90 00:01<00:00]

We can plot the expected number of purchases for the next \(90\) periods:

fig, axes = plt.subplots(
    nrows=len(example_customer_ids),
    ncols=1,
    figsize=(12, 15),
    sharex=True,
    sharey=True,
    layout="constrained",
)

axes = axes.flatten()

for i, customer_id in enumerate(example_customer_ids):
    ax = axes[i]
    customer_expected_num_purchases_steps = expected_num_purchases_steps.sel(
        customer_id=customer_id
    )
    az.plot_hdi(
        range(steps),
        customer_expected_num_purchases_steps,
        hdi_prob=0.94,
        color="C0",
        fill_kwargs={"alpha": 0.3, "label": "$94 \\%$ HDI"},
        ax=ax,
    )
    az.plot_hdi(
        range(steps),
        customer_expected_num_purchases_steps,
        hdi_prob=0.5,
        color="C0",
        fill_kwargs={"alpha": 0.5, "label": "$50 \\%$ HDI"},
        ax=ax,
    )
    ax.plot(
        range(steps),
        customer_expected_num_purchases_steps.mean(dim=("chain", "draw")),
        color="C0",
        label="posterior mean",
    )
    ax.legend(loc="upper left")
    ax.set(title=f"Customer {customer_id}", xlabel="t", ylabel="purchases")

axes[-1].set(xlabel="steps")
plt.gcf().suptitle("Expected Number of Purchases", fontsize=18, fontweight="bold");
[Figure: Expected Number of Purchases per customer]

Note that the frequent buyers are expected to make more purchases in the future.
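
If we only need point estimates rather than the full curves, we can reduce the posterior draws directly on the xarray object. A small sketch, taking the last step of the grid computed above:

# Posterior-mean expected purchases at the final step of the grid.
expected_num_purchases_steps.isel(t=-1).mean(dim=("chain", "draw")).to_series()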

Probability of a Customer Being Alive#

We now look into the probability of a customer being alive for the next \(90\) periods:

steps = 90

future_alive_all = []

for t in progress_bar(range(steps)):
    future_data = data_small.copy()
    future_data["T"] = future_data["T"] + t
    future_alive = model.expected_probability_alive(data=future_data)
    future_alive_all.append(future_alive)

expected_probability_alive_steps = xr.concat(
    objs=future_alive_all,
    dim="t",
).transpose(..., "t")
100.00% [90/90 00:00<00:00]
fig, axes = plt.subplots(
    nrows=len(example_customer_ids),
    ncols=1,
    figsize=(12, 15),
    sharex=True,
    sharey=True,
    layout="constrained",
)

axes = axes.flatten()

for i, customer_id in enumerate(example_customer_ids):
    ax = axes[i]
    customer_expected_probability_alive_steps = expected_probability_alive_steps.sel(
        customer_id=customer_id
    )
    az.plot_hdi(
        range(steps),
        customer_expected_probability_alive_steps,
        hdi_prob=0.94,
        color="C1",
        fill_kwargs={"alpha": 0.3, "label": "$94 \\%$ HDI"},
        ax=ax,
    )
    az.plot_hdi(
        range(steps),
        customer_expected_probability_alive_steps,
        hdi_prob=0.5,
        color="C1",
        fill_kwargs={"alpha": 0.5, "label": "$50 \\%$ HDI"},
        ax=ax,
    )
    ax.plot(
        range(steps),
        customer_expected_probability_alive_steps.mean(dim=("chain", "draw")),
        color="C1",
        label="posterior mean",
    )
    ax.legend(loc="upper right")
    ax.set(title=f"Customer {customer_id}", ylabel="probability alive", ylim=(0, 1))

axes[-1].set(xlabel="steps")
plt.gcf().suptitle(
    "Expected Probability Alive over Time", fontsize=18, fontweight="bold"
);
[Figure: Expected Probability Alive over Time per customer]

Tip

Here are some general remarks:

  • These plots assume no future purchases.

  • The rate of decay is not the same for all customers, as it depends on each customer’s purchase history.

  • The probability of being alive is always decreasing, since we assume no future purchases and no change in the other parameters.

  • These probabilities are always non-negative, as expected.

Warning

For the frequent buyers, the probability of being alive drops very fast as we are assuming no future purchases. It is very important to keep this in mind when interpreting the results.
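
To see how sensitive these curves are to new information, here is a purely illustrative sketch of what happens if one of the frequent buyers makes one more purchase at the end of the observation window: we increase frequency by one, set recency equal to T, and re-evaluate the probability of being alive.

# Hypothetical scenario: customer 1412 makes one additional purchase at time T.
hypothetical = data_small.query("customer_id == 1412").copy()
hypothetical["frequency"] += 1
hypothetical["recency"] = hypothetical["T"]

model.expected_probability_alive(data=hypothetical).mean(dim=("chain", "draw"))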

%reload_ext watermark
%watermark -n -u -v -iv -w -p pymc,pytensor
Last updated: Fri Dec 20 2024

Python implementation: CPython
Python version       : 3.10.16
IPython version      : 8.30.0

pymc    : 5.19.1
pytensor: 2.26.4

fastprogress  : 1.0.3
pymc_marketing: 0.10.0
matplotlib    : 3.9.3
pandas        : 2.2.3
arviz         : 0.20.0
lifetimes     : 0.11.3
xarray        : 2024.11.0

Watermark: 2.5.0