CLV Quickstart#

Customer Lifetime Value (CLV) measures a customer’s contribution to a business over time. This metric informs spending levels on new customer acquisition, retention, and other marketing and sales efforts, so reliable estimation is essential.

PyMC-Marketing provides tools to segment customers based on their past behavior (see RFM Segmentation), as well as the following Buy Till You Die (BTYD) probabilistic models to predict future behavior:

  • BG/NBD model for continuous time, non-contractual modeling

  • Pareto/NBD model for continuous time, non-contractual modeling with covariates

  • Shifted BG model for discrete time, contractual modeling

  • BG/BB model for discrete time, non-contractual modeling

  • Exponential Gamma model for discrete time, contractual modeling (coming soon)

  • Gamma-Gamma model for expected monetary value

  • Modified BG/NBD model, similar to the BG/NBD model, but allows customers to drop out immediately after their first purchase, so non-repeat customers are not assumed to still be active.

This table contains a breakdown of the four BTYD modeling domains, with examples for each:

|                | Non-contractual          | Contractual             |
|----------------|--------------------------|-------------------------|
| **Continuous** | online purchases         | ad conversion time      |
| **Discrete**   | concerts & sports events | recurring subscriptions |

In this notebook we will demonstrate how to estimate future purchasing activity and CLV with the CDNOW dataset, a popular benchmarking dataset in CLV and BTYD research. Data is available here, with additional details here.

import arviz as az
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from arviz.labels import MapLabeller

from pymc_marketing import clv
from pymc_marketing.prior import Prior
az.style.use("arviz-darkgrid")

%config InlineBackend.figure_format = "retina" # nice looking plots

Data Requirements#

For all models, the following nomenclature is used:

  • customer_id represents a unique identifier for each customer.

  • frequency represents the number of repeat purchases that a customer has made, i.e. one less than the total number of purchases.

  • T represents a customer’s “age”, i.e. the duration between a customer’s first purchase and the end of the period of study. In this example notebook, the units of time are in weeks.

  • recency represents the time period when a customer made their most recent purchase. This is equal to the duration between a customer’s first and last purchase. If a customer has made only 1 purchase, their recency is 0.

  • monetary_value represents the average value of a given customer’s repeat purchases. Customers who have only made a single purchase have monetary values of zero.
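
To make these definitions concrete, here is a small hand computation (a toy example, not library code) for a customer first observed at week 0 of a 12-week study:

```python
import pandas as pd

# Toy example: one customer's purchase times, in weeks since the start of the study
purchase_weeks = pd.Series([0.0, 3.0, 10.0])  # first purchase at week 0
end_of_study = 12.0

frequency = len(purchase_weeks) - 1                    # repeat purchases only
recency = purchase_weeks.max() - purchase_weeks.min()  # first purchase to last purchase
T = end_of_study - purchase_weeks.min()                # first purchase to end of study

print(frequency, recency, T)  # 2 10.0 12.0
```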

The rfm_summary function can be used to preprocess raw transaction data for modeling:

raw_trans = pd.read_csv(
    "https://raw.githubusercontent.com/pymc-labs/pymc-marketing/main/data/cdnow_transactions.csv"
)

raw_trans.head(5)
_id id date cds_bought spent
0 4 1 19970101 2 29.33
1 4 1 19970118 2 29.73
2 4 1 19970802 1 14.96
3 4 1 19971212 2 26.48
4 21 2 19970101 3 63.34
rfm_data = clv.utils.rfm_summary(
    raw_trans,
    customer_id_col="id",
    datetime_col="date",
    monetary_value_col="spent",
    datetime_format="%Y%m%d",
    time_unit="W",
)

rfm_data
customer_id frequency recency T monetary_value
0 1 3.0 49.0 78.0 23.723333
1 2 1.0 2.0 78.0 11.770000
2 3 0.0 0.0 78.0 0.000000
3 4 0.0 0.0 78.0 0.000000
4 5 0.0 0.0 78.0 0.000000
... ... ... ... ... ...
2352 2353 2.0 53.0 66.0 19.775000
2353 2354 5.0 24.0 66.0 44.928000
2354 2355 1.0 44.0 66.0 24.600000
2355 2356 6.0 62.0 66.0 31.871667
2356 2357 0.0 0.0 66.0 0.000000

2357 rows × 5 columns

It is important to note that these definitions differ from those used in RFM segmentation, where the first purchase is included, T is not used, and recency is the number of time periods since a customer’s most recent purchase.

To visualize data in RFM format, we can plot the recency and T of the customers with the plot_customer_exposure function. We see that a large share (>60%) of customers haven’t made another purchase in a while.

fig, ax = plt.subplots(figsize=(10, 5))
(
    rfm_data.sample(n=100, random_state=42)
    .sort_values(["recency", "T"])
    .pipe(clv.plot_customer_exposure, ax=ax, linewidth=0.5, size=0.75)
);

Predicting Future Purchasing Behavior with the BG/NBD Model#

This dataset is an example of continuous time, non-contractual transactions because it comprises purchases from an online music store. PyMC-Marketing provides two models for this use case: BetaGeoModel and ParetoNBDModel.

We will be using the BG/NBD model in this notebook because it converges quickly and works well for basic modeling tasks. However, if a more comprehensive analysis is desired, consider the Pareto/NBD model instead for its expanded functionality, including support for covariates.

Let’s take a look at the underlying structure of the BG/NBD model:

bgm = clv.BetaGeoModel(data=rfm_data)

bgm.build_model()
bgm.graphviz()

The default priors for the r and alpha purchase rate parameters follow a HalfFlat distribution, which is an uninformative positive distribution. The a and b dropout parameters are pooled with the hierarchical kappa_dropout and phi_dropout priors for improved performance.
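
For intuition, the standard BG/NBD generative story behind these parameters can be sketched as a toy simulation (illustrative parameter values, not library code): each customer draws an individual purchase rate from a Gamma(r, alpha) distribution and a dropout probability from a Beta(a, b) distribution, then purchases at that rate until they churn or the observation window ends.

```python
import numpy as np

rng = np.random.default_rng(42)

# Population-level parameters (illustrative values only)
r, alpha, a, b = 0.3, 7.0, 0.7, 3.3

lam = rng.gamma(shape=r, scale=1 / alpha)  # this customer's weekly purchase rate
p = rng.beta(a, b)                         # probability of churning after each purchase

# Simulate repeat purchases after the first purchase at t = 0
t, T = 0.0, 78.0
repeat_purchases = 0
while rng.random() >= p:               # customer survives the churn check...
    t += rng.exponential(1 / lam)      # ...and waits for the next purchase
    if t > T:
        break                          # end of the observation window
    repeat_purchases += 1
```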

More informative priors can be specified in a custom model configuration by passing a dictionary with keys for prior names, and values as distributions defined with the Prior class in PyMC-Marketing.

model_config = {
    "a_prior": Prior("HalfNormal", sigma=10),
    "b_prior": Prior("HalfNormal", sigma=10),
    "alpha_prior": Prior("HalfNormal", sigma=10),
    "r_prior": Prior("HalfNormal", sigma=10),
}
bgm = clv.BetaGeoModel(
    data=rfm_data,
    model_config=model_config,
)
bgm.build_model()
bgm
BG/NBD
     alpha ~ HalfNormal(0, 10)
         r ~ HalfNormal(0, 10)
         a ~ HalfNormal(0, 10)
         b ~ HalfNormal(0, 10)
likelihood ~ Potential(f(r, alpha, b, a))
bgm.graphviz()

Having specified the model, we can now fit it.

bgm.fit()
bgm.fit_summary()
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [alpha, r, a, b]


Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 8 seconds.
mean sd hdi_3% hdi_97% mcse_mean mcse_sd ess_bulk ess_tail r_hat
a 0.701 0.170 0.427 1.007 0.005 0.003 1418.0 1520.0 1.0
alpha 7.100 0.506 6.149 8.051 0.011 0.008 1978.0 1985.0 1.0
b 3.348 1.068 1.669 5.269 0.030 0.022 1401.0 1603.0 1.0
r 0.276 0.012 0.254 0.299 0.000 0.000 2027.0 2274.0 1.0

We can use ArviZ, a Python library tailored to produce visualizations for Bayesian models, to plot the posterior distribution of each parameter.

az.plot_posterior(bgm.fit_result);

Visualizing Predictions over Time#

Let’s evaluate model performance by tracking predictions against historical purchases:

clv.plot_expected_purchases_over_time(
    model=bgm,
    purchase_history=raw_trans,
    datetime_col="date",
    customer_id_col="id",
    datetime_format="%Y%m%d",
    time_unit="W",
    t=78,
);

There is a wide discrepancy between cumulative actual and predicted purchases! This is a good indicator that there are extraneous customers and/or date ranges to exclude from model training, and perhaps a need for additional covariates.

Let’s plot incremental purchases over time for more insight:

clv.plot_expected_purchases_over_time(
    model=bgm,
    purchase_history=raw_trans,
    datetime_col="date",
    customer_id_col="id",
    datetime_format="%Y%m%d",
    time_unit="W",
    t=78,
    set_index_date=True,
    plot_cumulative=False,
);

There was a large sales bump in the first few months that is biasing model results and should be investigated. However, notice that purchases flatline in the following months, and the model is still able to capture this trend. Simply excluding data prior to April 1997 should improve performance considerably, but for pedagogical purposes we will continue with the full dataset.
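
If we did want to exclude the early sales bump, the filtering step could look like this sketch (toy data standing in for raw_trans, with the same column names as the CDNOW file):

```python
import pandas as pd

# Toy stand-in for raw_trans, with the same column names
raw = pd.DataFrame(
    {
        "id": [1, 1, 1, 2],
        "date": [19970101, 19970118, 19970802, 19970601],
        "spent": [29.33, 29.73, 14.96, 8.00],
    }
)

# Parse the integer dates and keep only transactions from April 1997 onward
parsed = pd.to_datetime(raw["date"].astype(str), format="%Y%m%d")
filtered = raw[parsed >= pd.Timestamp("1997-04-01")]

print(len(filtered))  # 2
```

The filtered frame can then be passed to rfm_summary as before.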

Visualizing Prediction Matrices#

clv.plot_frequency_recency_matrix(bgm);

We can see our best customers have been active for over 60 weeks and have made over 20 purchases (bottom right). Note the “tail” sweeping up towards the upper-left corner; these customers purchase infrequently and/or may not have purchased recently. What is the probability they are still active?

clv.plot_probability_alive_matrix(bgm)
<Axes: title={'center': 'Probability Customer is Alive,\nby Frequency and Recency of a Customer'}, xlabel="Customer's Historical Frequency", ylabel="Customer's Recency">

Note that all non-repeat customers have an alive probability of 1, which is one of the quirks of BetaGeoModel. In many cases this is still a valid assumption, but if non-repeat customers are a key focus of your analysis, you may want to try ParetoNBDModel instead.

Looking at the probability alive matrix, we can infer that customers who have made fewer purchases are less likely to return, and may be worth targeting for retention.

Ranking customers from best to worst#

Having fit the model, we can ask what the expected number of purchases is for our customers over the next 10 time periods. Let’s look at the four most promising customers.

num_purchases = bgm.expected_purchases(future_t=10)

sdata = rfm_data.copy()
sdata["expected_purchases"] = num_purchases.mean(("chain", "draw")).values
sdata.sort_values(by="expected_purchases").tail(4)
customer_id frequency recency T monetary_value future_spend expected_purchases
812 813 30.0 72.0 74.0 35.654000 35.692751 3.441759
1202 1203 32.0 71.0 72.0 47.172187 46.988670 3.812528
156 157 36.0 74.0 77.0 30.603611 30.721913 3.899491
1980 1981 35.0 66.0 68.0 46.748857 46.588182 4.306360

We can also plot credible intervals for the expected purchases:

ids = [813, 1203, 157, 1981]
ax = az.plot_posterior(num_purchases.sel(customer_id=ids), grid=(2, 2))
for axi, id in zip(ax.ravel(), ids, strict=False):
    axi.set_title(f"Customer: {id}", size=20)
plt.suptitle("Expected Number of Purchases over 10 Time Periods", fontsize=28, y=1.05);

Predicting purchase behavior of a new customer#

We can use the fitted model to predict the number of purchases for a brand-new customer.

az.plot_posterior(bgm.expected_purchases_new_customer(t=10).sel(customer_id=1))
plt.title("Expected purchases of a new customer in the first 10 periods");

Customer Probability Histories#

Given a customer transaction history, we can calculate their historical probability of being alive, according to our trained model.

Let’s look at active customer 1516 and assess how the probability that they will ever return changes if they make no further purchases over the next 9 time periods.

customer_1516 = rfm_data.loc[1515]
customer_1516
customer_id       1516.000000
frequency           27.000000
recency             67.000000
T                   70.000000
monetary_value      51.944074
Name: 1515, dtype: float64
customer_1516_history = pd.DataFrame(
    dict(
        customer_id=np.arange(10),
        frequency=np.full(10, customer_1516["frequency"], dtype="int"),
        recency=np.full(10, customer_1516["recency"]),
        T=(np.arange(0, 10) + customer_1516["recency"]).astype("int"),
    )
)
customer_1516_history
customer_id frequency recency T
0 0 27 67.0 67
1 1 27 67.0 68
2 2 27 67.0 69
3 3 27 67.0 70
4 4 27 67.0 71
5 5 27 67.0 72
6 6 27 67.0 73
7 7 27 67.0 74
8 8 27 67.0 75
9 9 27 67.0 76
p_alive = bgm.expected_probability_alive(data=customer_1516_history)
az.plot_hdi(customer_1516_history["T"], p_alive, color="C0")
plt.plot(customer_1516_history["T"], p_alive.mean(("draw", "chain")), marker="o")
plt.axvline(
    customer_1516_history["recency"].iloc[0], c="black", ls="--", label="Purchase"
)

plt.title("Probability Customer 1516 will purchase again")
plt.xlabel("T")
plt.ylabel("p")
plt.legend();

We can see that, if no purchases are made in the next 9 weeks, the model has low confidence that the customer will ever return. What if they had made a purchase in between?

customer_1516_history.loc[7:, "frequency"] += 1
customer_1516_history.loc[7:, "recency"] = customer_1516_history.loc[7, "T"] - 0.5
customer_1516_history
customer_id frequency recency T
0 0 27 67.0 67
1 1 27 67.0 68
2 2 27 67.0 69
3 3 27 67.0 70
4 4 27 67.0 71
5 5 27 67.0 72
6 6 27 67.0 73
7 7 28 73.5 74
8 8 28 73.5 75
9 9 28 73.5 76
p_alive = bgm.expected_probability_alive(data=customer_1516_history)
az.plot_hdi(customer_1516_history["T"], p_alive, color="C0")
plt.plot(customer_1516_history["T"], p_alive.mean(("draw", "chain")), marker="o")
plt.axvline(
    customer_1516_history["recency"].iloc[0], c="black", ls="--", label="Purchase"
)
plt.axvline(customer_1516_history["recency"].iloc[-1], c="black", ls="--")

plt.title("Probability Customer 1516 will purchase again")
plt.xlabel("T")
plt.ylabel("p")
plt.legend();

In the plot above, customer 1516 makes a purchase at week 73.5, just over 6 weeks after their last recorded purchase. We can see that the probability of the customer returning quickly goes back up!

Estimating Customer Lifetime Value Using the Gamma-Gamma Model#

Until now we’ve focused mainly on transaction frequencies and probabilities; to estimate economic value, we can use the Gamma-Gamma model.

The Gamma-Gamma model assumes at least one repeat transaction has been observed per customer, so we filter out customers with zero repeat purchases.

nonzero_data = rfm_data.query("frequency>0")
nonzero_data
customer_id frequency recency T monetary_value
0 1 3.0 49.0 78.0 23.723333
1 2 1.0 2.0 78.0 11.770000
5 6 14.0 76.0 78.0 76.503571
6 7 1.0 5.0 78.0 11.770000
7 8 1.0 61.0 78.0 26.760000
... ... ... ... ... ...
2351 2352 1.0 47.0 66.0 14.490000
2352 2353 2.0 53.0 66.0 19.775000
2353 2354 5.0 24.0 66.0 44.928000
2354 2355 1.0 44.0 66.0 24.600000
2355 2356 6.0 62.0 66.0 31.871667

1126 rows × 5 columns

If computing the monetary value from your own data, note that it is the mean of a given customer’s repeat purchase values, not the sum. monetary_value can represent profit, revenue, or any other value, as long as it is calculated consistently for each customer.
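
For example, customer 1 in the raw transactions shown earlier made four purchases; their monetary_value is the mean of the three repeat purchases, which matches the rfm_summary output above:

```python
import pandas as pd

spend = pd.Series([29.33, 29.73, 14.96, 26.48])  # customer 1's purchases, in order
repeat_spend = spend.iloc[1:]                    # drop the first purchase
monetary_value = repeat_spend.mean()             # the mean, not the sum

print(round(monetary_value, 6))  # 23.723333
```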

The Gamma-Gamma model relies on the important assumption that there is no relationship between monetary value and purchase frequency. In practice, we check that the Pearson correlation between the two is weak (less than 0.3):

nonzero_data[["monetary_value", "frequency"]].corr()
monetary_value frequency
monetary_value 1.000000 0.052819
frequency 0.052819 1.000000

Transaction frequencies and monetary values are essentially uncorrelated, so we can now fit our Gamma-Gamma model to predict the average spend and expected lifetime value of our customers.

The Gamma-Gamma model takes a data parameter: a pandas DataFrame with three columns representing the customer ID, the average spend across repeat purchases, and the number of repeat purchases for each customer. As with the BG/NBD model, these parameters are given HalfFlat priors by default, which can be too diffuse for small datasets. For this example we will use the default priors, but other priors can be specified just as in the BG/NBD example above.

gg = clv.GammaGammaModel(data=nonzero_data)
gg.build_model()
gg
Gamma-Gamma Model (Mean Transactions)
         p ~ HalfFlat()
         q ~ HalfFlat()
         v ~ HalfFlat()
likelihood ~ Potential(f(q, p, v))

By default, fit generates full Bayesian posteriors via MCMC sampling. For extremely large datasets where the full posterior is not needed and/or MCMC is too slow, fit_method="map" can be used to obtain maximum a posteriori point estimates of the model parameters.

gg.fit(fit_method="map")



arviz.InferenceData
    • <xarray.Dataset> Size: 40B
      Dimensions:  (chain: 1, draw: 1)
      Coordinates:
        * chain    (chain) int64 8B 0
        * draw     (draw) int64 8B 0
      Data variables:
          p        (chain, draw) float64 8B 4.785
          q        (chain, draw) float64 8B 3.882
          v        (chain, draw) float64 8B 22.65
      Attributes:
          created_at:                 2024-12-16T03:19:47.480846+00:00
          arviz_version:              0.18.0
          inference_library:          pymc
          inference_library_version:  5.15.1

    • <xarray.Dataset> Size: 54kB
      Dimensions:         (index: 1126)
      Coordinates:
        * index           (index) int64 9kB 0 1 5 6 7 8 ... 2351 2352 2353 2354 2355
      Data variables:
          customer_id     (index) int64 9kB 1 2 6 7 8 9 ... 2352 2353 2354 2355 2356
          frequency       (index) float64 9kB 3.0 1.0 14.0 1.0 1.0 ... 2.0 5.0 1.0 6.0
          recency         (index) float64 9kB 49.0 2.0 76.0 5.0 ... 24.0 44.0 62.0
          T               (index) float64 9kB 78.0 78.0 78.0 78.0 ... 66.0 66.0 66.0
          monetary_value  (index) float64 9kB 23.72 11.77 76.5 ... 44.93 24.6 31.87

gg.fit_summary()
p     4.785
q     3.882
v    22.653
Name: value, dtype: float64

However, MCMC sampling is recommended because it captures uncertainty in the parameter estimates and enables credible intervals around predictions.

gg.fit();
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [p, q, v]


Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 6 seconds.
gg.fit_summary()
mean sd hdi_3% hdi_97% mcse_mean mcse_sd ess_bulk ess_tail r_hat
p 4.849 0.793 3.473 6.320 0.028 0.020 855.0 1055.0 1.0
q 3.920 0.276 3.405 4.430 0.009 0.006 991.0 1383.0 1.0
v 23.415 5.284 14.156 33.558 0.184 0.130 810.0 1179.0 1.0
az.plot_posterior(gg.fit_result);

Predicting spend value of customers#

Having fit our model, we can now use it to predict the conditional expected average spend of our customers, including those with zero repeat purchases.

expected_spend = gg.expected_customer_spend(data=rfm_data)
az.summary(expected_spend.isel(customer_id=range(10)), kind="stats")
mean sd hdi_3% hdi_97%
x[1] 26.087 0.446 25.269 26.917
x[2] 21.558 1.369 18.967 24.032
x[3] 37.590 0.865 36.053 39.237
x[4] 37.590 0.865 36.053 39.237
x[5] 37.590 0.865 36.053 39.237
x[6] 74.847 0.370 74.111 75.468
x[7] 21.558 1.369 18.967 24.032
x[8] 30.861 0.619 29.751 32.080
x[9] 36.425 0.148 36.151 36.704
x[10] 37.590 0.865 36.053 39.237
labeller = MapLabeller(var_name_map={"x": "customer"})
az.plot_forest(
    expected_spend.isel(customer_id=(range(10))), combined=True, labeller=labeller
)
plt.xlabel("Expected mean spend");

We can also look at the average expected mean spend across all customers:

az.summary(expected_spend.mean("customer_id"), kind="stats")
mean sd hdi_3% hdi_97%
x 37.984 0.546 37.006 39.01
az.plot_posterior(expected_spend.mean("customer_id"))
plt.axvline(expected_spend.mean(), color="k", ls="--")
plt.title("Expected mean spend of all customers");

Predicting spend value of a new customer#

az.plot_posterior(gg.expected_new_customer_spend())
plt.title("Expected mean spend of a new customer");

Estimating CLV#

Finally, we can combine the Gamma-Gamma model with the BG/NBD model to estimate customer lifetime value. This relies on the discounted cash flow (DCF) model, which adjusts for the cost of capital.
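
The discounting arithmetic can be sketched with toy numbers (illustrative values, not model output): each future period’s expected spend is divided by (1 + discount_rate)**t. A 1% monthly rate also compounds to roughly 12.7% annually, which is where the annual-rate comment in the code further below comes from.

```python
# Toy discounted-cash-flow CLV sketch (illustrative numbers, not model output)
expected_monthly_purchases = 0.4
expected_spend = 35.0   # average value per purchase
discount_rate = 0.01    # monthly

clv = sum(
    expected_monthly_purchases * expected_spend / (1 + discount_rate) ** t
    for t in range(1, 13)  # 12 months of future purchases
)
print(round(clv, 2))  # 157.57

# A 1% monthly rate compounds to roughly 12.7% annually
print(round(((1 + discount_rate) ** 12 - 1) * 100, 1))  # 12.7
```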

If computational issues are encountered, use the thin_fit_result method prior to estimating CLV.

bgm.thin_fit_result(keep_every=2)
BG/NBD
         a ~ HalfNormal(0, 10)
         b ~ HalfNormal(0, 10)
     alpha ~ HalfNormal(0, 10)
         r ~ HalfNormal(0, 10)
likelihood ~ Potential(f(r, alpha, b, a))
clv_estimate = gg.expected_customer_lifetime_value(
    transaction_model=bgm,
    data=rfm_data,
    future_t=12,  # months
    discount_rate=0.01,  # monthly discount rate ~ 12.7% annually
    time_unit="W",  # original data is in weeks
)
az.summary(clv_estimate.isel(customer_id=range(10)), kind="stats")
mean sd hdi_3% hdi_97%
x[1] 29.213 1.045 27.298 31.185
x[2] 3.098 0.300 2.531 3.635
x[3] 5.617 0.221 5.191 6.018
x[4] 5.617 0.221 5.191 6.018
x[5] 5.617 0.221 5.191 6.018
x[6] 501.112 16.591 471.640 532.160
x[7] 4.074 0.341 3.455 4.707
x[8] 16.202 0.429 15.433 17.031
x[9] 46.869 1.311 44.489 49.345
x[10] 5.617 0.221 5.191 6.018
az.plot_forest(
    clv_estimate.isel(customer_id=range(10)), combined=True, labeller=labeller
)
plt.xlabel("Expected CLV");

According to our models, customer[6] has a much higher expected CLV than the others. There is also large variability in this estimate, arising solely from uncertainty in the parameters of the BG/NBD and Gamma-Gamma models.

In general, these models tend to induce a strong correlation between expected CLV and its uncertainty. Modeling this uncertainty can be very useful when making marketing decisions.

%load_ext watermark
%watermark -n -u -v -iv -w -p pymc,pytensor
Last updated: Wed Nov 27 2024

Python implementation: CPython
Python version       : 3.10.14
IPython version      : 8.22.2

pymc    : 5.15.1
pytensor: 2.22.1

pymc_marketing: 0.10.0
numpy         : 1.26.4
pandas        : 2.2.2
matplotlib    : 3.8.4
arviz         : 0.18.0
pytensor      : 2.22.1

Watermark: 2.4.3