CLV Quickstart#

Note

This quickstart was adapted from the legacy lifetimes library. Some pictures and descriptions are directly attributable to the lifetimes developers.

In this notebook we will be using the CDNOW dataset, a popular benchmarking dataset in CLV and Buy Till You Die (BTYD) modeling research. Data is available here, with additional details [here].

import arviz as az
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from arviz.labels import MapLabeller

from pymc_marketing import clv

az.style.use("arviz-darkgrid")

%config InlineBackend.figure_format = "retina" # nice looking plots

1.1 Data Requirements#

For all models, the following nomenclature is used:

customer_id represents a unique identifier for each customer.
frequency represents the number of repeat purchases that a customer has made, i.e. one less than the total number of purchases.
T represents a customer’s “age”, i.e. the duration between a customer’s first purchase and the end of the period of study. In this example notebook, the units of time are in weeks.
recency represents the timepoint when a customer made their most recent purchase. This is also equal to the duration between a customer’s first non-repeat purchase (usually time 0) and last purchase. If a customer has made only 1 purchase, their recency is 0;
monetary_value represents the average value of a given customer’s repeat purchases. Customers who have only made a single purchase have monetary values of zero.

If working with raw transaction data, the rfm_summary function can be used to preprocess data for modeling:

raw_trans = pd.read_csv("https://raw.githubusercontent.com/pymc-labs/pymc-marketing/main/data/cdnow_transactions.csv")

raw_trans.head(5)

	_id	id	date	cds_bought	spent
0	4	1	19970101	2	29.33
1	4	1	19970118	2	29.73
2	4	1	19970802	1	14.96
3	4	1	19971212	2	26.48
4	21	2	19970101	3	63.34

rfm_data = clv.utils.rfm_summary(
    raw_trans,
    customer_id_col = "id",
    datetime_col = "date",
    monetary_value_col = "spent",
    datetime_format = "%Y%m%d",
    time_unit = "W")

rfm_data

	customer_id	frequency	recency	T	monetary_value
0	1	3.0	49.0	78.0	23.723333
1	2	1.0	2.0	78.0	11.770000
2	3	0.0	0.0	78.0	0.000000
3	4	0.0	0.0	78.0	0.000000
4	5	0.0	0.0	78.0	0.000000
...	...	...	...	...	...
2352	2353	2.0	53.0	66.0	19.775000
2353	2354	5.0	24.0	66.0	44.928000
2354	2355	1.0	44.0	66.0	24.600000
2355	2356	6.0	62.0	66.0	31.871667
2356	2357	0.0	0.0	66.0	0.000000

2357 rows × 5 columns

To visualize data in RFM format, we can plot the recency and T of the customers with the plot_customer_exposure function. We see a large chunk (>60%) of customers haven’t made another purchase in a while.

fig, ax = plt.subplots(figsize=(10, 5))
(
    rfm_data
    .sample(n=100, random_state=42)
    .sort_values(["recency", "T"])
    .pipe(clv.plot_customer_exposure, ax=ax, linewidth=0.5, size=0.75)
);

../../_images/f58007f87c1e4a3145cca833a29bf0ff34385db181d91a1d9273ba7e7a50b7c8.png

Basic Frequency/Recency analysis using the BG/NBD model#

As this dataset represents non-contractual transactions in the continuous time domain, we will use the Beta-Geometric/Negative Binomial Distribution (BG/NBD) model to infer the frequency of repeat purchases for each customer in the dataset. The Pareto/Negative Binomial Distribution (Pareto/NBD) model is also available as an alternative if you wish to compare results.

For use cases involving discrete time, contractual transactions, use the Shifted Beta-Geometric model.

bgm = clv.BetaGeoModel(
    data = rfm_data
)
bgm.build_model()

This model has 4 parameters that specify the global frequency and dropout rates of customers.

bgm

BG/NBD
         a ~ HalfFlat()
         b ~ HalfFlat()
     alpha ~ HalfFlat()
         r ~ HalfFlat()
likelihood ~ Potential(f(r, alpha, b, a))

The default priors for the 4 parameters follow a HalfFlat distribution, which is an improper positive uniform distribution. For small datasets this prior can yield implausible posteriors. To avoid this problem, more informative priors can be specified by defining custom PyMC distributions.

Here, we will replace the HalfFlat default by more well-behaved HalfNormal priors with a standard deviation of 10. Customization priors is possible by passing a dictionary with keys being the name of the prior, and values being a dictionary with 2 keys: ‘dist’ representing the name of PyMC distribution and ‘kwargs’ that holds an optional dictionary of all parameters we wish to pass to the distribution

model_config = {
    'a_prior': {'dist': 'HalfNormal',
                'kwargs': {'sigma': 10}},
    'b_prior': {'dist': 'HalfNormal',
                'kwargs': {'sigma': 10}},
    'alpha_prior': {'dist': 'HalfNormal',
                'kwargs': {'sigma': 10}},
    'r_prior': {'dist': 'HalfNormal',
                'kwargs': {'sigma': 10}},
}

bgm = clv.BetaGeoModel(
    data = rfm_data,
    model_config = model_config,
)
bgm.build_model()
bgm

BG/NBD
         a ~ HalfNormal(0, 10)
         b ~ HalfNormal(0, 10)
     alpha ~ HalfNormal(0, 10)
         r ~ HalfNormal(0, 10)
likelihood ~ Potential(f(r, alpha, b, a))

Having specified the model, we can now fit it.

bgm.fit()
bgm.fit_summary()

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [a, b, alpha, r]

100.00% [8000/8000 00:07<00:00 Sampling 4 chains, 0 divergences]

Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 8 seconds.

	mean	sd	hdi_3%	hdi_97%	mcse_mean	mcse_sd	ess_bulk	ess_tail	r_hat
a	0.695	0.163	0.409	0.975	0.004	0.003	1642.0	1782.0	1.0
b	3.312	1.016	1.652	5.116	0.026	0.019	1646.0	1700.0	1.0
alpha	7.093	0.513	6.129	8.066	0.013	0.009	1612.0	1745.0	1.0
r	0.276	0.012	0.255	0.300	0.000	0.000	1618.0	1689.0	1.0

We can use ArviZ, a Python library tailored to produce visualizations for Bayesian models, to plot the posterior distribution of each parameter.

az.plot_posterior(bgm.fit_result);

../../_images/7dec8a45fc12523969dad92259b7f3a051aeb1d92d3f39b7a5f35bc7740f0958.png

Visualizing the Frequency/Recency Matrix#

clv.plot_frequency_recency_matrix(bgm);

../../_images/166d88a7440b05398c44dffc22873e2688c6bdb48e87068df5e86d1c3821d50f.png

We can see our best customers have been active for over 60 weeks and have made over 20 purchases (bottom-right). Note the “tail” sweeping up towards the upper-left corner - these customers are infrequent and/or may not have purchased recently. What is the probability they are still active?

clv.plot_probability_alive_matrix(bgm)

<Axes: title={'center': 'Probability Customer is Alive,\nby Frequency and Recency of a Customer'}, xlabel="Customer's Historical Frequency", ylabel="Customer's Recency">

../../_images/98a48068b777f17a2ea468519e60c6970a7a86e8732eac9c8a72565f0ab568af.png

Note that all non-repeat customers have an alive probability of 1, which is one of the quirks of BetaGeoModel. In many use cases this is still a valid assumption, but if non-repeat customers are a key focus in your use case, you may want to try ParetoNBDModel instead.

Looking at the probability alive matrix, we can infer that customers who have made fewer purchases are less likely to return, and may be worth targeting for retention.

Ranking customers from best to worst#

Having fit the model, we can ask what is the expected number of purchases for our customers in the next period. Let’s look at the four more promising customers.

num_purchases = bgm.expected_num_purchases(
    customer_id=rfm_data["customer_id"],
    t=1,
    frequency=rfm_data["frequency"],
    recency=rfm_data["recency"],
    T=rfm_data["T"]
)

sdata = rfm_data.copy()
sdata["expected_purchases"] = num_purchases.mean(("chain", "draw")).values
sdata.sort_values(by="expected_purchases").tail(4)

	customer_id	frequency	recency	T	monetary_value	expected_purchases
812	813	30.0	72.0	74.0	35.654000	0.355740
1202	1203	32.0	71.0	72.0	47.172187	0.394494
156	157	36.0	74.0	77.0	30.603611	0.402683
1980	1981	35.0	66.0	68.0	46.748857	0.446398

We can plot the uncertainty in the expected number of purchases in the next period.

ids = [841, 1981, 157, 1516]
ax = az.plot_posterior(num_purchases.sel(customer_id=ids), grid=(2, 2))
for axi, id in zip(ax.ravel(), ids, strict=False):
    axi.set_title(f"Customer: {id}", size=20)
plt.suptitle("Expected number purchases in the next period", fontsize=28, y=1.05);

../../_images/eb1f9e7a031a3757e4f31082d1881fb9ecd85a6d9d6ba601ffec04167960127f.png

Predicting purchase behavior of a new customer#

We can use the fitted model to predict the number of purchases for a fresh new customer.

az.plot_posterior(
    bgm.expected_num_purchases_new_customer(t=10)
)
plt.title("Expected purchases of a new customer in the first 10 periods");

../../_images/625806681aaeddb29b8d5740819b76e966fc67c16e42095ff50e2a7e15774d51.png

Customer Probability Histories#

Given a customer transaction history, we can calculate their historical probability of being alive, according to our trained model.

Let look at active customer 1516 and assess the change in probability that the user will ever return if they do no other purchases in the next 9 time periods.

customer_1516 = rfm_data.loc[1515]
customer_1516

customer_id       1516.000000
frequency           27.000000
recency             67.000000
T                   70.000000
monetary_value      51.944074
Name: 1515, dtype: float64

customer_1516_history = pd.DataFrame(dict(
    ID=np.full(10, 1515, dtype="int"),
    frequency=np.full(10, customer_1516["frequency"], dtype="int"),
    recency=np.full(10, customer_1516["recency"]),
    T=(np.arange(0, 10) + customer_1516["recency"]).astype("int"),

))
customer_1516_history

	ID	frequency	recency	T
0	1515	27	67.0	67
1	1515	27	67.0	68
2	1515	27	67.0	69
3	1515	27	67.0	70
4	1515	27	67.0	71
5	1515	27	67.0	72
6	1515	27	67.0	73
7	1515	27	67.0	74
8	1515	27	67.0	75
9	1515	27	67.0	76

p_alive = bgm.expected_probability_alive(
    customer_id=customer_1516_history["ID"],
    frequency=customer_1516_history["frequency"],
    recency=customer_1516_history["recency"],
    T=customer_1516_history["T"],
)

az.plot_hdi(customer_1516_history["T"], p_alive, color="C0")
plt.plot(customer_1516_history["T"], p_alive.mean(("draw", "chain")), marker="o")
plt.axvline(customer_1516_history["recency"].iloc[0], c="black", ls="--", label="Purchase")

plt.title("Probability Customer 1516 will purchase again")
plt.xlabel("T")
plt.ylabel("p")
plt.legend();

../../_images/824553a0a058d127e552a60c1ea26e67c295608b96c1126088d64c33f18b473d.png

We can see that, if no purchases are being made in the next 9 weeks, the model has low confidence that the costumer will ever return. What if they had done one purchase in between?

customer_1516_history.loc[7:,"frequency"] += 1
customer_1516_history.loc[7:,"recency"] = customer_1516_history.loc[7,"T"] - 0.5
customer_1516_history

	ID	frequency	recency	T
0	1515	27	67.0	67
1	1515	27	67.0	68
2	1515	27	67.0	69
3	1515	27	67.0	70
4	1515	27	67.0	71
5	1515	27	67.0	72
6	1515	27	67.0	73
7	1515	28	73.5	74
8	1515	28	73.5	75
9	1515	28	73.5	76

p_alive = bgm.expected_probability_alive(
    customer_id=customer_1516_history["ID"],
    frequency=customer_1516_history["frequency"],
    recency=customer_1516_history["recency"],
    T=customer_1516_history["T"],
)

az.plot_hdi(customer_1516_history["T"], p_alive, color="C0")
plt.plot(customer_1516_history["T"], p_alive.mean(("draw", "chain")), marker="o")
plt.axvline(customer_1516_history["recency"].iloc[0], c="black", ls="--", label="Purchase")
plt.axvline(customer_1516_history["recency"].iloc[-1], c="black", ls="--")

plt.title("Probability Customer 1516 will purchase again")
plt.xlabel("T")
plt.ylabel("p")
plt.legend();

../../_images/908a8f4e0e272577287684215da3b5d4ee3fc1704c09dbe9a0c7188c1ac68da4.png

From the plot above, say that customer 1516 makes a purchase at week 73.5, just over 6 weeks after they have made their last recorded purchase. We can see that the probability of the customer returning quickly goes back up!

Estimating Customer Lifetime Value Using the Gamma-Gamma Model#

Until now we’ve focused mainly on transaction frequencies and probabilities, but to estimate economic value we can use the Gamma-Gamma model.

The Gamma-Gamma model assumes at least 1 repeat transaction has been observed per customer. As such we filter out those with zero repeat purchases.

nonzero_data = rfm_data.query("frequency>0")
nonzero_data

	customer_id	frequency	recency	T	monetary_value
0	1	3.0	49.0	78.0	23.723333
1	2	1.0	2.0	78.0	11.770000
5	6	14.0	76.0	78.0	76.503571
6	7	1.0	5.0	78.0	11.770000
7	8	1.0	61.0	78.0	26.760000
...	...	...	...	...	...
2351	2352	1.0	47.0	66.0	14.490000
2352	2353	2.0	53.0	66.0	19.775000
2353	2354	5.0	24.0	66.0	44.928000
2354	2355	1.0	44.0	66.0	24.600000
2355	2356	6.0	62.0	66.0	31.871667

1126 rows × 5 columns

If computing the monetary value from your own data, note that it is the mean of a given customer’s value, not the sum. monetary_value can be used to represent profit, or revenue, or any value as long as it is consistently calculated for each customer.

The Gamma-Gamma model relies upon the important assumption there is no relationship between the monetary value and the purchase frequency. In practice we need to check whether the Pearson correlation is less than 0.3:

nonzero_data[['monetary_value', 'frequency']].corr()

	monetary_value	frequency
monetary_value	1.000000	0.052819
frequency	0.052819	1.000000

Transaction frequencies and monetary values are uncorrelated; we can now fit our Gamma-Gamma model to predict average spend and expected lifetime values of our customers

The Gamma-Gamma model takes in a ‘data’ parameter, a pandas DataFrame with 3 columns representing Customer ID, average spend of repeat purchases, and number of repeat purchase for that customer. As with the BG/NBD model, these parameters are given HalfFlat priors which can be too diffuse for small datasets. For this example, we will use the default priors, but other priors can be specified just like with the BG/NBD example above.

dataset = pd.DataFrame({
    'customer_id': nonzero_data.index,
    'mean_transaction_value': nonzero_data["monetary_value"],
    'frequency': nonzero_data["frequency"],
})

gg = clv.GammaGammaModel(
    data = dataset
)
gg.build_model()
gg

Gamma-Gamma Model (Mean Transactions)
         p ~ HalfFlat()
         q ~ HalfFlat()
         v ~ HalfFlat()
likelihood ~ Potential(f(q, p, v))

By default, fit approximates full Bayesian posterior using MCMC sampling provided by pymc.sample. If the full posterior is not needed or MCMC sampling is too slow, users can obtain the single maximum a posteriori estimate via the fit_method kwarg.

gg.fit(fit_method="map");

100.00% [24/24 00:00<00:00 logp = -4,899.8, ||grad|| = 17.939]

gg.fit_summary()

p     4.785
q     3.882
v    22.653
Name: value, dtype: float64

gg.fit();

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [p, q, v]

100.00% [8000/8000 00:04<00:00 Sampling 4 chains, 0 divergences]

Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 5 seconds.

gg.fit_summary()

	mean	sd	hdi_3%	hdi_97%	mcse_mean	mcse_sd	ess_bulk	ess_tail	r_hat
p	4.815	0.765	3.542	6.225	0.026	0.019	857.0	1022.0	1.0
q	3.926	0.280	3.407	4.437	0.010	0.007	772.0	1087.0	1.0
v	23.605	5.230	14.497	33.175	0.188	0.133	762.0	945.0	1.0

az.plot_posterior(gg.fit_result);

../../_images/a39f49e7f64280e9dedb1df60867ba8d50f87d1327504ed111595fea2b2da78f.png

Predicting spend value of customers#

Having fit our model, we can now use it to predict the conditional, expected average lifetime value of our customers, including those with zero repeat purchases.

expected_spend = gg.expected_customer_spend(
    customer_id=rfm_data.index,
    mean_transaction_value=rfm_data["monetary_value"],
    frequency=rfm_data["frequency"],
)

az.summary(expected_spend.isel(customer_id=range(10)), kind="stats")

	mean	sd	hdi_3%	hdi_97%
x[0]	26.106	0.442	25.342	26.950
x[1]	21.616	1.348	19.113	24.008
x[2]	37.615	0.901	35.848	39.215
x[3]	37.615	0.901	35.848	39.215
x[4]	37.615	0.901	35.848	39.215
x[5]	74.836	0.366	74.151	75.475
x[6]	21.616	1.348	19.113	24.008
x[7]	30.890	0.617	29.705	31.974
x[8]	36.430	0.155	36.148	36.732
x[9]	37.615	0.901	35.848	39.215

labeller = MapLabeller(var_name_map={"x": "customer"})
az.plot_forest(expected_spend.isel(customer_id=(range(10))), combined=True, labeller=labeller)
plt.xlabel("Expected mean spend");

../../_images/ff05748b5e60549b45b7b9c0b877f4f6d94544a98fe5cfdb037005267e697477.png

We can also look at the average expected mean spend across all customers

az.summary(expected_spend.mean("customer_id"), kind="stats")

	mean	sd	hdi_3%	hdi_97%
x	38.0	0.568	36.898	39.033

az.plot_posterior(expected_spend.mean("customer_id"))
plt.axvline(expected_spend.mean(), color="k", ls="--")
plt.title("Expected mean spend of all customers");

../../_images/9477c43d28ee53da35e63100b7c9d1e5381f949fdaffdbe24335bd5acbd16ef7.png

Predicting spend value of a new customer#

az.plot_posterior(
    gg.expected_new_customer_spend()
)
plt.title("Expected mean spend of a new customer");

../../_images/6c213bd9e68b5389aa9a104f5e4ec0b3dc8723b94c34c0e8f311c40be2a8e320.png

Estimating CLV#

Finally, we can combine the GG with the BG/NBD model to obtain an estimate of the customer lifetime value. This relies on the Discounted cash flow model, adjusting for cost of capital:

clv_estimate = gg.expected_customer_lifetime_value(
    transaction_model=bgm,
    customer_id=rfm_data.index,
    mean_transaction_value=rfm_data["monetary_value"],
    frequency=rfm_data["frequency"],
    recency=rfm_data["recency"],
    T=rfm_data["T"],
    time=12, # months
    discount_rate=0.01, # monthly discount rate ~ 12.7% annually
    freq="W", # Our original data is in weeks
)

az.summary(clv_estimate.isel(customer_id=range(10)), kind="stats")

	mean	sd	hdi_3%	hdi_97%
x[0]	29.210	1.147	27.022	31.329
x[1]	3.107	0.365	2.419	3.767
x[2]	5.616	0.259	5.129	6.097
x[3]	5.616	0.259	5.129	6.097
x[4]	5.616	0.259	5.129	6.097
x[5]	500.478	17.735	466.030	531.913
x[6]	4.086	0.435	3.244	4.867
x[7]	16.218	0.542	15.173	17.193
x[8]	46.846	1.349	44.380	49.410
x[9]	5.616	0.259	5.129	6.097

az.plot_forest(clv_estimate.isel(customer_id=range(10)), combined=True, labeller=labeller)
plt.xlabel("Expected CLV");

../../_images/7b3f79fca0d728716c5f3a6cf8d700e821f8a5d1672833b7582e6a44bea926ca.png

According to our models, customer[5] has a much higher expected CLV. There is also a large variability in this estimate that arises solely from uncertainty in the parameters of the BG/NBD and GG models.

In general, these models tend to induce a strong correlation between expected CLV and uncertainty. This modelling of uncertainty can be very useful when making marketing decisions.