rfm_summary

pymc_marketing.clv.utils.rfm_summary(transactions, customer_id_col, datetime_col, monetary_value_col=None, datetime_format=None, observation_period_end=None, time_unit='D', time_scaler=1, include_first_transaction=False, sort_transactions=True)

Summarize transaction data for use in CLV modeling or RFM segmentation.

This transforms a DataFrame of transaction data of the form:

customer_id, datetime [, monetary_value]

to a DataFrame for CLV modeling:

customer_id, frequency, recency, T [, monetary_value]

If include_first_transaction=True is specified, a DataFrame for RFM segmentation is returned instead:

customer_id, frequency, recency, monetary_value

This function is not required if using the clv.rfm_segments utility.

Adapted from the lifetimes package (CamDavidsonPilon/lifetimes).
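The transformation described above can be sketched in plain Python. This is a minimal illustration, not the library's implementation. It assumes the conventions of the lifetimes package for include_first_transaction=False: frequency counts repeat purchase days, recency is the time between a customer's first and last purchase, and T is the time between the first purchase and the end of the observation period. The toy data and the `summarize` helper are hypothetical.

```python
from datetime import date

# Hypothetical toy transactions: (customer_id, purchase date).
transactions = [
    ("a", date(2024, 1, 1)),
    ("a", date(2024, 1, 11)),
    ("a", date(2024, 1, 11)),  # same day as the previous row: counted once
    ("a", date(2024, 1, 21)),
    ("b", date(2024, 1, 5)),
]
observation_period_end = date(2024, 1, 31)


def summarize(transactions, observation_period_end):
    """Return {customer_id: {"frequency", "recency", "T"}} measured in days."""
    purchase_days = {}
    for customer_id, day in transactions:
        if day <= observation_period_end:  # ignore events after the study end
            purchase_days.setdefault(customer_id, set()).add(day)

    summary = {}
    for customer_id, days in purchase_days.items():
        first, last = min(days), max(days)
        summary[customer_id] = {
            "frequency": len(days) - 1,  # repeat purchase days only
            "recency": (last - first).days,  # customer age at last purchase
            "T": (observation_period_end - first).days,  # customer age at study end
        }
    return summary


summary = summarize(transactions, observation_period_end)
print(summary)
```

In this sketch, customer "b" has a single purchase, so both frequency and recency are zero while T still measures how long the customer has been observed.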

Parameters:
transactions : DataFrame

A Pandas DataFrame containing customer_id_col and datetime_col.

customer_id_col : str

Column in the transactions DataFrame denoting the customer_id.

datetime_col : str

Column in the transactions DataFrame denoting the datetimes at which purchases were made.

monetary_value_col : str, optional

Column in the transactions DataFrame denoting the monetary value of the transaction. Optional; only needed for RFM segmentation and spend estimation models like the Gamma-Gamma model.

observation_period_end : Union[str, pandas.Period, datetime], optional

A string or datetime denoting the final date of the study. Transactions after this date are excluded. If not given, defaults to the latest value in datetime_col.

datetime_format : str, optional

A string that represents the timestamp format. Useful if Pandas doesn’t recognize the provided format.
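As an illustration of what such a format string looks like, the sketch below uses the standard library's `datetime.strptime`, which accepts the same strftime-style codes that pandas uses for explicit datetime formats. The sample string and format are hypothetical.

```python
from datetime import datetime

# Hypothetical day-first timestamp; without an explicit format, a value such as
# "03/04/2024" could be read as either March 4 or April 3.
raw = "31/01/2024 14:30"
fmt = "%d/%m/%Y %H:%M"  # day/month/year hour:minute

parsed = datetime.strptime(raw, fmt)
print(parsed.isoformat())
```

A string like `fmt` is the kind of value that would be passed as datetime_format.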

time_unit : str, optional

Time granularity for the study. Default: ‘D’ for days. Possible values are listed here: https://numpy.org/devdocs/reference/arrays.datetime.html#datetime-units

time_scaler : int, optional

Default: 1. Scales recency and T to a different time granularity. This is useful for datasets spanning many years and for running predictions on different time scales.
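The arithmetic behind this rescaling can be sketched as follows. This assumes that scaling means dividing the recency and T values (measured in time_unit) by time_scaler, as the lifetimes package does with its equivalent multiplier; the numbers are hypothetical.

```python
# Hypothetical day-based values for one customer (time_unit='D').
recency_days = 20.0
T_days = 30.0

# Assumption: a time_scaler of 7 turns day-based values into weeks by division.
time_scaler = 7
recency_weeks = recency_days / time_scaler
T_weeks = T_days / time_scaler

print(round(recency_weeks, 2), round(T_weeks, 2))
```

Under this assumption, frequency is a count and is left unchanged; only the time-valued columns are rescaled.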

include_first_transaction : bool, optional

Default: False. For predictive CLV modeling, this should be False. Set to True if performing RFM segmentation.

sort_transactions : bool, optional

Default: True. If the raw data is already sorted in chronological order, set to False to improve computational efficiency.

Returns:
DataFrame

DataFrame containing summarized RFM data per customer, with monetary_value included only if monetary_value_col is specified.