rfm_summary
- pymc_marketing.clv.utils.rfm_summary(transactions, customer_id_col, datetime_col, monetary_value_col=None, datetime_format=None, observation_period_end=None, time_unit='D', time_scaler=1, include_first_transaction=False, sort_transactions=True)
Summarize transaction data for use in CLV modeling or RFM segmentation.
- This transforms a DataFrame of transaction data of the form:
customer_id, datetime [, monetary_value]
- to a DataFrame for CLV modeling:
customer_id, frequency, recency, T [, monetary_value]
- If include_first_transaction=True is specified, a DataFrame for RFM segmentation is returned instead:
customer_id, frequency, recency, monetary_value
This function is not required if using the clv.rfm_segments utility.
Adapted from the lifetimes package (CamDavidsonPilon/lifetimes).
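To make the CLV-modeling output concrete, the following is a minimal standard-library sketch of the per-customer summary, not the library's implementation. It assumes the lifetimes convention: frequency counts repeat purchase days, recency is the time from first to last purchase, T is the time from first purchase to the end of the study, and monetary_value is the mean spend over repeat purchase days. The sample rows and the `summarize` helper are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transaction rows: (customer_id, purchase date, amount).
transactions = [
    ("a", date(2024, 1, 1), 10.0),
    ("a", date(2024, 1, 11), 20.0),
    ("a", date(2024, 1, 21), 30.0),
    ("b", date(2024, 1, 5), 50.0),
]
observation_period_end = date(2024, 1, 31)


def summarize(rows, period_end):
    """Illustrative per-customer summary with day granularity (assumed conventions)."""
    # Sum spend per customer per day so same-day purchases count once.
    by_customer = defaultdict(lambda: defaultdict(float))
    for customer_id, day, amount in rows:
        if day <= period_end:  # events after the study end are dropped
            by_customer[customer_id][day] += amount

    summary = {}
    for customer_id, daily in by_customer.items():
        days = sorted(daily)
        first, last = days[0], days[-1]
        # The first purchase day is excluded, as in predictive CLV modeling.
        repeat_spend = [daily[d] for d in days[1:]]
        summary[customer_id] = {
            "frequency": len(repeat_spend),
            "recency": (last - first).days,
            "T": (period_end - first).days,
            "monetary_value": (
                sum(repeat_spend) / len(repeat_spend) if repeat_spend else 0.0
            ),
        }
    return summary


summary = summarize(transactions, observation_period_end)
```

Under these assumptions, customer "a" has two repeat purchase days spanning 20 days, while customer "b" has a frequency and recency of zero because only a first purchase was observed.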
- Parameters:
- transactions
DataFrame
A Pandas DataFrame containing customer_id_col and datetime_col.
- customer_id_col
str
Column in the transactions DataFrame denoting the customer_id.
- datetime_col
str
Column in the transactions DataFrame denoting the datetimes at which purchases were made.
- monetary_value_col
str, optional
Column in the transactions DataFrame denoting the monetary value of the transaction. Only needed for RFM segmentation and spend estimation models like the Gamma-Gamma model.
- observation_period_end
Union[str, pandas.Period, datetime], optional
A string or datetime denoting the final date of the study. Events after this date are truncated. If not given, defaults to the maximum value in datetime_col.
- datetime_format
str, optional
A string that represents the timestamp format. Useful if Pandas doesn't recognize the provided format.
- time_unit
str, optional
Time granularity for the study. Default: 'D' for days. Possible values are listed here: https://numpy.org/devdocs/reference/arrays.datetime.html#datetime-units
- time_scaler
int, optional
Default: 1. Scales recency and T to a different time granularity. This is useful for datasets spanning many years and for running predictions on different time scales.
- include_first_transaction
bool, optional
Default: False. For predictive CLV modeling, this should be False. Set to True if performing RFM segmentation.
- sort_transactions
bool, optional
Default: True. If the raw data is already sorted in chronological order, set to False to improve computational efficiency.
- Returns:
DataFrame
DataFrame containing summarized RFM data, with columns for frequency, recency, T, and monetary_value if specified.
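The effect of time_scaler on recency and T can be sketched with plain date arithmetic. This is an illustration only: it assumes that durations measured in time_unit are divided by time_scaler, and the `scaled_durations` helper and the dates are hypothetical, not part of the library.

```python
from datetime import date


def scaled_durations(first, last, period_end, time_scaler=1):
    """Return (recency, T) in days divided by time_scaler (assumed behavior)."""
    recency = (last - first).days / time_scaler
    T = (period_end - first).days / time_scaler
    return recency, T


first_purchase = date(2024, 1, 1)
last_purchase = date(2024, 1, 15)
period_end = date(2024, 1, 29)

# Daily granularity, then the same durations expressed in weeks.
in_days = scaled_durations(first_purchase, last_purchase, period_end)
in_weeks = scaled_durations(first_purchase, last_purchase, period_end, time_scaler=7)
```

With these dates, a 14-day recency and 28-day T become 2 and 4 when expressed in weeks, which keeps the values small for datasets that span many years.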