rfm_summary

pymc_marketing.clv.utils.rfm_summary(transactions, customer_id_col, datetime_col, monetary_value_col=None, datetime_format=None, observation_period_end=None, time_unit='D', time_scaler=1, include_first_transaction=False, sort_transactions=True)

Summarize transaction data for use in CLV modeling and/or RFM segmentation.

This transforms a DataFrame of transaction data of the form:

customer_id, datetime [, monetary_value]

to a DataFrame of the form:

customer_id, frequency, recency, T [, monetary_value]

Adapted from the lifetimes package (CamDavidsonPilon/lifetimes).
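The shape of this transformation can be sketched with the standard library alone. This is an illustration, not the library's implementation. It assumes the lifetimes-style definitions for the default include_first_transaction=False case: frequency counts repeat purchase periods (distinct days minus one), recency is the time from a customer's first to last purchase, and T is the time from the first purchase to the end of the observation period. The sample data and the helper name `summarize` are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transaction log: (customer_id, purchase date).
transactions = [
    ("a", date(2024, 1, 1)),
    ("a", date(2024, 1, 10)),
    ("a", date(2024, 1, 10)),  # same-day repeat collapses into one period
    ("a", date(2024, 2, 1)),
    ("b", date(2024, 1, 15)),
]
observation_period_end = date(2024, 3, 1)


def summarize(transactions, observation_period_end):
    """Build a per-customer frequency / recency / T summary in days.

    Assumed definitions (lifetimes-style, first transaction excluded):
    frequency = number of distinct purchase days minus one,
    recency   = days between first and last purchase,
    T         = days between first purchase and the observation end.
    """
    days_by_customer = defaultdict(set)
    for customer_id, day in transactions:
        # Drop events after the end of the observation period.
        if day <= observation_period_end:
            days_by_customer[customer_id].add(day)

    summary = {}
    for customer_id, days in days_by_customer.items():
        first, last = min(days), max(days)
        summary[customer_id] = {
            "frequency": len(days) - 1,
            "recency": (last - first).days,
            "T": (observation_period_end - first).days,
        }
    return summary


summary = summarize(transactions, observation_period_end)
for customer_id, row in sorted(summary.items()):
    print(customer_id, row)
```

A customer with a single purchase, such as "b" here, ends up with frequency 0 and recency 0 under these assumed definitions.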

Parameters:
  • transactions (DataFrame) – A pandas DataFrame that contains the customer ID column and the datetime column.

  • customer_id_col (string) – Column in the transactions DataFrame that denotes the customer_id.

  • datetime_col (string) – Column in the transactions DataFrame that denotes the datetime the purchase was made.

  • monetary_value_col (string, optional) – Column in the transactions DataFrame that denotes the monetary value of the transaction. Optional; only needed for spend estimation models like the Gamma-Gamma model.

  • observation_period_end (Union[str, pd.Period, datetime], optional) – A string or datetime that denotes the final date of the study. Events after this date are truncated. If not given, defaults to the maximum value in datetime_col.

  • datetime_format (string, optional) – A string that represents the timestamp format. Useful if Pandas can’t understand the provided format.

  • time_unit (string, optional) – Time granularity for study. Default: ‘D’ for days. Possible values listed here: https://numpy.org/devdocs/reference/arrays.datetime.html#datetime-units

  • time_scaler (int, optional) – Default: 1. Useful for scaling recency & T to a different time granularity. Example: With time_unit='D' and time_scaler=1, we get recency=591 and T=632. With time_unit='h' and time_scaler=24, we get recency=590.125 and T=631.375. This is useful if predictions in a different time granularity are desired, and can also help with model convergence for study periods of many years.

  • include_first_transaction (bool, optional) – Default: False. For predictive CLV modeling, this should be False. Set to True if performing RFM segmentation.

  • sort_transactions (bool, optional) – Default: True. If raw data is already sorted in chronological order, set to False to improve computational efficiency.
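The interaction between time_unit and time_scaler can be illustrated with plain datetime arithmetic. The sketch below is not the library's code. It assumes that timestamps are first floored to the chosen unit and that the elapsed count of units is then divided by time_scaler. The helper `elapsed` and the sample timestamps are hypothetical.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def elapsed(start, end, unit, time_scaler=1):
    """Elapsed time between two timestamps, measured in `unit`.

    Assumption: both timestamps are floored to a `unit` boundary before
    differencing, and the resulting count is divided by `time_scaler`.
    """

    def floor(ts):
        # Whole number of units since the epoch, converted back to a datetime.
        return EPOCH + ((ts - EPOCH) // unit) * unit

    return (floor(end) - floor(start)) / unit / time_scaler


first_purchase = datetime(2024, 1, 1, 21, 0)
last_purchase = datetime(2024, 1, 4, 9, 0)

# Daily granularity: time of day is discarded, so whole days are counted.
recency_days = elapsed(first_purchase, last_purchase, timedelta(days=1))

# Hourly granularity scaled by 24: still expressed in days, but fractional.
recency_scaled = elapsed(first_purchase, last_purchase, timedelta(hours=1), 24)

print(recency_days, recency_scaled)
```

Under these assumptions the hourly measurement keeps the result in units of days while preserving the time of day, which is the same pattern as the whole-number versus fractional values in the time_scaler example.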

Returns:

A DataFrame with the columns customer_id, frequency, recency, T [, monetary_value]

Return type:

DataFrame