rfm_segments
- pymc_marketing.clv.utils.rfm_segments(transactions, customer_id_col, datetime_col, monetary_value_col, segment_config=None, observation_period_end=None, datetime_format=None, time_unit='D', time_scaler=1, sort_transactions=True)
Assign customers to segments based on spending behavior derived from RFM scores.
- This transforms a DataFrame of transaction data of the form:
customer_id, datetime, monetary_value
- to a DataFrame of the form:
customer_id, frequency, recency, monetary_value, rfm_score, segment
Customer purchasing data is aggregated into three variables: recency, frequency, and monetary_value. Quartiles are estimated for each variable, and a three-digit RFM score is then assigned to each customer. For example, a customer with a score of ‘234’ is in the second quartile for recency, third quartile for frequency, and fourth quartile for monetary_value. RFM scores corresponding to segments such as “Top Spender”, “Frequent Buyer”, or “At-Risk” are determined, and customers are then segmented based on their RFM score.
- By default, the following segments are created:
“Premium Customer”: Customers in top 2 quartiles for all variables.
“Repeat Customer”: Customers in top 2 quartiles for frequency, and either recency or monetary value.
“Top Spender”: Customers in top 2 quartiles for monetary value, and either frequency or recency.
“At-Risk Customer”: Customers in bottom 2 quartiles for two or more variables.
“Inactive Customer”: Customers in bottom quartile for two or more variables.
Customers with unspecified RFM scores will be assigned to a segment named “Other”.
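The quartile scoring described above can be sketched with the standard library alone. This is a minimal illustration, not the library's implementation: the helper names (quartile_labels, rfm_scores, decode) are hypothetical, the cut points come from statistics.quantiles rather than whatever routine rfm_segments uses internally, and all three variables are labeled in ascending order of value. Whether the library reverses the labels for recency is not stated here, so the sketch does not attempt it.

```python
from bisect import bisect_left
from statistics import quantiles


def quartile_labels(values):
    """Label each value 1-4 by the quartile it falls in (1 = lowest values)."""
    # Three cut points split the data into four groups.
    cuts = quantiles(values, n=4, method="inclusive")
    # bisect_left counts the cut points strictly below v, so a value equal
    # to a cut point stays in the lower quartile.
    return [bisect_left(cuts, v) + 1 for v in values]


def rfm_scores(recency, frequency, monetary_value):
    """Concatenate the three quartile labels into three-digit score strings."""
    r = quartile_labels(recency)
    f = quartile_labels(frequency)
    m = quartile_labels(monetary_value)
    return [f"{a}{b}{c}" for a, b, c in zip(r, f, m)]


def decode(score):
    """Split a score such as '234' into its per-variable quartiles."""
    return {
        "recency": int(score[0]),
        "frequency": int(score[1]),
        "monetary_value": int(score[2]),
    }
```

With this sketch, decode("234") gives quartile 2 for recency, 3 for frequency, and 4 for monetary_value, matching the example in the description.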
If an alternative segmentation approach is desired, use rfm_summary(include_first_transaction=True, *args, **kwargs) instead to preprocess data for segmentation. In either case, the returned DataFrame cannot be used for modeling. If assigning model predictions to RFM segments, create a separate DataFrame for modeling and join by Customer ID.
- Parameters:
- transactions
DataFrame
A Pandas DataFrame containing customer_id_col, datetime_col, and monetary_value_col.
- customer_id_col
str
Column in the transactions DataFrame denoting the customer_id.
- datetime_col
str
Column in the transactions DataFrame denoting the datetimes at which purchases were made.
- monetary_value_col
str
Column in the transactions DataFrame that denotes the monetary value of the transaction.
- segment_config
dict, optional
Dictionary containing segment names and list of RFM score assignments; key/value pairs should be formatted as {"segment": ['111', '123', '321'], ...}. If not provided, default segment names and definitions are applied.
- observation_period_end
Union[str, pandas.Period, datetime, None], optional
A string or datetime to denote the final date of the study. Events after this date are truncated. If not given, defaults to the max of datetime_col.
- datetime_format
str, optional
A string that represents the timestamp format. Useful if Pandas doesn’t recognize the provided format.
- time_unit
str, optional
Time granularity for study. Default: ‘D’ for days. Possible values listed here: https://numpy.org/devdocs/reference/arrays.datetime.html#datetime-units
- time_scaler
int, optional
Default: 1. Scales recency & T to a different time granularity. This is useful for datasets spanning many years, and running predictions in different time scales.
- sort_transactions
bool, optional
Default: True. If raw data is already sorted in chronological order, set to False to improve computational efficiency.
- Returns:
DataFrame
DataFrame containing summarized RFM data, RFM scores, and segment assignments.
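To illustrate the segment_config format and the “Other” fallback, here is a minimal standard-library sketch. The helpers invert_segment_config and assign_segment and the segment names “Loyal” and “Lapsed” are hypothetical, chosen only for this example. The validation (digits 1 to 4, no score listed under two segments) is a design choice of the sketch and may not match the checks rfm_segments performs.

```python
def invert_segment_config(segment_config):
    """Map each RFM score to its segment name, rejecting malformed or repeated scores."""
    score_to_segment = {}
    for segment, scores in segment_config.items():
        for score in scores:
            # A score is three quartile digits, each between 1 and 4.
            if len(score) != 3 or any(digit not in "1234" for digit in score):
                raise ValueError(f"not a three-digit quartile score: {score!r}")
            # A score listed twice would make the assignment ambiguous.
            if score in score_to_segment:
                raise ValueError(f"{score!r} is listed under two segments")
            score_to_segment[score] = segment
    return score_to_segment


def assign_segment(score, score_to_segment):
    """Return the configured segment for a score, or 'Other' if none is specified."""
    return score_to_segment.get(score, "Other")


# Hypothetical configuration in the documented {"segment": [scores], ...} format.
custom_config = {
    "Loyal": ["444", "344"],
    "Lapsed": ["111", "112"],
}
lookup = invert_segment_config(custom_config)
```

Here assign_segment("444", lookup) returns "Loyal", while a score that appears in no list, such as "234", falls through to "Other".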