rfm_segments

pymc_marketing.clv.utils.rfm_segments(transactions, customer_id_col, datetime_col, monetary_value_col, segment_config=None, observation_period_end=None, datetime_format=None, time_unit='D', time_scaler=1, sort_transactions=True)

Assign customers to segments based on spending behavior derived from RFM scores.

This transforms a DataFrame of transaction data of the form: customer_id, datetime, monetary_value
to a DataFrame of the form: customer_id, frequency, recency, monetary_value, rfm_score, segment

Customer purchasing data is aggregated into three variables: recency, frequency, and monetary_value. Quartiles are estimated for each variable, and a three-digit RFM score is then assigned to each customer. For example, a customer with a score of ‘234’ is in the second quartile for recency, third quartile for frequency, and fourth quartile for monetary_value. RFM scores corresponding to segments such as “Top Spender”, “Frequent Buyer”, or “At-Risk” are determined, and customers are then segmented based on their RFM score.
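The quartile scoring described above can be sketched with plain pandas. This is an illustration only, not the library's implementation: the `quartile_scores` helper, the toy data, and the choice to label the lowest quartile 1 and the highest 4 for every variable are all assumptions made for this example.

```python
import pandas as pd

# Hypothetical per-customer summary; values are made up for illustration.
summary = pd.DataFrame(
    {
        "customer_id": list("ABCDEFGH"),
        "recency": [1, 2, 3, 4, 5, 6, 7, 8],
        "frequency": [8, 7, 6, 5, 4, 3, 2, 1],
        "monetary_value": [10.0, 80.0, 30.0, 60.0, 50.0, 40.0, 70.0, 20.0],
    }
)


def quartile_scores(df: pd.DataFrame) -> pd.Series:
    """Build a three-digit score from the quartile of each RFM variable.

    Assumption of this sketch: quartile 1 holds the smallest values and
    quartile 4 the largest, for all three variables.
    """
    digits = []
    for col in ["recency", "frequency", "monetary_value"]:
        quartile = pd.qcut(df[col], q=4, labels=[1, 2, 3, 4])
        digits.append(quartile.astype(int).astype(str))
    # Concatenate the recency, frequency, and monetary digits in that order.
    return digits[0] + digits[1] + digits[2]


summary["rfm_score"] = quartile_scores(summary)
print(summary[["customer_id", "rfm_score"]])
```

Note that `pd.qcut` raises an error when heavily tied values produce duplicate bin edges, so real data may need extra handling that this sketch omits.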

By default, the following segments are created:

“Premium Customer”: Customers in top 2 quartiles for all variables.
“Repeat Customer”: Customers in top 2 quartiles for frequency, and either recency or monetary value.
“Top Spender”: Customers in top 2 quartiles for monetary value, and either frequency or recency.
“At-Risk Customer”: Customers in bottom 2 quartiles for two or more variables.
“Inactive Customer”: Customers in bottom quartile for two or more variables.
Customers with unspecified RFM scores will be assigned to a segment named “Other”.
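As a rough sketch of how a score-to-segment lookup with an "Other" fallback might work, the snippet below inverts a dictionary in the `{"segment": [scores]}` format. The segment names, the score lists, and the helper functions are hypothetical and do not reproduce the library's defaults or internals.

```python
# Hypothetical configuration in the {"segment": [rfm scores]} format.
example_config = {
    "Big Spender": ["114", "214", "314", "414"],
    "Lapsed": ["111", "112"],
}


def build_lookup(config: dict[str, list[str]]) -> dict[str, str]:
    """Invert the config so each RFM score maps to one segment name."""
    lookup: dict[str, str] = {}
    for segment, scores in config.items():
        for score in scores:
            # Choice made for this sketch: if a score is listed under
            # several segments, the first segment listed wins.
            lookup.setdefault(score, segment)
    return lookup


def assign_segment(score: str, lookup: dict[str, str]) -> str:
    """Return the configured segment, or "Other" for unspecified scores."""
    return lookup.get(score, "Other")


lookup = build_lookup(example_config)
for score in ["414", "111", "233"]:
    print(score, "->", assign_segment(score, lookup))
```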

If an alternative segmentation approach is desired, use rfm_summary(include_first_transaction=True, *args, **kwargs) instead to preprocess data for segmentation. In either case, the returned DataFrame cannot be used for modeling. If assigning model predictions to RFM segments, create a separate DataFrame for modeling and join by Customer ID.
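The join by customer ID mentioned above might look like the following minimal sketch. Both DataFrames are invented for illustration: `segments` stands in for the output of this function, and `predictions` (with its `expected_purchases` column) stands in for the output of a separately fitted model.

```python
import pandas as pd

# Stand-in for the segmentation output (hypothetical values).
segments = pd.DataFrame(
    {
        "customer_id": [1, 2, 3],
        "segment": ["Premium Customer", "At-Risk Customer", "Other"],
    }
)

# Stand-in for model predictions built from a separate modeling DataFrame.
predictions = pd.DataFrame(
    {
        "customer_id": [1, 2, 3],
        "expected_purchases": [4.2, 0.3, 1.1],
    }
)

# Left join keeps every predicted customer; validate guards against
# duplicated customer IDs on either side.
joined = predictions.merge(
    segments, on="customer_id", how="left", validate="one_to_one"
)

# Example downstream use: average prediction per segment.
by_segment = joined.groupby("segment")["expected_purchases"].mean()
print(by_segment)
```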

Parameters:

transactions (DataFrame): A Pandas DataFrame containing customer_id_col and datetime_col.
customer_id_col (str): Column in the transactions DataFrame denoting the customer_id.
datetime_col (str): Column in the transactions DataFrame denoting the datetimes at which purchases were made.
monetary_value_col (str): Column in the transactions DataFrame that denotes the monetary value of the transaction.
segment_config (dict, optional): Dictionary containing segment names and lists of RFM score assignments; key/value pairs should be formatted as {"segment": ['111', '123', '321'], ...}. If not provided, default segment names and definitions are applied.
observation_period_end (Union[str, pandas.Period, datetime, None], optional): A string or datetime to denote the final date of the study. Events after this date are truncated. If not given, defaults to the max of datetime_col.
datetime_format (str, optional): A string that represents the timestamp format. Useful if Pandas doesn’t recognize the provided format.
time_unit (str, optional): Time granularity for study. Default: ‘D’ for days. Possible values listed here: https://numpy.org/devdocs/reference/arrays.datetime.html#datetime-units
time_scaler (int, optional): Default: 1. Scales recency & T to a different time granularity. This is useful for datasets spanning many years, and running predictions in different time scales.
sort_transactions (bool, optional): Default: True. If raw data is already sorted in chronological order, set to False to improve computational efficiency.
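To illustrate the observation_period_end behavior described above, here is a small pandas sketch of truncating events after a cutoff and falling back to the latest datetime when no cutoff is given. The `truncate_transactions` helper and the sample data are hypothetical; the library's own handling may differ in detail.

```python
import pandas as pd

# Made-up transaction log.
transactions = pd.DataFrame(
    {
        "customer_id": [1, 1, 2, 3],
        "date": ["2023-01-05", "2023-03-10", "2023-02-01", "2023-04-20"],
        "amount": [20.0, 35.0, 15.0, 50.0],
    }
)
# An explicit format helps when pandas cannot infer the timestamp layout.
transactions["date"] = pd.to_datetime(transactions["date"], format="%Y-%m-%d")


def truncate_transactions(df, datetime_col, observation_period_end=None):
    """Drop events after the cutoff; default the cutoff to the latest event."""
    if observation_period_end is None:
        end = df[datetime_col].max()
    else:
        end = pd.to_datetime(observation_period_end)
    return df.loc[df[datetime_col] <= end], end


kept, end = truncate_transactions(transactions, "date", "2023-03-31")
print(len(kept), "transactions on or before", end.date())
```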

Returns:

DataFrame: DataFrame containing summarized RFM data, RFM scores, and segment assignments.