rfm_segments

pymc_marketing.clv.utils.rfm_segments(transactions, customer_id_col, datetime_col, monetary_value_col, segment_config=None, observation_period_end=None, datetime_format=None, time_unit='D', time_scaler=1, sort_transactions=True)

Assign customers to segments based on spending behavior derived from RFM scores.

This transforms a DataFrame of transaction data of the form:

customer_id, datetime, monetary_value

to a DataFrame of the form:

customer_id, frequency, recency, monetary_value, rfm_score, segment

Customer purchasing data is aggregated into three variables: recency, frequency, and monetary_value. Quartiles are estimated for each variable, and a three-digit RFM score is then assigned to each customer. For example, a customer with a score of ‘234’ is in the second quartile for recency, third quartile for frequency, and fourth quartile for monetary_value. RFM scores corresponding to segments such as “Top Spender”, “Frequent Buyer”, or “At-Risk” are determined, and customers are then segmented based on their RFM score.
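The scoring step described above can be sketched with the standard library alone. This is a minimal illustration, not the library's implementation: the helper names `quartile_label` and `rfm_scores` and the toy data are hypothetical, and the actual binning (tie handling, direction of each scale) may differ.

```python
from statistics import quantiles


def quartile_label(value, cut_points):
    """Return 1-4 according to how many quartile boundaries `value` exceeds."""
    return 1 + sum(value > cut for cut in cut_points)


def rfm_scores(customers):
    """Map customer id -> three-digit score.

    `customers` maps an id to a (recency, frequency, monetary_value) tuple.
    """
    # Transpose rows into one column per variable, then find Q1, Q2, Q3 for each.
    columns = list(zip(*customers.values()))
    cuts = [quantiles(column, n=4) for column in columns]
    return {
        cid: "".join(str(quartile_label(v, c)) for v, c in zip(values, cuts))
        for cid, values in customers.items()
    }


# Hypothetical, already-aggregated data: (recency, frequency, monetary_value)
toy = {
    "a": (29, 1, 40.0),
    "b": (5, 2, 20.0),
    "c": (9, 5, 80.0),
    "d": (13, 4, 10.0),
    "e": (17, 3, 50.0),
    "f": (21, 6, 60.0),
    "g": (25, 7, 70.0),
    "h": (1, 8, 30.0),
}
scores = rfm_scores(toy)
```

In this toy data, customer "c" receives the score '234': second quartile for recency, third for frequency, and fourth for monetary_value.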

By default, the following segments are created:
  • “Premium Customer”: Customers in top 2 quartiles for all variables.

  • “Repeat Customer”: Customers in top 2 quartiles for frequency, and either recency or monetary value.

  • “Top Spender”: Customers in top 2 quartiles for monetary value, and either frequency or recency.

  • “At-Risk Customer”: Customers in bottom 2 quartiles for two or more variables.

  • “Inactive Customer”: Customers in bottom quartile for two or more variables.

  • Customers with unspecified RFM scores will be assigned to a segment named “Other”.

If an alternative segmentation approach is desired, use rfm_summary(include_first_transaction=True, *args, **kwargs) instead to preprocess data for segmentation. In either case, the returned DataFrame cannot be used for modeling. If assigning model predictions to RFM segments, create a separate DataFrame for modeling and join by Customer ID.

Parameters:
transactions : DataFrame

A Pandas DataFrame containing the columns named by customer_id_col, datetime_col, and monetary_value_col.

customer_id_col : str

Column in the transactions DataFrame denoting the customer_id.

datetime_col : str

Column in the transactions DataFrame denoting the datetimes at which purchases were made.

monetary_value_col : str

Column in the transactions DataFrame that denotes the monetary value of the transaction.

segment_config : dict, optional

Dictionary containing segment names and list of RFM score assignments; key/value pairs should be formatted as {"segment": ['111', '123', '321'], ...}. If not provided, default segment names and definitions are applied.
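As an illustration of the expected format, the sketch below builds a small configuration and shows how scores could be resolved to segment names, with unlisted scores falling back to “Other”. The segment names, score lists, and the `assign_segment` helper are hypothetical, and the lookup is a stand-in for whatever the library does internally.

```python
# Hypothetical configuration: each segment name maps to a list of RFM scores.
segment_config = {
    "Champions": ["444", "443", "434"],
    "Lapsed": ["111", "112", "121"],
}

# Invert the configuration into a score -> segment lookup table.
score_to_segment = {
    score: segment
    for segment, scores in segment_config.items()
    for score in scores
}


def assign_segment(rfm_score):
    """Return the configured segment for a score, or "Other" if unlisted."""
    return score_to_segment.get(rfm_score, "Other")


labels = [assign_segment(s) for s in ["444", "112", "234"]]
```

Listing each score under only one segment keeps the assignment unambiguous; in this sketch a score repeated under two segments would silently resolve to the later one.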

observation_period_end : Union[str, pandas.Period, datetime, None], optional

A string or datetime to denote the final date of the study. Events after this date are truncated. If not given, defaults to the max of datetime_col.

datetime_format : str, optional

A string that represents the timestamp format. Useful if Pandas doesn’t recognize the provided format.
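The effect of observation_period_end and datetime_format can be sketched as follows. This is a dependency-free illustration under stated assumptions: the `truncate_transactions` helper and the toy rows are hypothetical, the cutoff is treated as inclusive, and the library itself works on Pandas datetimes rather than plain tuples.

```python
from datetime import datetime


def truncate_transactions(transactions, observation_period_end=None,
                          datetime_format="%Y-%m-%d"):
    """Parse timestamps and drop rows after the end of the study period.

    `transactions` is a list of (customer_id, timestamp_string, value) tuples.
    Returns the kept rows and the end date that was applied.
    """
    parsed = [
        (cid, datetime.strptime(ts, datetime_format), value)
        for cid, ts, value in transactions
    ]
    if observation_period_end is None:
        # Default: the latest transaction marks the end of the study.
        end = max(ts for _, ts, _ in parsed)
    else:
        end = datetime.strptime(observation_period_end, datetime_format)
    # Assumption: transactions on the end date itself are kept.
    kept = [row for row in parsed if row[1] <= end]
    return kept, end


rows = [
    (1, "2024-01-05", 20.0),
    (1, "2024-02-01", 35.0),
    (2, "2024-03-10", 15.0),
]
kept_all, default_end = truncate_transactions(rows)
kept_early, early_end = truncate_transactions(rows, "2024-02-01")
```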

time_unit : str, optional

Time granularity for the study. Default: ‘D’ for days. Possible values are listed here: https://numpy.org/devdocs/reference/arrays.datetime.html#datetime-units

time_scaler : int, optional

Default: 1. Scales recency and T to a different time granularity. This is useful for datasets spanning many years and for running predictions on different time scales.

sort_transactions : bool, optional

Default: True. If the raw data is already sorted in chronological order, set to False to improve computational efficiency.

Returns:
DataFrame

DataFrame containing summarized RFM data, RFM scores, and segment assignments.