Customer Analytics · May 4, 2026 · Published project
Retail Segmentation & Market Basket Analysis

Customer segmentation and association-rule analytics

This project combines RFM customer segmentation with market basket analysis to connect unsupervised learning outputs to practical customer and product decisions.

Python, Pandas, Scikit-learn, K-Means, Apriori, Association Rules

Challenge

  • Retail transaction data needs cleaning before customer behavior becomes meaningful.
  • Customer segmentation and basket analysis answer different business questions.
  • Analytical outputs need interpretation that maps clusters and rules to practical actions.

System architecture

Transactions
RFM features
Customer clusters
Association rules

Data and inputs

The project uses the UCI Online Retail dataset: 541,909 raw rows covering 25,900 invoices, 4,372 customers, and 4,070 unique products.
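Before RFM features or baskets can be built, the raw rows have to be filtered. Rows without a customer ID are dropped, and so are cancelled invoices and rows with a non-positive quantity or price. The block is a minimal pandas sketch of those rules, with three assumptions. It uses the column names of the public UCI file (`InvoiceNo`, `Quantity`, `UnitPrice`, `CustomerID`, `InvoiceDate`). It treats cancellations as invoices whose number starts with "C". The function name `clean_transactions` and the derived `TotalPrice` column are illustrative and do not come from the project code.

```python
import pandas as pd


def clean_transactions(raw: pd.DataFrame) -> pd.DataFrame:
    """Drop rows that would distort customer-level behavior (illustrative helper)."""
    invoice = raw["InvoiceNo"].astype(str)
    keep = (
        raw["CustomerID"].notna()         # missing customer IDs
        & ~invoice.str.startswith("C")    # cancelled invoices (assumed "C" prefix)
        & (raw["Quantity"] > 0)           # invalid quantities
        & (raw["UnitPrice"] > 0)          # invalid prices
    )
    df = raw.loc[keep].copy()
    df["InvoiceNo"] = df["InvoiceNo"].astype(str)
    df["CustomerID"] = df["CustomerID"].astype(int)
    df["InvoiceDate"] = pd.to_datetime(df["InvoiceDate"])
    # Line revenue; a later RFM step can sum it into a Monetary value.
    df["TotalPrice"] = df["Quantity"] * df["UnitPrice"]
    return df.reset_index(drop=True)


# Toy rows with made-up values, one per cleaning rule.
raw = pd.DataFrame({
    "InvoiceNo": ["536365", "536365", "C536379", "536380", "536381"],
    "Quantity": [6, 6, -1, 3, 4],
    "UnitPrice": [2.55, 3.39, 27.50, 3.75, 0.0],
    "InvoiceDate": ["2010-12-01 08:26"] * 5,
    "CustomerID": [17850.0, 17850.0, 14527.0, None, 12583.0],
})
clean = clean_transactions(raw)
print(len(clean))  # 2
```

Only the first two toy rows survive. The others are dropped as a cancellation, a missing customer ID, and a zero price.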

Technical approach

  • Remove rows with missing customer IDs, cancelled invoices, invalid quantities, and invalid prices.
  • Create Recency, Frequency, and Monetary features and compare clustering approaches.
  • Build an invoice-by-product transaction matrix for French customers and run Apriori association-rule mining.
  • Interpret segments and rules as retention, reactivation, cross-selling, and loyalty opportunities.
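The RFM and clustering steps can be sketched as follows. The sketch takes cleaned transactions with `CustomerID`, `InvoiceNo`, `InvoiceDate`, and a line-revenue `TotalPrice` column. Several choices here are assumptions, not the project's documented setup:

  • The reference date is one day after the last invoice.
  • Features get a `log1p` transform and standardization before K-Means.
  • "Comparing clustering approaches" is illustrated as comparing candidate values of k by silhouette score.

`build_rfm` and `compare_k` are hypothetical helper names.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def build_rfm(tx: pd.DataFrame) -> pd.DataFrame:
    """One row per customer: days since last purchase, invoice count, total spend."""
    snapshot = tx["InvoiceDate"].max() + pd.Timedelta(days=1)  # assumed reference date
    return tx.groupby("CustomerID").agg(
        Recency=("InvoiceDate", lambda d: (snapshot - d.max()).days),
        Frequency=("InvoiceNo", "nunique"),
        Monetary=("TotalPrice", "sum"),
    )


def compare_k(rfm: pd.DataFrame, k_values=range(2, 7), seed: int = 42) -> dict:
    """Silhouette score of K-Means for each candidate number of clusters."""
    # RFM values are typically right-skewed, so log-transform, then standardize.
    X = StandardScaler().fit_transform(np.log1p(rfm[["Recency", "Frequency", "Monetary"]]))
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    return scores


tx = pd.DataFrame({
    "CustomerID": [1, 1, 2, 3],
    "InvoiceNo": ["A1", "A2", "B1", "C1"],
    "InvoiceDate": pd.to_datetime(["2011-12-01", "2011-12-09", "2011-06-01", "2011-12-08"]),
    "TotalPrice": [10.0, 30.0, 5.0, 500.0],
})
rfm = build_rfm(tx)
print(rfm)

# Synthetic customers in three loose groups, used only to exercise compare_k.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "Recency": np.concatenate([rng.integers(1, 30, 50), rng.integers(150, 365, 50), rng.integers(1, 10, 50)]),
    "Frequency": np.concatenate([rng.integers(2, 6, 50), rng.integers(1, 3, 50), rng.integers(20, 60, 50)]),
    "Monetary": np.concatenate([rng.uniform(200, 1500, 50), rng.uniform(20, 300, 50), rng.uniform(5000, 50000, 50)]),
})
scores = compare_k(demo)
best_k = max(scores, key=scores.get)
```

The silhouette score is only one selection criterion. A k chosen this way would still need the business-interpretation check described for the segments.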

Evaluation and results

Key indicators

  • 541,909 raw rows
  • 3 customer segments
  • 23 final association rules

  • K-Means produced three customer segments with a silhouette score of 0.4599.
  • The final three-cluster setup identified Regular, Dormant/At-Risk, and VIP/High-Value customers.
  • Market basket analysis produced 23 final rules after support, confidence, and lift filtering.
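The basket step works in three stages:

  • It turns French invoices into a set of products per invoice.
  • It mines frequent itemsets with Apriori.
  • It keeps only rules that clear the support, confidence, and lift thresholds.

The block is a from-scratch, unoptimized Apriori written for illustration. It is not the implementation used in the project. Using `Description` as the product key is an assumption, and the threshold values are placeholders, not the settings that produced the 23 rules. For a rule X → Y, confidence is support(X ∪ Y) / support(X), and lift is confidence / support(Y).

```python
from itertools import combinations

import pandas as pd


def basket_sets(tx: pd.DataFrame, country: str = "France") -> list:
    """Invoice-by-product boolean matrix for one country, returned as item sets."""
    subset = tx[tx["Country"] == country]
    matrix = (
        subset.groupby(["InvoiceNo", "Description"])["Quantity"].sum().unstack(fill_value=0) > 0
    )
    return [frozenset(matrix.columns[row]) for row in matrix.to_numpy()]


def apriori(transactions: list, min_support: float) -> dict:
    """Map every frequent itemset (frozenset) to its support."""
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    singles = {frozenset([item]) for t in transactions for item in t}
    current = {s: v for s in singles if (v := support(s)) >= min_support}
    frequent = dict(current)
    k = 2
    while current:
        # Join: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a, b in combinations(current, 2) if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c: v for c in candidates if (v := support(c)) >= min_support}
        frequent.update(current)
        k += 1
    return frequent


def association_rules(frequent: dict, min_confidence: float, min_lift: float) -> list:
    """(lhs, rhs, support, confidence, lift) tuples, sorted by lift."""
    rules = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                rhs = itemset - lhs
                # Subsets of a frequent itemset are frequent, so both lookups exist.
                confidence = supp / frequent[lhs]
                lift = confidence / frequent[rhs]
                if confidence >= min_confidence and lift >= min_lift:
                    rules.append((lhs, rhs, supp, confidence, lift))
    return sorted(rules, key=lambda rule: rule[4], reverse=True)


# Toy baskets (made up); real input would come from basket_sets(cleaned_tx).
baskets = [
    frozenset({"RED CUP", "RED PLATE", "NAPKINS"}),
    frozenset({"RED CUP", "RED PLATE"}),
    frozenset({"RED CUP", "RED PLATE", "STRAWS"}),
    frozenset({"NAPKINS", "STRAWS"}),
    frozenset({"RED CUP"}),
]
freq = apriori(baskets, min_support=0.4)
rules = association_rules(freq, min_confidence=0.6, min_lift=1.0)
for lhs, rhs, supp, conf, lift in rules:
    print(sorted(lhs), "->", sorted(rhs), round(conf, 2), round(lift, 2))
```

On these baskets, only {RED CUP, RED PLATE} is frequent as a pair, so both directions of that pair survive as rules, each with lift 1.25.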

Implementation and code

Implementation focus

The implementation runs data cleaning, RFM feature engineering, clustering and evaluation, association-rule mining, and business interpretation as one structured workflow, so each technical decision can be traced to its effect on the results.
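The project names its segments Regular, Dormant/At-Risk, and VIP/High-Value. The block sketches one way the interpretation step could attach those names to cluster numbers. It uses a hypothetical heuristic, not the documented procedure: the cluster with the highest mean spend becomes VIP/High-Value, and the remaining cluster with the longest mean recency becomes Dormant/At-Risk. It is written for the three-cluster case and assumes a `Cluster` column on the RFM table. `label_segments` is an illustrative name.

```python
import pandas as pd


def label_segments(rfm: pd.DataFrame, cluster_col: str = "Cluster") -> dict:
    """Name clusters from their mean RFM profile (hypothetical rule of thumb)."""
    profile = rfm.groupby(cluster_col)[["Recency", "Frequency", "Monetary"]].mean()
    vip = profile["Monetary"].idxmax()
    # Among the remaining clusters, pick the one with the longest time since purchase.
    dormant = profile.drop(index=vip)["Recency"].idxmax()
    return {
        c: "VIP/High-Value" if c == vip else "Dormant/At-Risk" if c == dormant else "Regular"
        for c in profile.index
    }


# Made-up customers already assigned to three clusters.
rfm = pd.DataFrame({
    "Recency": [20, 25, 250, 300, 5, 3],
    "Frequency": [4, 3, 1, 1, 30, 45],
    "Monetary": [900.0, 700.0, 150.0, 90.0, 12000.0, 20000.0],
    "Cluster": [0, 0, 1, 1, 2, 2],
})
print(label_segments(rfm))
# {0: 'Regular', 1: 'Dormant/At-Risk', 2: 'VIP/High-Value'}
```

A fixed rule like this keeps segment names stable when the clustering is rerun and the cluster numbers change. Its output should still be checked against the cluster profiles.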

Source code

The source code is available for reviewing the implementation details and extending the experiments.


Scope and responsible use

The project is a focused modeling and evaluation study. Applying it more broadly would require validation on additional data, robustness checks, ongoing monitoring, and domain-specific evaluation.

Future development

  • Compare additional clustering methods and stability checks.
  • Add cohort analysis and customer lifetime value features.
  • Turn rules into ranked recommendation candidates with clearer business constraints.

Technical contribution

The project connects unsupervised learning with business interpretation across customer behavior and product relationships.