S&P 500 Market Regime Classification

Financial ML & Market Analytics | Mar 31, 2026 | Published project

Short-term market-regime classification experiment

This project frames S&P 500 behavior as a three-class classification problem and tests whether technical indicators can predict next-day market regimes.

Python, yfinance, Scikit-learn, Logistic Regression, Random Forest, SVM

Challenge

Short-horizon financial classification is noisy and often dominated by majority classes.
A model may produce acceptable accuracy while failing to separate Bullish and Bearish regimes.
Time-series validation is needed to avoid misleading evaluation.
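
To illustrate the validation point above, here is a minimal sketch of chronological splitting with scikit-learn's TimeSeriesSplit. The observation count matches the 956 modeling observations reported for this project; the number of splits is an assumption chosen for illustration, not a setting taken from the project.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# 956 is the reported number of modeling observations.
# n_splits=5 is an illustrative assumption.
n_obs = 956
splitter = TimeSeriesSplit(n_splits=5)

folds = []
for train_idx, test_idx in splitter.split(np.arange(n_obs)):
    # Every training index precedes every test index,
    # so the model never sees data from after the test window.
    folds.append((train_idx.max(), test_idx.min(), test_idx.max()))
    print(f"train: 0..{train_idx.max()}  test: {test_idx.min()}..{test_idx.max()}")
```

A shuffled K-fold split would instead mix future days into the training set, which tends to make scores look better than they would be in live use.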

System architecture

S&P 500 data -> Technical indicators -> Classifiers -> Regime label

Data and inputs

Daily S&P 500 index data from 2020-01-01 to 2024-01-01, with four features: daily return, 20-day and 50-day simple moving averages (SMA_20, SMA_50), and 20-day rolling volatility.
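
The features listed above could be derived from a closing-price series roughly as follows. This is a sketch under assumptions, not the project's code: the function name build_features and the column names daily_return and volatility_20 are hypothetical, volatility is assumed to be the rolling standard deviation of daily returns, and synthetic prices stand in for the real index so the example runs offline.

```python
import numpy as np
import pandas as pd


def build_features(close: pd.Series) -> pd.DataFrame:
    """Derive the four features from a daily closing-price series."""
    features = pd.DataFrame(index=close.index)
    features["daily_return"] = close.pct_change()
    features["SMA_20"] = close.rolling(20).mean()
    features["SMA_50"] = close.rolling(50).mean()
    # Assumption: volatility is the 20-day rolling std of daily returns.
    features["volatility_20"] = features["daily_return"].rolling(20).std()
    # Rows without a full 50-day window contain NaN and are dropped.
    return features.dropna()


# Synthetic random-walk prices (not real S&P 500 data).
rng = np.random.default_rng(0)
dates = pd.bdate_range("2020-01-01", periods=300)
log_steps = rng.normal(0.0, 0.01, len(dates))
close = pd.Series(3000 * np.exp(np.cumsum(log_steps)), index=dates)

features = build_features(close)
print(features.tail())
```

The 50-day window costs the first 49 rows, which is one reason the number of modeling observations is smaller than the number of trading days in the sample.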

Technical approach

Label next-day returns as Bullish (above +0.5%), Bearish (below -0.5%), or Sideways (in between).
Compare Logistic Regression, Random Forest, and SVM.
Tune each model's hyperparameters using time-series-aware validation.
Interpret both accuracy and balanced accuracy.
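
The steps above can be sketched as follows. This is an illustrative outline, not the project's implementation: label_regime is a hypothetical helper, the strict inequalities at the ±0.5% boundary are an assumption, the hyperparameter grids and number of splits are placeholders, and synthetic returns with two features replace the real data so the example is self-contained.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def label_regime(next_day_return: pd.Series, threshold: float = 0.005) -> pd.Series:
    """Map next-day returns to Bullish / Sideways / Bearish."""
    values = np.where(
        next_day_return > threshold,
        "Bullish",
        np.where(next_day_return < -threshold, "Bearish", "Sideways"),
    )
    return pd.Series(values, index=next_day_return.index)


# Synthetic daily returns (not real market data).
rng = np.random.default_rng(0)
returns = pd.Series(rng.normal(0.0, 0.01, 400))
data = pd.DataFrame({
    "daily_return": returns,
    "volatility_20": returns.rolling(20).std(),
    "next_day": returns.shift(-1),  # the target looks one day ahead
}).dropna()

X = data[["daily_return", "volatility_20"]]
y = label_regime(data["next_day"])

# Placeholder grids; the project's actual search space is not stated.
candidates = {
    "Logistic Regression": (
        make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        {"logisticregression__C": [0.1, 1.0, 10.0]},
    ),
    "Random Forest": (
        RandomForestClassifier(n_estimators=50, random_state=0),
        {"max_depth": [3, 5]},
    ),
    "SVM": (
        make_pipeline(StandardScaler(), SVC()),
        {"svc__C": [0.1, 1.0]},
    ),
}

cv = TimeSeriesSplit(n_splits=3)  # chronological folds, no shuffling
scores = {}
for name, (model, grid) in candidates.items():
    search = GridSearchCV(model, grid, cv=cv, scoring="balanced_accuracy")
    search.fit(X, y)
    scores[name] = search.best_score_
    print(f"{name}: balanced accuracy {search.best_score_:.3f}")
```

Scoring the search on balanced accuracy rather than plain accuracy keeps a model from being selected just because it predicts the most frequent regime.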

Evaluation and results

Key indicators

956 modeling observations

3 market regimes

Best accuracy 52.08%

The best accuracy was about 52.08%, while balanced accuracy stayed near one-third, which is chance level for a three-class problem.
Models leaned toward the majority Sideways regime.
The result suggests that simple technical indicators can help structure an analysis but carry little predictive signal for next-day regimes.
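
A small worked example shows why the two metrics can diverge this way. The class mix below is hypothetical and chosen only for illustration; it is not the project's label distribution. A predictor that always outputs the majority class scores the majority share on accuracy but exactly one-third on balanced accuracy, because balanced accuracy averages recall over the three classes.

```python
import numpy as np
from sklearn.metrics import accuracy_score, balanced_accuracy_score

# Hypothetical class mix: 52 Sideways, 24 Bullish, 24 Bearish days.
y_true = np.array(["Sideways"] * 52 + ["Bullish"] * 24 + ["Bearish"] * 24)

# A degenerate model that always predicts the majority class.
y_pred = np.full(y_true.shape, "Sideways")

accuracy = accuracy_score(y_true, y_pred)
# Recall is 1.0 for Sideways and 0.0 for the other two classes,
# so the average is (1 + 0 + 0) / 3.
balanced = balanced_accuracy_score(y_true, y_pred)

print(f"accuracy: {accuracy:.4f}")
print(f"balanced accuracy: {balanced:.4f}")
```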

Implementation and code

Implementation focus

The implementation connects data preparation, modeling, evaluation, and interpretation in a structured workflow that makes the technical decisions clear.

Source code

The code is available for exploring the implementation details and extending the experiment when needed.

Scope and responsible use

The analysis is intended for modeling and evaluation, not investment advice. Real trading use would require risk controls, transaction-cost modeling, out-of-sample validation, and continuous monitoring.

Future development

Add richer macro and volatility features.
Test longer horizons and different labeling thresholds.
Use probability calibration and uncertainty-aware reporting.

Technical contribution

The project shows disciplined interpretation of weak financial ML signals instead of overstating noisy classification results.