Crypto Regime Detection — LSTM + GNN + PPO, 13 Assets

Rushikesh Sagar

Crypto Quant

Two-phase quantitative system | Binance · 13 assets · 2017–2025

Phase 1 proved the problem empirically. Phase 2 builds the solution.

The Central Question

Crypto markets have regimes. In bull markets driven by BTC dominance, diversification is a penalty: BTC buy-and-hold beat every active strategy in 2024, finishing 33.5 percentage points ahead of Inverse-Vol and 28.3 ahead of Equal-Weight. In consolidation, active strategies win: Inverse-Vol outperformed BTC-only by about 4 points in 2025.

The question Phase 1 answered: do ML-based volatility forecasts generate alpha over passive benchmarks? The honest answer: not without regime detection first.

The question Phase 2 answers: can a system detect which regime the market is in — and allocate capital accordingly? Current status: regime detection models trained and validated. PPO meta-model in live paper trading on Binance.

Two Phases at a Glance

| | Phase 1 | Phase 2 |
|---|---|---|
| Goal | Volatility forecasting + backtesting | Regime detection + allocation |
| Models | GARCH · HAR-V · LightGBM | HMM · LSTM × 2 · GNN · PPO |
| Assets | BTC · ETH · SOL (3) | 13 crypto assets |
| Framework | Classical ML | PyTorch + Stable-Baselines3 |
| Status | ✅ Complete | 🟢 Paper trading live |
| Headline | LightGBM holdout RMSE 0.027 | Vol LSTM 95.78% test accuracy |
| Honest finding | BTC +118% beat all active strategies (2024) | Spike recall 0% on post-2022 data |
| README | → Phase 1 | → Phase 2 |

Full System Architecture

Honest Results — Both Phases

Phase 1

| Strategy | 2024 | 2025 |
|---|---|---|
| Inverse-Vol | +84.51% | +24.83% |
| Equal-Weight | +89.76% | n/a |
| Momentum+IV | −0.66% | n/a |
| BTC-Only (benchmark) | +118.02% | +20.84% |

Verdict: BTC buy-and-hold beat every active strategy in 2024, by 28.3 points over Equal-Weight and 33.5 points over Inverse-Vol. Inverse-Vol edged ahead of BTC-only by about 4 points in the 2025 consolidation phase. Phase 1 is not production-ready without regime conditioning.

Phase 2

| Model | Test Accuracy | Holdout Accuracy | Honest Note |
|---|---|---|---|
| Vol LSTM | 95.78% | 94.70% | Spike recall 0%: spike regime absent post-2022 |
| Trend LSTM | 24.6% | 29.4% | Regime non-stationarity: train ≠ test regime |
| Cross GNN | 93.17% | 80.72% | Distribution shift: divergent regime vanished 2025 |
| PPO Agent | Sortino 0.5052 (val) | Sortino −0.1508 | Peak at step 80k, policy collapse documented |

PPO holdout context: BTC-only holdout Sortino was −0.6887. Equal-weight was −0.7295. PPO lost less than both benchmarks in a down market.
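For readers unfamiliar with the metric, here is a minimal sketch of a Sortino ratio. The target return of zero, the per-period (non-annualized) default, and the function name `sortino_ratio` are assumptions for illustration; the source does not state which conventions the reported figures use.

```python
import math

def sortino_ratio(returns, target=0.0, periods_per_year=1):
    """Mean excess return over `target`, divided by downside deviation."""
    if not returns:
        raise ValueError("returns must be non-empty")
    excess = [r - target for r in returns]
    mean_excess = sum(excess) / len(excess)
    # Downside deviation: only negative excess returns contribute,
    # but the average is taken over all periods.
    downside_sq = sum(min(e, 0.0) ** 2 for e in excess) / len(excess)
    downside_dev = math.sqrt(downside_sq)
    if downside_dev == 0.0:
        # No losing periods: the ratio is unbounded (or zero if flat).
        return math.inf if mean_excess > 0 else 0.0
    return mean_excess / downside_dev * math.sqrt(periods_per_year)
```

A negative value, as in the holdout rows above, means the average return fell below the target; a value closer to zero than a benchmark's means the strategy lost less per unit of downside risk.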

Navigate

📁 Phase 1 README: Volatility forecasting · GARCH · HAR-V · LightGBM · backtesting

📁 Phase 2 README: Regime detection · LSTM · GNN · PPO · live paper trading

About

Final-year CS student in Pune building toward ML engineering roles at YC-backed startups. This two-phase crypto quant system is the quantitative trading component of a longer-term goal: a solo prop desk, then a quant firm.

Other projects:

🤗 Rushisagar221/dalal-street-financial-llm — Fine-tuned Llama-3.2-3B for Indian equity analysis. Citation rate 0% → 100%.

Dalal Street RAG — Five-stage RAG pipeline for fundamental analysis of 11 Indian companies

Poker PPO Bot — Deployed RL agent · FastAPI · React · three difficulty levels

Crypto Quant · Phase 1 complete · Phase 2 paper trading live · Pune, India

Crypto Phase 2 — Regime Detection & Portfolio Allocation

Phase 1 showed that volatility forecasting alone cannot generate alpha without knowing what regime the market is in. Phase 2 builds the regime detection system and a PPO meta-model that allocates capital based on the detected regime.

Status: live paper trading on Binance via run_paper_trading.py

Quick Numbers

| Model | Test | Holdout | Honest Note |
|---|---|---|---|
| Vol LSTM | 95.78% | 94.70% | Spike recall 0%: spike absent post-2022 |
| Trend LSTM | 24.6% | 29.4% | Regime non-stationarity: documented |
| Cross GNN | 93.17% | 80.72% | Distribution shift: divergent vanished 2025 |
| PPO Agent | Sortino 0.5052 | Sortino −0.1508 | Peak step 80k · policy collapse documented |

PPO holdout context: BTC-only holdout Sortino −0.6887. Equal-weight −0.7295. PPO lost less than both benchmarks in a down market.

Pipeline Architecture

Project Structure

Four Stages

Stage 1 — Data & Feature Engineering

aggregate_to_daily.py → engineer_features.py → select_features.py

Daily aggregation from 1-minute OHLCV. Three feature groups built independently per regime model to prevent leakage between the three detection tasks.
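The daily aggregation step can be sketched as follows. This is an illustration, not the contents of aggregate_to_daily.py: the tuple layout, millisecond timestamps, and UTC day boundaries are assumptions.

```python
from datetime import datetime, timezone

def aggregate_to_daily(bars):
    """Collapse 1-minute bars into daily OHLCV.

    bars: iterable of (timestamp_ms, open, high, low, close, volume),
          sorted by time (assumed layout).
    """
    daily = {}
    for ts_ms, open_, high, low, close, volume in bars:
        day = datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc).date()
        if day not in daily:
            # First bar of the day fixes the open.
            daily[day] = {"open": open_, "high": high, "low": low,
                          "close": close, "volume": volume}
        else:
            d = daily[day]
            d["high"] = max(d["high"], high)
            d["low"] = min(d["low"], low)
            d["close"] = close          # last bar seen so far sets the close
            d["volume"] += volume
    return daily
```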

Feature selection via SHAP importance + correlation analysis. Features with pairwise correlation > 0.85 collapsed to the higher-SHAP representative.

Vol features (5 selected): realized_vol_20d · GARCH_sigma · HAR_daily · HAR_weekly · vol_ratio

Trend features (6 selected): rate_of_change · close_normalized · momentum_50d · return_skewness_30d · trend_strength · volume_trend

Cross features (4 selected): avg_pairwise_corr · BTC_dominance · vol_dispersion · corr_stability
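The correlation collapse described above can be sketched as a greedy filter. The helper names, the use of absolute Pearson correlation, and the greedy visiting order are assumptions; the importance scores stand in for SHAP values computed elsewhere.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def collapse_correlated(columns, importance, threshold=0.85):
    """Keep the higher-importance feature from each correlated group.

    columns:    {feature_name: list of values}
    importance: {feature_name: importance score, e.g. mean |SHAP|}
    """
    kept = []
    # Visit features from most to least important; a feature survives
    # only if it is not too correlated with anything already kept.
    for name in sorted(columns, key=lambda k: importance[k], reverse=True):
        if all(abs(pearson(columns[name], columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept
```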

Stage 2 — HMM Regime Labeling

generate_regime_labels.py

GaussianHMM fitted with GMM initialization for stability. Viterbi decoding produces temporally consistent regime sequences. Threshold-based labeling for cross-sectional regime (directly observable, no hidden state needed).

Why GMM initialization: random initialization can lead to HMM state collapse, where Baum-Welch converges with one or more states effectively unused. GMM pre-clusters the data so Baum-Welch starts from a good position.

Why Viterbi not posterior: posterior decoding picks the most probable state at each timestep separately, so consecutive picks need not form a plausible sequence. Viterbi picks the single most probable sequence globally. It respects transition probabilities and suppresses the single-day regime flips that create noisy labels.
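The smoothing effect of Viterbi can be seen in a small hand-rolled sketch. The two-state model, the sticky transition matrix, and the per-day likelihoods are made-up illustration values, and the per-step argmax over emission likelihoods is a simplified stand-in for per-timestep decoding, not a full forward-backward posterior.

```python
import math

def viterbi(log_pi, log_A, log_B):
    """Most probable state sequence.

    log_pi: initial log-probs, log_A[i][j]: transition log-probs,
    log_B[t][s]: log-likelihood of observation t under state s.
    """
    n_states = len(log_pi)
    delta = [log_pi[s] + log_B[0][s] for s in range(n_states)]
    backpointers = []
    for t in range(1, len(log_B)):
        new_delta, ptr = [], []
        for j in range(n_states):
            best_i = max(range(n_states), key=lambda i: delta[i] + log_A[i][j])
            new_delta.append(delta[best_i] + log_A[best_i][j] + log_B[t][j])
            ptr.append(best_i)
        delta = new_delta
        backpointers.append(ptr)
    # Backtrack from the best final state.
    state = max(range(n_states), key=lambda s: delta[s])
    path = [state]
    for ptr in reversed(backpointers):
        state = ptr[state]
        path.append(state)
    return path[::-1]

def log_matrix(m):
    return [[math.log(v) for v in row] for row in m]

# Day 2 looks slightly more like state 1, but regimes are sticky.
likelihoods = [[0.9, 0.1], [0.9, 0.1], [0.4, 0.6], [0.9, 0.1], [0.9, 0.1]]
transitions = [[0.95, 0.05], [0.05, 0.95]]
log_pi = [math.log(0.5), math.log(0.5)]

per_step = [max(range(2), key=lambda s: row[s]) for row in likelihoods]
smooth = viterbi(log_pi, log_matrix(transitions), log_matrix(likelihoods))
print(per_step)  # [0, 0, 1, 0, 0]  single-day flip
print(smooth)    # [0, 0, 0, 0, 0]  flip suppressed by transition cost
```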

Stage 3 — Regime Detection (Deep Learning)

train_vol_regime.py · train_trend_regime.py · train_cross_regime.py

Vol and Trend LSTMs: 30-day sliding window sequences. LSTM(64) with ScaledDotProductAttention (learnable query vector, scale = d_k^{-0.5}) feeding into LSTM(32). Last hidden state → Dense classification head.
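The attention step alone can be sketched in plain Python, independent of any framework. The function name and the toy inputs are illustrative, and how the attention output is combined with the sequence before LSTM(32) is not specified above, so the reweighting shown in the last lines is an assumption.

```python
import math
import random

def attention_weights(hidden, query):
    """Scaled dot-product attention with a single learnable query.

    hidden: list of T vectors (LSTM outputs, each of length d_k)
    query:  vector of length d_k
    Returns T weights that sum to 1.
    """
    d_k = len(query)
    scale = d_k ** -0.5
    scores = [scale * sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden]
    # Softmax over the time axis, shifted by the max for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

random.seed(0)
T, d_k = 30, 64   # 30-day window, LSTM(64) outputs
hidden = [[random.gauss(0, 1) for _ in range(d_k)] for _ in range(T)]
query = [random.gauss(0, 1) for _ in range(d_k)]

weights = attention_weights(hidden, query)
# Assumed hand-off: each timestep is rescaled by its weight and the
# resulting sequence is what the next layer would consume.
reweighted = [[w * v for v in h] for w, h in zip(weights, hidden)]
```

Inspecting `weights` per lag is the kind of check that would surface a pattern such as the lag 29 anomaly.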

Attention finding: lag 29 (oldest day) receives disproportionate weight versus the otherwise clean exponential decay. Hypothesis: model compares current vol level to start of current vol cycle to classify regime direction.

Cross GNN: PyTorch Geometric was unavailable. Implemented MeanGraphConv from scratch using batched matrix multiplication — mathematically equivalent to GraphSAGE mean aggregation.

Edge weights: cosine similarity over cross features (proxy for pairwise correlation — actual correlation matrix was not stored).
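A NumPy sketch of mean aggregation via batched matrix multiplication follows. It is not the project's MeanGraphConv: the function names, the separate self and neighbor weight matrices, the ReLU, and the clipping of negative cosine similarities to zero are all assumptions made for illustration.

```python
import numpy as np

def cosine_adjacency(x, eps=1e-12):
    """x: (B, N, F) node features. Returns (B, N, N) non-negative edge weights."""
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    unit = x / np.maximum(norm, eps)
    sim = unit @ np.swapaxes(unit, -1, -2)       # batched cosine similarity
    sim = np.clip(sim, 0.0, None)                # assumption: drop negative edges
    return sim * (1.0 - np.eye(x.shape[1]))      # no self-loops

def mean_graph_conv(x, adj, w_self, w_neigh):
    """One mean-aggregation layer.

    x: (B, N, F), adj: (B, N, N) non-negative weights, w_*: (F, F_out).
    With a binary adjacency this is the GraphSAGE mean aggregator.
    """
    degree = adj.sum(axis=-1, keepdims=True)
    adj_norm = adj / np.maximum(degree, 1e-12)   # rows sum to 1
    neigh_mean = adj_norm @ x                    # (B, N, N) @ (B, N, F)
    return np.maximum(x @ w_self + neigh_mean @ w_neigh, 0.0)
```

Row-normalizing the adjacency is what turns the matrix product into a mean: each output row is a convex combination of the neighbor feature rows.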

Stage 4 — PPO Meta-Model & Paper Trading

train_ppo_agent.py · run_paper_trading.py

PPO with clipped surrogate objective. State combines regime probabilities from all three detection models plus current portfolio metrics.
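For reference, the clipped surrogate objective can be written out directly. This is a plain-Python sketch of the standard PPO formula, not the training code; the clip range of 0.2 is an assumed value, since the source does not state the one used.

```python
def clipped_surrogate(ratios, advantages, clip_eps=0.2):
    """Mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A) over a batch.

    ratios:     new_policy_prob / old_policy_prob for each sampled action
    advantages: advantage estimate for each sampled action
    """
    total = 0.0
    for r, a in zip(ratios, advantages):
        clipped_r = min(max(r, 1.0 - clip_eps), 1.0 + clip_eps)
        # Taking the minimum removes any incentive to push the ratio
        # outside the clip range in the direction that helps the objective.
        total += min(r * a, clipped_r * a)
    return total / len(ratios)
```

For example, a ratio of 1.5 with advantage 1.0 contributes 1.2 rather than 1.5, so a single update cannot move the policy far from the one that collected the data.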

Paper trading config (configs/paper.yaml): Binance testnet · weekly rebalancing · real-time regime detection · risk limits enforced · all trades logged

Key Results & Honest Failures

Vol LSTM — The Spike Ghost

Train accuracy 97.93%. Val accuracy 97.93%. Test 95.78%. Holdout 94.70%. Numbers look excellent. Look at per-class recall:

Spike regime (vol > 100% annualized) only occurred during 2020-2022 COVID crash and 2022 bear market. After that — nothing. Class weight of 120.6 correctly handles training imbalance but cannot help when spike has zero instances in evaluation data.

Lesson: accuracy is the wrong metric for rare-class detection. Report per-class recall. A class weight addresses training imbalance — it cannot address evaluation distribution shift.
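A minimal sketch of the two quantities in this lesson: per-class recall and inverse-frequency class weights. The function names are illustrative, and the "balanced" weighting formula is one common heuristic; the source does not say how the 120.6 weight was derived. Note that when a class has no instances in a split, its recall is undefined rather than zero, so the sketch returns `None` in that case.

```python
from collections import Counter

def per_class_recall(y_true, y_pred, classes):
    """Recall per class; None when the class has no true instances."""
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: (hits[c] / support[c] if support[c] else None) for c in classes}

def balanced_class_weights(y_train, classes):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(y_train)
    n = len(y_train)
    return {c: n / (len(classes) * counts[c]) for c in classes if counts[c]}

# Toy labels: a model that never predicts the rare class still scores 95%.
y_true = ["low"] * 95 + ["spike"] * 5
y_pred = ["low"] * 100
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
recall = per_class_recall(y_true, y_pred, ["low", "spike"])
```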

Trend LSTM — Regime Non-Stationarity

Confusion matrices showed strong-bull and weak-bull recall near zero. Training data (2020-2022) contained COVID crash + recovery + 2021 bull run. Test data (2024) contained post-FTX consolidation recovery — different momentum structure, different autocorrelation, different skewness profile.

This is regime non-stationarity, not generic overfitting. The distinction matters: generic overfitting → regularize. Regime non-stationarity → the training data did not contain the test regime. No amount of regularization fixes a data coverage problem.

Cross GNN — 2025 Distribution Shift

Holdout confusion matrix: divergent recall = 0.0000. In 2025, crypto markets entered a prolonged convergent phase — all assets moved together. Divergent regime, which existed in training data, essentially vanished from 2025 data.

This is distribution shift, not model failure. The model correctly learned what divergent looks like. The market stopped producing divergent conditions.
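A vanished class like this can be flagged before evaluation by comparing label shares across splits. The sketch below is an illustration, not part of the project: the function names, the 1% share cutoff, and the use of total variation distance are assumptions.

```python
from collections import Counter

def class_shares(labels, classes):
    """Fraction of samples carrying each label."""
    counts = Counter(labels)
    n = len(labels)
    return {c: counts[c] / n for c in classes}

def coverage_report(train_labels, eval_labels, classes, min_share=0.01):
    """Return (total variation distance, classes present in train but
    effectively absent from the evaluation split)."""
    train = class_shares(train_labels, classes)
    evaluation = class_shares(eval_labels, classes)
    tv_distance = 0.5 * sum(abs(train[c] - evaluation[c]) for c in classes)
    vanished = [c for c in classes
                if train[c] >= min_share and evaluation[c] < min_share]
    return tv_distance, vanished
```

A non-empty `vanished` list means recall for those classes on that split says little about the model and mostly reflects what the market produced.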

PPO — Policy Collapse at Step 80,000

Root causes: no early stopping, reward imbalance (turnover penalty easy to avoid by doing nothing; Sortino reward small and hard to earn). Agent discovered maximum reward = maximum inaction.

ppo_agent_best.zip is the step 80,000 checkpoint. All evaluation results use this checkpoint, not the final checkpoint.
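The missing early stopping could look like the following sketch: track the best validation score across periodic evaluations and stop once it has not improved for a set number of evaluations. The class name, the patience value, and the toy scores are illustrative assumptions, not the project's code or results.

```python
class BestCheckpointTracker:
    """Remember the best validation score and signal when to stop."""

    def __init__(self, patience=3):
        self.patience = patience
        self.best_score = float("-inf")
        self.best_step = None
        self.evals_since_best = 0

    def update(self, step, score):
        """Record one evaluation; return True when training should stop."""
        if score > self.best_score:
            self.best_score, self.best_step = score, step
            self.evals_since_best = 0   # a checkpoint would be saved here
        else:
            self.evals_since_best += 1
        return self.evals_since_best >= self.patience

# Toy validation curve that peaks at the second evaluation.
tracker = BestCheckpointTracker(patience=3)
stopped_at = None
for step, score in enumerate([0.1, 0.3, 0.2, 0.1, 0.0, -0.1], start=1):
    if tracker.update(step, score):
        stopped_at = step
        break
```

Here training stops at the fifth evaluation and the second one is kept as the best checkpoint, which mirrors the manual choice of the step 80,000 checkpoint.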

Key Design Decisions

LSTM not Transformer

Daily OHLCV data: ~9,648 training sequences (30-day windows). A Transformer with d_model=64, n_heads=4, n_layers=2 has ~130k parameters. Parameters-per-example: ~13, above the threshold for reliable generalization. LSTM with same hidden dim: ~18k parameters, ~2 per example. LSTM + attention discovered the non-uniform attention pattern (lag 29 anomaly) more reliably than a Transformer would on this data size.
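The parameter arithmetic can be checked with a short sketch. The LSTM formula is the standard count for a single unidirectional layer with two bias vectors per gate, as in `torch.nn.LSTM`; the input size of 5 is an assumption taken from the five selected vol features, and the Transformer figure is used as quoted rather than recomputed.

```python
def lstm_param_count(input_size, hidden_size):
    """Parameters in one unidirectional LSTM layer.

    Four gates, each with input weights, recurrent weights,
    and two bias vectors.
    """
    return 4 * (hidden_size * input_size
                + hidden_size * hidden_size
                + 2 * hidden_size)

n_sequences = 9_648                      # training sequences quoted above
lstm_params = lstm_param_count(input_size=5, hidden_size=64)
transformer_params = 130_000             # figure quoted above

print(lstm_params)                                    # 18176
print(round(lstm_params / n_sequences, 2))            # 1.88
print(round(transformer_params / n_sequences, 2))     # 13.47
```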

MeanGraphConv from scratch

torch_geometric was unavailable in the environment. Read the GraphSAGE paper, implemented mean aggregation via batched matrix multiplication. Mathematically equivalent to the GraphSAGE mean variant. GNN achieved 93.17% test accuracy on the approximation.

PPO not CFR

Counterfactual Regret Minimization is theoretically attractive when portfolio allocation is framed as a sequential game (its convergence guarantees hold for two-player zero-sum games). Estimated runtime on a dual-core laptop: 6-12 days per training run. PPO: hours on the same hardware. Optimal in theory ≠ optimal given constraints.

Three separate model saves

vol_regime_lstm.pt · trend_regime_lstm.pt · cross_regime_gnn.pt

Each trains independently with its own feature set and label set. Joint training would create gradient interference: the three tasks have different feature distributions and different class structures.

How to Run

Configs:

Schema & Reports

Technical specifications:

| File | Description |
|---|---|
| schema/crypto_phase2.yaml | Master spec |
| schema/phase_2_structure.md | Pipeline structure |
| schema/regime_labeling.yaml | HMM spec |
| schema/regime_models.yaml | LSTM + GNN spec |
| schema/ppo_meta_model.yaml | PPO spec |
| schema/features.yaml | Feature definitions |
| schema/risk.yaml | Risk management rules |
| docs/decision.md | Architecture decisions log |

Training and evaluation reports:

| File | Description |
|---|---|
| reports/vol_regime_accuracy.md | Vol LSTM: all splits, confusion matrices |
| reports/trend_regime_accuracy.md | Trend LSTM: non-stationarity analysis |
| reports/cross_regime_accuracy.md | GNN: distribution shift analysis |
| reports/ppo_agent_performance.md | PPO: policy collapse history |

About

Final-year CS student in Pune building toward ML engineering roles at YC-backed startups. This regime detection system is the quantitative trading component of a longer-term goal: a solo prop desk, then a quant firm.

Other projects:

🤗 Rushisagar221/dalal-street-financial-llm — Fine-tuned Llama-3.2-3B for Indian equity analysis. Citation rate 0% → 100%.

Dalal Street RAG — Five-stage RAG pipeline · 11 Indian companies · 4,118 chunks · 122 unit tests

Crypto Phase 1 — Volatility forecasting + backtesting (feeds this project)

Poker PPO Bot — Deployed RL agent · FastAPI backend · React frontend

Crypto Phase 2 — Regime Detection & Portfolio Allocation · 13 assets · Paper trading live · Pune, India

Posted Apr 27, 2026

11.1M-row Binance pipeline, 28 features, LSTM regime classifier, GNN cross-sectional model, PPO allocation agent. Paper trading on Binance.

Timeline: Jan 1, 2017 - Jan 1, 2025

Clients: Binance