The network for creativity
Join 1.25M professional creatives like you
Connect with clients, get discovered, and run your business 100% commission-free
Creatives on Contra have earned over $150M and we are just getting started
The transformer’s biggest flaw just got fixed.

Every LLM you’ve ever used compares every token to every other token. That’s O(n²) scaling. Double the context, quadruple the compute. It’s why long-context models are slow and expensive, and why they degrade as context grows.

SubQ doesn’t do that. It’s the first frontier model built on a fully sub-quadratic sparse-attention architecture. It finds the token relationships that actually matter and skips the rest. At 12 million tokens, this cuts attention compute by almost 1,000x.

The numbers:
→ 12M token context (others advertise 1M, but accuracy collapses past 200k)
→ 98% accuracy at 12M tokens
→ 52x faster than FlashAttention at 1M tokens
→ Less than 5% of the cost of Opus 4.7
→ 97.1% on RULER 128K vs Opus 4.7 at 97.2%
→ 83 on MRCR v2 vs GPT-5.4 at 39

Well done
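To make the scaling argument concrete, here is a rough back-of-the-envelope sketch. It counts token-pair comparisons only; it is not SubQ's method, which the post does not describe. The function names and the `kept_per_query` budget are hypothetical, chosen purely for illustration.

```python
def dense_attention_pairs(n_tokens: int) -> int:
    """Full attention: every token is compared with every token, so n * n pairs."""
    return n_tokens * n_tokens


def sparse_attention_pairs(n_tokens: int, kept_per_query: int) -> int:
    """Hypothetical sparse scheme: each token attends to at most
    `kept_per_query` others, so the count grows linearly in n for a fixed budget."""
    return n_tokens * min(kept_per_query, n_tokens)


def reduction_factor(n_tokens: int, kept_per_query: int) -> float:
    """How many times fewer pairs the sparse scheme evaluates than dense attention."""
    return dense_attention_pairs(n_tokens) / sparse_attention_pairs(n_tokens, kept_per_query)


if __name__ == "__main__":
    # Doubling the context quadruples the dense pair count.
    print(dense_attention_pairs(2_000) / dense_attention_pairs(1_000))

    # Assumed budget of 12,000 kept tokens per query at a 12M-token context.
    print(reduction_factor(12_000_000, 12_000))
```

Under this simplified model, a reduction of roughly 1,000x at 12M tokens would correspond to each token attending to about 12,000 others. That figure is an illustration of the arithmetic, not a published detail of the architecture, and it ignores the cost of deciding which tokens to keep.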