SubQ: The Subquadratic LLM That Promises 12M Token Context at 1,000x Lower Cost
On May 5, 2026, a Miami-based startup called Subquadratic emerged from stealth with a bold claim: their model SubQ is the first large language model built on a truly subquadratic architecture, capable of processing 12 million tokens in a single context window at a fraction of the computational cost of today’s leading models.
The reception was immediate — and sharply divided.
What Is Subquadratic?
Subquadratic is an AI infrastructure company founded in 2026 and headquartered in Miami, Florida. Led by CEO Justin Dangel and CTO Alexander Whedon (formerly Head of GenAI at Meta), the startup raised $29 million in seed funding at a reported $500 million valuation.
The investor lineup reads like a who’s-who of startup royalty: Tinder co-founder Justin Mateen, ex-SoftBank Vision Fund partner Javier Villamizar, and early investors in Anthropic, OpenAI, Stripe, and Brex.
The Core Innovation: Subquadratic Sparse Attention (SSA)
The headline feature is SSA (Subquadratic Sparse Attention), a sparse attention mechanism that fundamentally changes how the model scales with context length.
The Quadratic Problem
Standard Transformer attention scales as O(n²) — double the context, quadruple the compute. At 1 million tokens, attention alone would require trillions of operations. This is why most models cap context at 128K or 200K tokens.
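A quick back-of-envelope check in Python (our arithmetic, not the company's) makes the scaling concrete:

```python
# Quadratic attention cost: the score matrix has n * n entries, so
# doubling the context quadruples the number of query-key comparisons.
for n in (128_000, 256_000, 1_000_000):
    print(f"n = {n:>9,} tokens -> {n * n:.2e} pairwise scores")
# n = 1,000,000 tokens -> 1.00e+12 pairwise scores: a trillion
# comparisons per attention head, per layer, before any other FLOPs.
```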
The Subquadratic Solution
SSA replaces the dense attention matrix with a content-dependent sparse selection mechanism. Instead of comparing every token to every other token, the model dynamically selects which tokens to attend to, cutting complexity from quadratic to near-linear in sequence length.
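Subquadratic has not published SSA's internals, but a generic top-k sparse attention sketch in PyTorch conveys the basic idea. The function name, the fixed per-query budget, and the dense score matrix (kept here for clarity) are all illustrative choices on our part, not the actual mechanism:

```python
import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, budget=64):
    """Toy content-dependent sparse attention for one head.

    Each query attends only to its `budget` highest-scoring keys rather
    than all n of them. For clarity this demo still materializes the full
    (n, n) score matrix; a real subquadratic kernel would select keys
    without ever forming it (e.g., via routing or clustering).
    """
    n, d = q.shape
    scores = (q @ k.T) / d**0.5                        # (n, n) relevance scores
    top_scores, top_idx = scores.topk(budget, dim=-1)  # per-query key selection
    weights = F.softmax(top_scores, dim=-1)            # softmax over selected keys only
    return (weights.unsqueeze(-1) * v[top_idx]).sum(dim=1)  # (n, d)

q, k, v = (torch.randn(512, 64) for _ in range(3))
print(topk_sparse_attention(q, k, v).shape)  # torch.Size([512, 64])
```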
This differs from earlier efficient-context approaches (sparse attention schemes such as DeepSeek's, and attention replacements such as Mamba's state-space layers or RWKV's linear-attention recurrence), according to the company, because of:
- Content-dependent selection — tokens are selected based on relevance, not fixed patterns
- Fully subquadratic — the entire architecture, not just attention, is optimized for linear scaling
- Trainable sparsity — the model learns which relationships matter during pretraining
The result: Subquadratic claims ~1,000x reduction in attention compute at 12M tokens compared to standard Transformer models.
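The arithmetic behind such a figure is easy to sanity-check. In the sketch below, the per-token attention budget `k` is our assumption (Subquadratic has published no such number), chosen to show what would make a ~1,000x reduction internally consistent:

```python
# Back-of-envelope check of the ~1,000x claim. The budget k is a
# hypothetical value, not a published Subquadratic parameter.
n = 12_000_000            # 12M-token context
k = 12_000                # hypothetical keys attended per token
dense = n * n             # O(n^2) query-key comparisons
sparse = n * k            # O(n * k) comparisons
print(f"reduction: {dense / sparse:,.0f}x")  # -> 1,000x
```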
Benchmark Performance
Subquadratic published results on three benchmarks:
| Benchmark | SubQ Score | Comparison |
|---|---|---|
| SWE-Bench Verified | 81.8% | Opus 4.6: 80.8% |
| RULER 128K (long-context retrieval) | 95.0% | Opus 4.6: 94.8% |
| MRCR v2 (1M token retrieval) | 65.9% | GPT-5.5: 74.0%, Gemini 3.1 Pro: 26.3% |
At 1M tokens, SubQ dramatically outperforms Gemini 3.1 Pro on MRCR v2 (65.9% vs 26.3%), though it trails GPT-5.5 (74.0%). The RULER result is particularly impressive: 95% accuracy at 128K context, matching Claude Opus 4.6 at a claimed ~325x cost reduction ($8 vs ~$2,600).
Products: Three Ways to SubQ
Subquadratic launched three products in private beta:
1. SubQ API
OpenAI-compatible API endpoints with a 1 million token production context window. Developers can swap in SubQ with minimal code changes.
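Because the API is OpenAI-compatible, the swap should look like the snippet below. Note that the base URL and model id are placeholders of ours; Subquadratic has not published its actual endpoint details:

```python
from openai import OpenAI

# Hypothetical endpoint and model id, used only to illustrate the
# drop-in pattern an OpenAI-compatible API enables.
client = OpenAI(
    base_url="https://api.subquadratic.example/v1",  # placeholder URL
    api_key="YOUR_SUBQ_API_KEY",
)

response = client.chat.completions.create(
    model="subq-1",  # placeholder model name
    messages=[{"role": "user", "content": "Summarize the attached spec."}],
)
print(response.choices[0].message.content)
```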
2. SubQ Code
A CLI coding agent that loads entire codebases into context. Instead of RAG chunking, SubQ Code can ingest your whole repository and reason over it holistically.
3. SubQ Search
A free long-context research tool — think Perplexity with a million-token memory. Early testers report being able to upload entire books or technical documentation for analysis.
The Cost Argument
Perhaps the most striking claims are economic. Subquadratic's own cost-per-task analysis:
| Task | SubQ | Claude Opus | Cost Ratio |
|---|---|---|---|
| RULER 128K | $8 | ~$2,600 | ~325x cheaper |
| SWE-Bench | ~$0.50 | ~$5 | ~10x cheaper |
| MRCR v2 1M | ~$50 | ~$15,000 (est.) | ~300x cheaper |
If these numbers hold, the implications are enormous: long-context tasks that were economically infeasible (analyzing entire codebases, processing full legal documents, reviewing complete academic papers) become routine.
The Skepticism: Why Researchers Are Demanding Proof
Not everyone is convinced. The AI research community has raised several concerns:
1. No Technical Paper
Subquadratic has not released a peer-reviewed paper or full technical report. The website says “paper coming soon” — a red flag for many researchers.
2. Closed Weights
The model is not open-sourced. Independent verification is impossible without access to weights or a reproducible specification.
3. Narrow Benchmarking
Only three benchmarks were published, all favoring long-context or coding tasks. No results on general reasoning (MMLU, GPQA), math (MATH, GSM8K), or multimodal benchmarks.
4. Research vs. Production Gap
The research configuration scores 83% on MRCR v2, but the production API scores 65.9% — a 17-point gap that raises questions about what’s being benchmarked.
5. Single-Run Results
Published results lack confidence intervals. In ML benchmarking, single runs can be misleading due to variance.
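For context, the standard fix is to report a mean with a 95% confidence interval over repeated runs. The scores below are hypothetical, invented only to illustrate the reporting format:

```python
import statistics

# Hypothetical repeated runs (Subquadratic published a single number
# per benchmark); this shows how variance is normally reported.
runs = [65.9, 64.2, 67.1, 63.8, 66.5]    # e.g., five MRCR v2 evaluations
mean = statistics.mean(runs)
# Normal-approximation 95% CI half-width: 1.96 * s / sqrt(n)
half_width = 1.96 * statistics.stdev(runs) / len(runs) ** 0.5
print(f"{mean:.1f} +/- {half_width:.1f}")  # -> 65.5 +/- 1.3
```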
Previous subquadratic attempts (Mamba, RWKV, Hyena, S4) have shown promise at small scales but failed to match Transformer quality at full production scale. The community is waiting to see if SubQ breaks that pattern.
“Subquadratic’s claims are either the most important AI architecture breakthrough since ‘Attention Is All You Need’ — or a well-funded mirage. There’s no in-between.” — AI researcher quoted in VentureBeat coverage
What Is Real (and What’s Not)
Let’s separate confirmed facts from unverified claims:
Confirmed:
- $29M seed funding at ~$500M valuation ✅
- Team includes ex-Meta GenAI head Alexander Whedon ✅
- Company emerged from stealth on May 5, 2026 ✅
- API and products exist in private beta ✅
Unverified:
- 12M token context in production ❌ (research config only)
- 1,000x compute reduction ❌ (no independent audit)
- Benchmark reproducibility ❌ (no paper, no weights)
- Production reliability ❌ (private beta, limited testers)
Roadmap: What’s Next
Subquadratic has an aggressive roadmap:
- Q3 2026: Expanded API access, SDK releases
- Q4 2026: 50 million token context window target
- 2027: Enterprise post-training tools
The company has stated it has no plans to open-source SubQ’s weights, positioning instead as a commercial API provider.
Why This Matters
Even with the skepticism, SubQ represents a meaningful moment in AI development:
- The quadratic barrier is the last major constraint on Transformer architectures. Whoever cracks subquadratic scaling unlocks fundamentally new use cases.
- Long context changes everything. At 1M+ tokens, agents can work with entire codebases, legal cases, academic literature, or business documents in one shot: no RAG, no chunking, no lost context.
- The economics force attention. Even if SubQ delivers 10% of what's claimed, it would still be cheaper than existing approaches for long-context tasks.
- Competitive pressure is healthy. Whether SubQ is real or not, the buzz pushes every lab to accelerate their own subquadratic research.
Conclusion
Subquadratic’s SubQ launch is one of the most consequential — and most controversial — AI announcements of 2026. If validated, SSA could fundamentally reshape AI economics, making million-token contexts affordable and ubiquitous. If not, it joins a long list of architectures that couldn’t scale.
Independent verification will come in the months ahead. Until then, SubQ is best approached with both genuine curiosity and healthy skepticism.
What’s certain: the race to subquadratic AI is now officially underway.
References
- VentureBeat: Miami startup Subquadratic claims 1,000x AI efficiency gain
- FelloAI: SubQ Review — The First Subquadratic LLM
- SiliconANGLE: Subquadratic launches with $29M to bring 12M-token context windows
- TokenPost: Subquadratic seed investment, 12M token LLM SubQ
- 虎嗅 (Huxiu): Subquadratic secures $29M seed funding
- Habr: LLM with linear complexity and up to 12M token context