SubQ: The Subquadratic LLM That Promises 12M Token Context at 1,000x Lower Cost
On May 5, 2026, a Miami-based startup called Subquadratic emerged from stealth with a bold claim: their model SubQ is the first large language model built on a truly subquadratic architecture, capable of processing 12 million tokens in a single context window at a fraction of the computational cost of today’s leading models.
The reception was immediate — and sharply divided.
What Is Subquadratic?
Subquadratic is an AI infrastructure company founded in 2026 and headquartered in Miami, Florida. Led by CEO Justin Dangel and CTO Alexander Whedon (formerly Head of GenAI at Meta), the startup raised $29 million in seed funding at a reported $500 million valuation.
The investor lineup reads like a who’s-who of startup royalty: Tinder co-founder Justin Mateen, ex-SoftBank Vision Fund partner Javier Villamizar, and early investors in Anthropic, OpenAI, Stripe, and Brex.
The Core Innovation: Subquadratic Sparse Attention (SSA)
The headline feature is SSA (Subquadratic Sparse Attention), a sparse attention mechanism that fundamentally changes how the model scales with context length.
The Quadratic Problem
Standard Transformer attention scales as O(n²) — double the context, quadruple the compute. At 1 million tokens, attention alone would require trillions of operations. This is why most models cap context at 128K or 200K tokens.
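A quick back-of-envelope check in Python (our arithmetic, not the company's) makes the scaling concrete:

```python
# Quadratic attention cost: the score matrix has n * n entries, so
# doubling the context quadruples the number of query-key comparisons.
for n in (128_000, 256_000, 1_000_000):
    print(f"n = {n:>9,} tokens -> {n * n:.2e} pairwise scores")
# n = 1,000,000 tokens -> 1.00e+12 pairwise scores: a trillion
# comparisons per attention head, per layer, before any other FLOPs.
```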
The Subquadratic Solution
SSA replaces the dense attention matrix with a content-dependent sparse selection mechanism. Instead of comparing every token to every other token, the model dynamically selects which tokens to attend to, cutting complexity from quadratic to near-linear in sequence length.
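Subquadratic has not published SSA's internals, but a generic top-k sparse attention sketch in PyTorch conveys the basic idea. The function name, the fixed per-query budget, and the dense score matrix (kept here for clarity) are all illustrative choices on our part, not the actual mechanism:

```python
import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, budget=64):
    """Toy content-dependent sparse attention for one head.

    Each query attends only to its `budget` highest-scoring keys rather
    than all n of them. For clarity this demo still materializes the full
    (n, n) score matrix; a real subquadratic kernel would select keys
    without ever forming it (e.g., via routing or clustering).
    """
    n, d = q.shape
    scores = (q @ k.T) / d**0.5                        # (n, n) relevance scores
    top_scores, top_idx = scores.topk(budget, dim=-1)  # per-query key selection
    weights = F.softmax(top_scores, dim=-1)            # softmax over selected keys only
    return (weights.unsqueeze(-1) * v[top_idx]).sum(dim=1)  # (n, d)

q, k, v = (torch.randn(512, 64) for _ in range(3))
print(topk_sparse_attention(q, k, v).shape)  # torch.Size([512, 64])
```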
This differs from earlier efficient-context approaches (sparse attention schemes such as DeepSeek's, and attention replacements such as Mamba's state-space layers or RWKV's linear-attention recurrence), according to the company, because of:
- Content-dependent selection — tokens are selected based on relevance, not fixed patterns
- Fully subquadratic — the entire architecture, not just attention, is optimized for linear scaling
- Trainable sparsity — the model learns which relationships matter during pretraining
The result: Subquadratic claims ~1,000x reduction in attention compute at 12M tokens compared to standard Transformer models.
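The arithmetic behind such a figure is easy to sanity-check. In the sketch below, the per-token attention budget `k` is our assumption (Subquadratic has published no such number), chosen to show what would make a ~1,000x reduction internally consistent:

```python
# Back-of-envelope check of the ~1,000x claim. The budget k is a
# hypothetical value, not a published Subquadratic parameter.
n = 12_000_000            # 12M-token context
k = 12_000                # hypothetical keys attended per token
dense = n * n             # O(n^2) query-key comparisons
sparse = n * k            # O(n * k) comparisons
print(f"reduction: {dense / sparse:,.0f}x")  # -> 1,000x
```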
Benchmark Performance
Subquadratic published results on three benchmarks:
| Benchmark | SubQ Score | Comparison |
|---|---|---|
| SWE-Bench Verified | 81.8% | Opus 4.6: 80.8% |
| RULER 128K (long-context retrieval) | 95.0% | Opus 4.6: 94.8% |
| MRCR v2 (1M token retrieval) | 65.9% | GPT-5.5: 74.0%, Gemini 3.1 Pro: 26.3% |
At 1M tokens, SubQ dramatically outperforms Gemini 3.1 Pro on MRCR v2 (65.9% vs 26.3%), though it trails GPT-5.5 (74.0%). The RULER result is particularly impressive: 95% accuracy at 128K context, matching Claude Opus 4.6 at a claimed ~325x cost reduction ($8 vs ~$2,600).
Products: Three Ways to SubQ
Subquadratic launched three products in private beta:
1. SubQ API
OpenAI-compatible API endpoints with a 1 million token production context window. Developers can swap in SubQ with minimal code changes.
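Because the API is OpenAI-compatible, the swap should look like the snippet below. Note that the base URL and model id are placeholders of ours; Subquadratic has not published its actual endpoint details:

```python
from openai import OpenAI

# Hypothetical endpoint and model id, used only to illustrate the
# drop-in pattern an OpenAI-compatible API enables.
client = OpenAI(
    base_url="https://api.subquadratic.example/v1",  # placeholder URL
    api_key="YOUR_SUBQ_API_KEY",
)

response = client.chat.completions.create(
    model="subq-1",  # placeholder model name
    messages=[{"role": "user", "content": "Summarize the attached spec."}],
)
print(response.choices[0].message.content)
```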
2. SubQ Code
A CLI coding agent that loads entire codebases into context. Instead of RAG chunking, SubQ Code can ingest your whole repository and reason over it holistically.
3. SubQ Search
A free long-context research tool — think Perplexity with a million-token memory. Early testers report being able to upload entire books or technical documentation for analysis.
The Cost Argument
Perhaps the most striking claims are economic. Subquadratic's own cost-per-task analysis:
| Task | SubQ | Claude Opus | Cost Ratio |
|---|---|---|---|
| RULER 128K | $8 | ~$2,600 | ~325x cheaper |
| SWE-Bench | ~$0.50 | ~$5 | ~10x cheaper |
| MRCR v2 1M | ~$50 | ~$15,000 (est.) | ~300x cheaper |
If these numbers hold, the implications are enormous: long-context tasks that were economically infeasible (analyzing entire codebases, processing full legal documents, reviewing complete academic papers) become routine.
The Skepticism: Why Researchers Are Demanding Proof
Not everyone is convinced. The AI research community has raised several concerns:
1. No Technical Paper
Subquadratic has not released a peer-reviewed paper or full technical report. The website says “paper coming soon” — a red flag for many researchers.
2. Closed Weights
The model is not open-sourced. Independent verification is impossible without access to weights or a reproducible specification.
3. Narrow Benchmarking
Only three benchmarks were published, all favoring long-context or coding tasks. No results on general reasoning (MMLU, GPQA), math (MATH, GSM8K), or multimodal benchmarks.
4. Research vs. Production Gap
The research configuration scores 83% on MRCR v2, but the production API scores 65.9% — a 17-point gap that raises questions about what’s being benchmarked.
5. Single-Run Results
Published results lack confidence intervals. In ML benchmarking, single runs can be misleading due to variance.
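For context, the standard fix is to report a mean with a 95% confidence interval over repeated runs. The scores below are hypothetical, invented only to illustrate the reporting format:

```python
import statistics

# Hypothetical repeated runs (Subquadratic published a single number
# per benchmark); this shows how variance is normally reported.
runs = [65.9, 64.2, 67.1, 63.8, 66.5]    # e.g., five MRCR v2 evaluations
mean = statistics.mean(runs)
# Normal-approximation 95% CI half-width: 1.96 * s / sqrt(n)
half_width = 1.96 * statistics.stdev(runs) / len(runs) ** 0.5
print(f"{mean:.1f} +/- {half_width:.1f}")  # -> 65.5 +/- 1.3
```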
Previous subquadratic attempts (Mamba, RWKV, Hyena, S4) have shown promise at small scales but failed to match Transformer quality at full production scale. The community is waiting to see if SubQ breaks that pattern.
“Subquadratic’s claims are either the most important AI architecture breakthrough since ‘Attention Is All You Need’ — or a well-funded mirage. There’s no in-between.” — AI researcher quoted in VentureBeat coverage
What Is Real (and What’s Not)
Let’s separate confirmed facts from unverified claims:
Confirmed:
- $29M seed funding at ~$500M valuation ✅
- Team includes ex-Meta GenAI head Alexander Whedon ✅
- Company emerged from stealth on May 5, 2026 ✅
- API and products exist in private beta ✅
Unverified:
- 12M token context in production ❌ (research config only)
- 1,000x compute reduction ❌ (no independent audit)
- Benchmark reproducibility ❌ (no paper, no weights)
- Production reliability ❌ (private beta, limited testers)
Roadmap: What’s Next
Subquadratic has an aggressive roadmap:
- Q3 2026: Expanded API access, SDK releases
- Q4 2026: 50 million token context window target
- 2027: Enterprise post-training tools
The company has stated it has no plans to open-source SubQ’s weights, positioning instead as a commercial API provider.
Why This Matters
Even with the skepticism, SubQ represents a meaningful moment in AI development:
- The quadratic barrier is the last major constraint on Transformer architectures. Whoever cracks subquadratic scaling unlocks fundamentally new use cases.
- Long context changes everything. At 1M+ tokens, agents can work with entire codebases, legal cases, academic literature, or business documents in one shot: no RAG, no chunking, no lost context.
- The economics force attention. Even if SubQ delivers 10% of what's claimed, it would still be cheaper than existing approaches for long-context tasks.
- Competitive pressure is healthy. Whether SubQ is real or not, the buzz pushes every lab to accelerate their own subquadratic research.
Conclusion
Subquadratic’s SubQ launch is one of the most consequential — and most controversial — AI announcements of 2026. If validated, SSA could fundamentally reshape AI economics, making million-token contexts affordable and ubiquitous. If not, it joins a long list of architectures that couldn’t scale.
Independent verification will come in the months ahead. Until then, SubQ is best approached with both genuine curiosity and healthy skepticism.
What’s certain: the race to subquadratic AI is now officially underway.
References
- VentureBeat: Miami startup Subquadratic claims 1,000x AI efficiency gain
- FelloAI: SubQ Review — The First Subquadratic LLM
- SiliconANGLE: Subquadratic launches with $29M to bring 12M-token context windows
- TokenPost: Subquadratic seed investment, 12M token LLM SubQ
- 虎嗅 (Huxiu): Subquadratic secures $29M seed funding
- Habr: LLM with linear complexity and up to 12M token context