A startup claims it broke through a bottleneck that’s holding back LLMs
TL;DR
Miami-based Subquadratic came out of stealth in May with a big claim: it says it has solved a mathematical bottleneck that has made long-context LLMs expensive since the start of the Transformer era. The likely target is attention's quadratic scaling: every token is compared with every other token, so the work grows with the square of the context length, and compute and memory pressure rise sharply as context grows. The initial reveal was thin on detail, and many experts were unconvinced. Subquadratic is now sharing more technical evidence, but independent replication is the real test.
Nauti's Take
The right reaction is sober curiosity. A real subquadratic attention breakthrough would matter because it hits a bottleneck users can feel: memory, latency and cost.
That also raises the burden of proof. Startups love selling mathematical magic, but developers need runnable code, reproducible benchmarks and clear limits on which models and context lengths actually benefit.
Briefing
Long context is one of the most expensive parts of modern LLM products, especially for agents, legal documents, codebases and knowledge systems. A real attention breakthrough would change product design, not just benchmarks: more context per request, fewer chunking workarounds and lower inference costs. Until the evidence is independently reproduced, the practical impact remains an open question.
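To make the quadratic scaling mentioned in the TL;DR concrete, here is a minimal back-of-the-envelope sketch in Python. It counts the query-key score entries that standard attention computes for one head in one layer, and the memory needed if that score matrix were stored in full. The function names, the 2-byte (16-bit) value size and the chosen context lengths are illustrative assumptions, not figures from Subquadratic or any specific model. Optimized kernels can avoid storing the full matrix, but the number of comparisons still grows with the square of the context length.

```python
def attention_score_entries(context_len: int) -> int:
    """Query-key score entries for one attention head in one layer.

    Standard attention compares every token with every other token,
    so the count is context_len squared.
    """
    return context_len * context_len


def naive_score_bytes(context_len: int, bytes_per_value: int = 2) -> int:
    """Bytes needed to hold the full score matrix.

    Assumes 2 bytes per value (16-bit floats) and that the whole
    matrix is materialized, which optimized kernels may avoid.
    """
    return attention_score_entries(context_len) * bytes_per_value


if __name__ == "__main__":
    # Hypothetical context lengths, chosen only to show the growth rate.
    for n in (1_000, 10_000, 100_000):
        entries = attention_score_entries(n)
        gib = naive_score_bytes(n) / 2**30
        print(f"{n:>7} tokens: {entries:.1e} score entries, "
              f"{gib:.3f} GiB if stored in full")
```

The point of the sketch is the growth rate, not the absolute numbers: ten times more context means one hundred times more score entries per head and per layer. A method that is truly subquadratic would have to break that relationship, which is why reproducible benchmarks across context lengths matter.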