The AI industry has a dirty little secret: the old playbook of making models bigger and feeding them more data is running out of road. Pre-training scaling, the engine that powered everything from GPT-3 to the current generation of frontier models, is bumping up against data shortages and diminishing returns. ByteDance thinks it found a new gear.
The company’s Seed AI research team has published a paper introducing what it calls a scaling law for post-deployment learning: the idea that AI agents get predictably better the longer they interact with real-world environments after they have been trained and released.
What ByteDance actually found
The research team built a new benchmark called EdgeBench, consisting of 134 long-horizon tasks. Each task requires a minimum of 12 hours of continuous operation. These aren’t quick chat completions or image classifications. They’re extended, complex workflows that demand sustained reasoning and adaptation over time.
To test the concept, the team analyzed over 38,000 hours of interactions between AI agents and their environments. The models put through their paces included some of the most capable systems available: Anthropic’s Claude Opus 4.8, OpenAI’s GPT 5.5 and GPT 5.4, along with models from Zhipu AI and DeepSeek.
The headline finding: agents doubled their learning speed every three months of real-world interaction. That’s not a vague directional improvement. It’s a quantifiable, repeatable pattern, which is exactly what makes it a scaling law rather than just a nice anecdote.
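To make the doubling claim concrete, here is a minimal sketch of what "doubling every three months" implies if it is read as simple exponential growth. That reading is an assumption for illustration: the article does not give the paper's actual functional form, and the function name and parameters below are hypothetical.

```python
def relative_learning_speed(months: float, doubling_period_months: float = 3.0) -> float:
    """Learning speed relative to the start of deployment.

    Assumes (for illustration only) pure exponential growth with a fixed
    doubling period; the three-month default is the figure quoted in the
    article, not a fitted parameter.
    """
    return 2.0 ** (months / doubling_period_months)


if __name__ == "__main__":
    # Print the implied multiplier at quarterly checkpoints over one year.
    for months in (0, 3, 6, 9, 12):
        print(f"{months:>2} months: {relative_learning_speed(months):.0f}x")
```

Under this reading, a year of deployment would correspond to a 16-fold increase in learning speed, which is why the claim matters if it holds up and why independent validation matters if it does not.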
Why the old scaling playbook is breaking down
For years, the AI industry operated on a simple formula. Want a better model? Make it bigger, train it on more data, throw more compute at it. But the cracks have been showing for a while. Epoch AI forecasts a looming shortage of high-quality, publicly available human-generated text data within the next six years.
OpenAI co-founder Andrej Karpathy has been among the most prominent voices flagging the problem. The brute-force approach of scaling model sizes and training datasets is becoming increasingly impractical: compute costs are astronomical, and each incremental improvement requires disproportionately more resources, a classic case of diminishing returns.
This is the wall that ByteDance’s research attempts to climb over. Rather than pouring resources exclusively into making models bigger before deployment, the Seed AI team argues that what happens after deployment deserves equal systematic attention and structured investment.
What this means for investors and the broader AI market
If this scaling law holds up under broader scrutiny, it reshapes the economics of the entire AI industry. The current investment thesis for most AI companies revolves around pre-training: whoever can afford the most GPUs and license the most data wins. Post-deployment scaling changes the calculus. If agents genuinely improve at a predictable, accelerating rate through real-world use, then distribution and deployment infrastructure suddenly matter as much as raw training compute.
The risk, of course, is that the scaling law doesn’t generalize beyond the specific benchmark and models tested. More than 38,000 hours of agent interactions is substantial, but EdgeBench itself is new and has not yet been independently validated by other research groups.
There’s also a competitive dynamic worth tracking. The research tested models from Anthropic, OpenAI, DeepSeek, and Zhipu AI, meaning ByteDance has effectively benchmarked its competitors’ models against a framework it designed.
Disclosure: This article was edited by Editorial Team. For more information on how we create and review content, see our Editorial Policy.
