Anthropic's newest flagship delivers unprecedented multi-step reasoning and native computer-use capabilities that outperform prior models by wide margins.
A landmark paper from Stanford and FAIR shows that agents which iteratively refine their own strategies through online experience close the gap to frontier models faster.
The popular multi-agent toolkit ships major updates focused on monitoring, budget enforcement, and debugging live swarms.
A large-scale evaluation across 12 real-world domains finds orchestrated agent teams beating single frontier models on 9 of 12 tasks.
The new Llama release targets the enterprise with enormous context windows and first-class support for calling hundreds of tools in parallel.
Developers can now build agents that control the mouse, keyboard, and browser with the same reliability as the research previews.
The latest reasoning model sets new records across GPQA, SWE-bench, and mathematical olympiads while slashing latency.
The community-driven agent runtime reaches a major milestone with production-grade reliability, observability, and multi-model routing.
A new family of agents trained end-to-end for real-world physical tasks demonstrates breakthrough generalization across robot morphologies.