AI & ML New Capability

Architects an autonomous AI research agent that significantly surpasses previous benchmarks by utilizing asynchronous multi-GPU scaling and a hidden consistent evaluation protocol.

March 30, 2026

Original Paper

AIRA_2: Overcoming Bottlenecks in AI Research Agents

Karen Hambardzumyan, Nicolas Baldwin, Edan Toledo, Rishi Hazra, Michael Kuchnik, Bassel Al Omari, Thomas Simon Foster, Anton Protopopov, Jean-Christophe Gagnon-Audet, Ishita Mediratta, Kelvin Niu, Michael Shvartsman, Alisia Lupidi, Alexis Audran-Reiss, Parth Pathak, Tatiana Shavrina, Despoina Magka, Hela Momand, Derek Dunfield, Nicola Cancedda, Pontus Stenetorp, Carole-Jean Wu, Jakob Nicolaus Foerster, Yoram Bachrach, Martin Josifoski

arXiv · 2603.26499

The Takeaway

It identifies and solves three structural bottlenecks in AI agents (sync execution, validation noise, and fixed operators). Reaching a 76th percentile on MLE-bench-30 marks a significant step toward agents that can truly automate machine learning experimentation.

From the abstract

Existing research has identified three structural performance bottlenecks in AI research agents: (1) synchronous single-GPU execution constrains sample throughput, limiting the benefit of search; (2) a generalization gap where validation-based selection causes performance to degrade over extended search horizons; and (3) the limited capability of fixed, single-turn LLM operators imposes a ceiling on search performance. We introduce AIRA$_2$, which addresses these bottlenecks through three archit