Publications

Papers and preprints

All entries listed here are in preparation and should not be read as accepted publications.

SRE-Zero: Environment-Grounded Evaluation of Reliable Tool-Using LLM Agents

In preparation

A benchmark paper draft is planned, covering the initial environment, baselines, and failure analysis.