Blog

Research notes

A running log of research ideas, implementation notes, and project documentation.

6 min read

Qwen on SRE-Zero Easy: Agent Control Matters

A managed-run report comparing plain prompting, ReAct, and guided open-source-agent control for Qwen on the SRE-Zero easy split.

SRE-Zero, LLM Agents, Evaluation, Benchmarking
6 min read

Benchmarking Agents Is Also a Systems Problem

Why I paused SRE-Zero's 40-task open-weight sweep and added retries, cooldowns, checkpoints, pause, and resume before publishing larger model results.

SRE-Zero, LLM Agents, Evaluation, AI Systems