Research

Reliable agents, grounded environments, and AI systems

My long-term direction is to understand how tool-using AI agents can be evaluated and trained in environments that make reliability measurable.

Research focus

I am most interested in agent behavior that emerges across a sequence of observations and actions: evidence gathering, diagnosis, tool selection, intervention, and recovery.

Reliable tool-using LLM agents
Reinforcement learning environments
Agent evaluation and failure analysis
Post-training methods such as SFT, DPO, GRPO, and RLHF-style environment feedback
AI systems and infrastructure for training/evaluation

Current project

SRE-Zero

SRE-Zero is a long-term research project on RL environments and evaluation systems for simulated site reliability engineering incident response. It is designed to evaluate how agents inspect evidence, choose actions, and recover from mistakes under a limited step budget.

Environment-grounded · Incident response · Sequential decisions
Open project page
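The evaluation loop described above — inspecting evidence, choosing actions, and recovering under a limited step budget — can be sketched as a toy environment. This is a hypothetical illustration, not the actual SRE-Zero interface; all names (`IncidentEnv`, `observe`, `remediate`, the service names) are invented for the sketch.

```python
from dataclasses import dataclass

@dataclass
class IncidentEnv:
    """Toy incident-response environment (hypothetical sketch, not the
    real SRE-Zero API): the agent spends a limited step budget
    gathering evidence and applying remediations."""
    budget: int = 10
    faulty_service: str = "db"   # ground-truth fault, hidden from the agent
    steps_used: int = 0
    resolved: bool = False

    def observe(self, service: str) -> str:
        # Inspecting evidence costs one step from the budget.
        self.steps_used += 1
        return "error logs" if service == self.faulty_service else "healthy"

    def remediate(self, service: str) -> float:
        # Acting also costs a step; reward only for fixing the real fault.
        self.steps_used += 1
        if service == self.faulty_service:
            self.resolved = True
            return 1.0
        return -0.1  # penalty for an unnecessary intervention

    @property
    def done(self) -> bool:
        return self.resolved or self.steps_used >= self.budget

# Usage: a trivial agent that inspects each service before intervening.
env = IncidentEnv(budget=6)
for svc in ["web", "cache", "db"]:
    if env.done:
        break
    if env.observe(svc) == "error logs":
        env.remediate(svc)
print(env.resolved, env.steps_used)  # True 4: two healthy checks, one hit, one fix
```

Even this toy version exposes the intermediate decisions (which service was inspected, in what order, at what step cost) rather than only the final resolved/unresolved outcome.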

Future directions

Benchmarks that expose intermediate agent decisions, not only final answers.
Training signals from structured environments and operational feedback.
Failure taxonomies for tool use, remediation, recovery, and evidence gathering.
Lightweight infrastructure for reproducible agent evaluation at research scale.

Reading and research notes

A curated notes section will be added as the reading list and project logs become more structured. For now, research diary posts live on the blog.

View blog