GPT-OSS 20B on SRE-Zero Easy: ReAct Helps, Resolution Still Fails Often
A managed-run report comparing plain prompting, ReAct, and guided open-source-agent control for GPT-OSS 20B on the SRE-Zero easy split.
CS + AI/ML student
AI/ML and systems builder interested in reliable software, thoughtful evaluation, and practical research tools.
I work across LLM agents, reinforcement learning environments, evaluation systems, AI infrastructure, and applied engineering. This site collects my projects, writing, publications, and research notes as they develop.

Profile
I like problems where models, tools, data, and systems meet, especially when behavior needs to be measured carefully rather than merely demoed.
I use this space as a working record of what I am building and learning: research prototypes, software projects, implementation notes, and longer-form writeups. The common thread is a preference for systems that can be inspected, tested, and improved over time.
Interests
A few areas I keep returning to while building projects and reading research.
Current research thread
An environment-grounded benchmark for evaluating how reliably tool-using agents handle simulated incident-response workflows. The project focuses on sequential decisions, safe tool use, partial evidence, remediation quality, and operational reliability metrics.
Writing
Research diary entries, project notes, and implementation writeups.
A managed-run report comparing plain prompting, ReAct, and guided open-source-agent control for GPT-OSS 20B on the SRE-Zero easy split.
A managed-run report comparing plain prompting, ReAct, and guided open-source-agent control for Mistral Small on the SRE-Zero easy split.
A managed-run report comparing plain prompting, ReAct, and guided open-source-agent control for Qwen on the SRE-Zero easy split.