Blog 3

JSON browser

Browse the raw evaluation records: open any file in a formatted JSON viewer or download the source. Each entry below gives the file name, followed by its folder and file size.

check_summary.json

baseline_agents_no_api / 89.2 KB

random_failure_misleading_web.json

baseline_agents_no_api / 1.2 KB

scripted_success_misleading_web.json

baseline_agents_no_api / 2.3 KB

open_source_google_gemma-4-26b-a4b-it-free_episodes1.json

baseline_blog_full / 29.7 KB

open_source_meta-llama_llama-3.3-70b-instruct-free_episodes1.json

baseline_blog_full / 32.4 KB

open_source_mistralai_mistral-small-3.2-24b-instruct_episodes1.json

baseline_blog_full / 57.0 KB

open_source_nvidia_nemotron-3-super-120b-a12b-free_episodes1.json

baseline_blog_full / 66.8 KB

open_source_openai_gpt-oss-20b-free_episodes1.json

baseline_blog_full / 37.7 KB

open_source_qwen_qwen3-next-80b-a3b-instruct-free_episodes1.json

baseline_blog_full / 32.5 KB

prompting_openai_gpt-5-mini_episodes1.json

baseline_blog_full / 39.1 KB

random_episodes5.json

baseline_blog_full / 140.0 KB

react_anthropic_claude-sonnet-4.6_episodes1.json

baseline_blog_full / 58.6 KB

react_openai_gpt-5-mini_episodes1.json

baseline_blog_full / 36.5 KB

scripted_episodes5.json

baseline_blog_full / 175.0 KB

frontier_openai_gpt-5.5_episodes1.json

blog_cheap_sweep / 29.7 KB

open_source_ibm-granite_granite-4.1-8b_episodes1.json

blog_cheap_sweep / 32.7 KB

prompting_openai_gpt-5-mini_episodes1.json

blog_cheap_sweep / 21.3 KB

react_openai_gpt-5-mini_episodes1.json

blog_cheap_sweep / 23.6 KB

scripted_episodes5.json

blog_cheap_sweep / 98.5 KB

random_episodes1.json

expanded_deterministic_smoke / 32.5 KB

scripted_episodes1.json

expanded_deterministic_smoke / 41.5 KB

summary.json

expanded_deterministic_smoke / 90.6 KB
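
For readers who download a file rather than use the viewer, the following is a minimal sketch of inspecting it locally with Python's standard `json` module. The record schema is not described on this page, so the sketch assumes nothing about field names and only reports the top-level structure. The helper names `summarize_json` and `pretty` are hypothetical and chosen for illustration.

```python
import json
from pathlib import Path


def summarize_json(path):
    """Describe the top-level structure of a JSON file without assuming a schema."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if isinstance(data, dict):
        # JSON object keys are always strings, so sorting them is safe.
        return {"type": "object", "size": len(data), "keys": sorted(data)}
    if isinstance(data, list):
        return {"type": "array", "size": len(data), "keys": []}
    # Scalars (string, number, boolean, null) have no inner structure to report.
    return {"type": type(data).__name__, "size": 1, "keys": []}


def pretty(path, indent=2):
    """Return the file's contents re-serialized with indentation for reading."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    return json.dumps(data, indent=indent, ensure_ascii=False)
```

With a downloaded copy of, say, `summary.json` in the working directory, `summarize_json("summary.json")` would list its top-level keys and `print(pretty("summary.json"))` would print an indented version. What those keys are depends on the file and is not documented here.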