Independently reproduces OpenAI's gpt-oss-20b scores by reverse-engineering undisclosed tool-calling formats and agent harnesses.
April 2, 2026
Original Paper
In harmony with gpt-oss
arXiv · 2604.00362
The Takeaway
Democratizes the methodology behind frontier tool-calling models by showing that tool-calling priors exist in training distributions and providing a native 'harmony' harness to bypass lossy Chat Completion conversions.
From the abstract
No one has independently reproduced OpenAI's published scores for gpt-oss-20b with tools, because the original paper discloses neither the tools nor the agent harness. We reverse-engineered the model's in-distribution tools: when prompted without tool definitions, gpt-oss still calls tools from its training distribution with high statistical confidence -- a strong prior, not a hallucination. We then built a native harmony agent harness (this https URL) that encodes messages in the model's native