ORACLE uses symbolic reasoning engines to verify intermediate reasoning steps in synthetic data generation, moving beyond simple answer-correctness filtering.
March 24, 2026
Original Paper
ORACLE: Optimizing Reasoning Abilities of Large Language Models via Constraint-Led Synthetic Data Elicitation
arXiv · 2603.21140
The Takeaway
ORACLE enables the creation of high-quality reasoning datasets for natural-language tasks where code execution is impossible. By validating each step of a syllogistic chain with a symbolic reasoning engine, it provides a more reliable training signal for fine-tuning LLM reasoning capabilities than filtering on final answers alone.
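The idea can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's implementation: it treats "All X are Y" statements as pairs, checks each intermediate step of a chain by transitive closure, and accepts the chain only if every step is entailed. All function names and the toy syllogisms are illustrative assumptions. Note how answer-only filtering would accept the second chain (its final conclusion is correct) while step-level verification rejects it.

```python
# Illustrative sketch of step-level verification of a syllogistic chain,
# as opposed to filtering only on the final answer. Not the paper's code.

def step_is_entailed(premises, conclusion):
    """Return True if the 'All a are b' pair `conclusion` follows from the
    given pairs by transitivity (All a are b, All b are c => All a are c)."""
    closure = set(premises)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return conclusion in closure

def verify_chain(premises, steps):
    """Accept a reasoning chain only if every intermediate step is entailed
    by the premises plus all previously verified steps."""
    known = list(premises)
    for step in steps:
        if not step_is_entailed(known, step):
            return False  # flawed intermediate step: reject the whole chain
        known.append(step)
    return True

premises = [("sparrows", "birds"), ("birds", "animals"), ("animals", "mortal")]

# Sound chain: each step follows from what is already known.
good_chain = [("sparrows", "animals"), ("sparrows", "mortal")]

# Flawed chain: reaches the correct final conclusion via an unsupported step,
# so answer-correctness filtering alone would wrongly keep it.
bad_chain = [("sparrows", "mortal_souls"), ("sparrows", "mortal")]

print(verify_chain(premises, good_chain))  # True
print(verify_chain(premises, bad_chain))   # False
```

A real system would replace the transitive-closure check with a full symbolic reasoning engine, but the filtering logic is the same: a chain survives only if every step, not just the answer, verifies.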
From the abstract
Training large language models (LLMs) with synthetic reasoning data has become a popular approach to enhancing their reasoning capabilities, while a key factor influencing the effectiveness of this paradigm is the quality of the generated multi-step reasoning data. To generate high-quality reasoning data, many recent methods generate synthetic reasoning paths and filter them based on final answer correctness, often overlooking flaws in intermediate reasoning steps. To enhance the verification of […]