Reduces Text-to-SQL input tokens by 99% by internalizing the database schema into the model weights through a two-phase fine-tuning approach.
March 26, 2026
Original Paper
Schema on the Inside: A Two-Phase Fine-Tuning Method for High-Efficiency Text-to-SQL at Scale
arXiv · 2603.24023
The Takeaway
By eliminating the need to include massive schema definitions in every prompt, this method enables 8B-parameter models to outperform proprietary models like Gemini 2.0 Flash while drastically reducing API costs and latency. It provides a viable path for deploying high-precision SQL agents in production environments with massive schemas.
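To make the cost argument concrete, here is a back-of-the-envelope sketch. All figures below are illustrative assumptions, not numbers reported in the paper: it assumes a prompt that previously carried a ~20,000-token schema shrinks to ~200 tokens of user question once the schema lives in the weights.

```python
# Illustrative cost comparison for schema-in-prompt vs. schema-in-weights.
# All figures are hypothetical assumptions for demonstration, not values
# reported in the paper.

def monthly_prompt_tokens(queries_per_month: int, tokens_per_query: int) -> int:
    """Total input tokens sent to the model per month."""
    return queries_per_month * tokens_per_query

QUERIES = 1_000_000                # hypothetical monthly query volume
SCHEMA_PROMPT_TOKENS = 20_000      # question + full schema in every prompt
INTERNALIZED_PROMPT_TOKENS = 200   # question only; schema is in the weights

before = monthly_prompt_tokens(QUERIES, SCHEMA_PROMPT_TOKENS)
after = monthly_prompt_tokens(QUERIES, INTERNALIZED_PROMPT_TOKENS)

reduction = 1 - after / before
print(f"input tokens: {before:,} -> {after:,} ({reduction:.0%} fewer)")
```

Under these assumed numbers the per-query prompt shrinks by 99%, matching the headline figure; the actual savings depend on the real schema size and query mix.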
From the abstract
Applying large, proprietary API-based language models to text-to-SQL tasks poses a significant industry challenge: reliance on massive, schema-heavy prompts results in prohibitive per-token API costs and high latency, hindering scalable production deployment. We present a specialized, self-hosted 8B-parameter model designed for a conversational bot in CriQ, a sister app to Dream11, India's largest fantasy sports platform with over 250 million users, that answers user queries about cricket statistics …