A modified 110M parameter ColBERT model can identify fine-grained evidence spans as accurately as a 27B parameter LLM, but at a fraction of the cost.
April 2, 2026
Original Paper
FGR-ColBERT: Identifying Fine-Grained Relevance Tokens During Retrieval
arXiv · 2604.00242
The Takeaway
The paper demonstrates that token-level relevance signaling can be distilled directly into the retrieval model itself. This removes the need for expensive LLM 'rerank-and-explain' steps in RAG pipelines, making high-precision evidence highlighting viable at production scale.
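The excerpt does not spell out the scoring function, but the general idea can be sketched on top of ColBERT's late-interaction MaxSim: a distilled per-token relevance weight could gate each document token's contribution to the score, and the winning document tokens then serve as the evidence spans. A minimal NumPy sketch; the function name, the multiplicative gating, and the evidence-extraction rule are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def gated_maxsim(q_emb, d_emb, d_relevance):
    """Hypothetical relevance-gated late interaction.

    q_emb: (num_query_tokens, dim) L2-normalized query token embeddings
    d_emb: (num_doc_tokens, dim) L2-normalized document token embeddings
    d_relevance: (num_doc_tokens,) distilled per-token relevance in [0, 1]
    """
    sims = q_emb @ d_emb.T                    # cosine similarity matrix
    weighted = sims * d_relevance[None, :]    # gate doc tokens by relevance
    score = weighted.max(axis=1).sum()        # MaxSim over doc tokens, summed
    # Evidence tokens: doc positions that win MaxSim for some query token
    evidence = np.unique(weighted.argmax(axis=1))
    return score, evidence

# Toy example: one query token, two doc tokens
q = np.array([[1.0, 0.0]])
d = np.array([[1.0, 0.0], [0.0, 1.0]])
rel = np.array([1.0, 0.5])
score, evidence = gated_maxsim(q, d, rel)  # score 1.0, evidence [0]
```

This keeps retrieval a single forward pass: the relevance head is trained offline by distillation from an LLM, so no LLM call is needed at query time.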
From the abstract
Document retrieval identifies relevant documents but does not provide fine-grained evidence cues, such as specific relevant spans. A possible solution is to apply an LLM after retrieval; however, this introduces significant computational overhead and limits practical deployment. We propose FGR-ColBERT, a modification of the ColBERT retrieval model that integrates fine-grained relevance signals distilled from an LLM directly into the retrieval function. Experiments on MS MARCO show that FGR-ColBERT (