AI & ML New Capability

A self-improvement framework (MIPO) that improves LLM personalization and reasoning with zero additional data or human labels.

March 23, 2026

Original Paper

Maximizing mutual information between user contexts and responses improves LLM personalization with no additional data

Hyunji Nam, Haoran Li, Natasha Jaques

arXiv · 2603.19294

The Takeaway

MIPO uses contrastive data augmentation derived from the user's own prompts to maximize mutual information between user contexts and model responses. Improving performance by 3-40% on personalization tasks without collecting expensive new datasets represents a major shift in how we approach post-training.
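The core idea — treating a user's own prompt-response pairs as positives and other users' responses as free contrastive negatives — can be sketched with an InfoNCE-style lower bound on mutual information. This is a minimal illustration under assumed embeddings, not the paper's actual objective; all function and variable names here are hypothetical.

```python
import numpy as np

def info_nce_loss(context_embs, response_embs, temperature=0.1):
    """InfoNCE lower bound on mutual information between user contexts
    and responses. Row i of each matrix is a matched (context, response)
    pair; responses paired with other rows' contexts serve as in-batch
    negatives, so no extra labeled data is needed. Illustrative sketch."""
    # Normalize rows so the dot product is cosine similarity.
    c = context_embs / np.linalg.norm(context_embs, axis=1, keepdims=True)
    r = response_embs / np.linalg.norm(response_embs, axis=1, keepdims=True)
    logits = c @ r.T / temperature  # (N, N): entry (i, j) scores pair i-j
    # Cross-entropy with the diagonal (matched pairs) as the target class.
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
contexts = rng.normal(size=(8, 16))
# Perfectly aligned responses give a much lower loss than random ones.
loss_matched = info_nce_loss(contexts, contexts)
loss_random = info_nce_loss(contexts, rng.normal(size=(8, 16)))
```

Minimizing this loss pushes each response embedding toward its own user's context and away from other users', which is one standard way to maximize a mutual-information lower bound without human labels.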

From the abstract

While post-training has successfully improved large language models (LLMs) across a variety of domains, these gains rely heavily on human-labeled data or external verifiers. Existing data has already been exploited, and new high-quality data is expensive to collect. More fundamentally, true intelligence goes far beyond tasks that are easily verifiable. Therefore, we need self-improvement frameworks that allow models to improve without external oversight. We propose *Mutual Information Preference Optimization* (MIPO) […]