Provides a theoretical framework for why training AI on what to avoid (negative constraints) is structurally superior and more stable than training on preferences.
March 18, 2026
Original Paper
Via Negativa for AI Alignment: Why Negative Constraints Are Structurally Superior to Positive Preferences
arXiv · 2603.16417
The Takeaway
The paper challenges the prevailing RLHF dogma of 'learning what humans prefer' by arguing that preferences are inexhaustible and context-dependent (a recipe for sycophancy), whereas constraints are finite and verifiable. This gives a formal basis for the recent success of negative-only feedback methods such as Distributional Dispreference Optimization.
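To make that contrast concrete, here is a minimal sketch (not from the paper; every name in it is hypothetical) that treats preference feedback as an unbounded learned score and constraint feedback as a finite set of verifiable predicates:

```python
# Rough illustration (not from the paper): preference feedback as an
# unbounded learned score vs. constraint feedback as a finite list of
# verifiable predicates. All names here are hypothetical.

from typing import Callable

def preference_reward(response: str, reward_model: Callable[[str], float]) -> float:
    # Learned, approximate, and context-dependent -- a model can drift toward
    # outputs that merely score well (sycophancy) without being better.
    return reward_model(response)

# A finite, enumerable constraint set: each check either passes or fails.
CONSTRAINTS: list[Callable[[str], bool]] = [
    lambda r: "BEGIN PRIVATE KEY" not in r,          # no credential leakage
    lambda r: len(r) < 10_000,                       # bounded output length
    lambda r: not r.lower().startswith("as an ai"),  # no boilerplate hedging
]

def violated(response: str) -> list[int]:
    """Return indices of violated constraints -- an exact, checkable signal."""
    return [i for i, check in enumerate(CONSTRAINTS) if not check(response)]
```

The point of the sketch is the type signature: the preference side needs a learned approximation of human judgment, while each constraint is a decidable check that can be audited in isolation.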
From the abstract
Recent empirical results have demonstrated that training large language models (LLMs) with negative-only feedback can match or exceed standard reinforcement learning from human feedback (RLHF). Negative Sample Reinforcement achieves parity with PPO on mathematical reasoning; Distributional Dispreference Optimization trains effectively using only dispreferred samples; and Constitutional AI outperforms pure RLHF on harmlessness benchmarks. Yet no unified theoretical account explains why negative …
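To ground what "negative-only feedback" looks like in practice, here is a minimal PyTorch sketch, assumed for illustration rather than taken from the paper, Negative Sample Reinforcement, or Distributional Dispreference Optimization: the policy's likelihood on dispreferred samples is pushed down while a KL term anchors it to a frozen reference model.

```python
# Minimal, assumed sketch of a negative-only objective: lower the policy's
# log-likelihood of dispreferred samples, with a KL anchor to a frozen
# reference model so the policy does not collapse just to avoid them.

import torch
import torch.nn.functional as F

def negative_only_loss(policy_logits: torch.Tensor,     # (batch, seq, vocab)
                       ref_logits: torch.Tensor,        # (batch, seq, vocab), frozen
                       dispreferred_ids: torch.Tensor,  # (batch, seq) token ids
                       beta: float = 0.1) -> torch.Tensor:
    logp = F.log_softmax(policy_logits, dim=-1)
    ref_logp = F.log_softmax(ref_logits, dim=-1)

    # Log-likelihood the policy assigns to the dispreferred continuations.
    token_logp = logp.gather(-1, dispreferred_ids.unsqueeze(-1)).squeeze(-1)
    seq_logp = token_logp.sum(dim=-1)  # (batch,)

    # KL(policy || reference), averaged over positions, as a stability anchor.
    kl = (logp.exp() * (logp - ref_logp)).sum(dim=-1).mean()

    # Minimizing this pushes dispreferred sequences down without any
    # positive (preferred) samples in the batch.
    return seq_logp.mean() + beta * kl
```

In this framing, the only data the update touches are samples flagged as bad; everything about "what to keep" comes from the reference model rather than from a preference signal.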