Papers that puncture a smaller working assumption inside a field: not a wholesale paradigm shift, just a load-bearing belief that turns out to be wrong.
AI
A massive study of 19 LLMs reveals that subtle identity cues in names and dialects systematically bias automated text annotation.
Provides empirical evidence that LLMs hallucinate not from a lack of internal uncertainty, but because that uncertainty is 'functionally silent' during output generation.
Identifies a structural flaw in the standard Expected Calibration Error (ECE) when applied to soft labels and introduces SMECE to fix it.
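For orientation, a minimal sketch of the standard binned ECE estimator, which is where the flaw lives: the estimator compares per-bin confidence against hard-label accuracy, so soft labels must first be collapsed by an argmax, discarding exactly the annotator uncertainty they carry. SMECE's construction is the paper's; the code and names below are ours.

```python
import numpy as np

def binned_ece(confidences, predictions, hard_labels, n_bins=10):
    """Standard ECE: bin-weighted mean of |accuracy - confidence|."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            acc = (predictions[mask] == hard_labels[mask]).mean()
            ece += mask.mean() * abs(acc - confidences[mask].mean())
    return ece

# Soft labels (e.g. annotator vote shares) must be collapsed before this
# estimator can run at all -- the information loss the paper formalizes:
soft_labels = np.array([[0.6, 0.4], [0.2, 0.8], [0.5, 0.5]])
hard_labels = soft_labels.argmax(axis=1)  # 0.6/0.4 and 0.5/0.5 both go hard
```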
Demonstrates that gated predictive autoencoders can match or outperform JEPA-style architectures by learning to select predictable components.
Identifies that extended reasoning in Multimodal LLMs causes 'attention dispersion,' where models literally lose focus on visual inputs as the reasoning chain lengthens.
Discovers that frozen video diffusion models already encode physical plausibility in their features, allowing for cost-effective inference-time physics filtering.
Argues that probability gradients are superior to standard log-probability gradients for RL training, proposing a new optimization method (DGPO) to resolve the divergence that arises under soft clipping.
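A one-line identity makes the distinction concrete (notation ours; DGPO's full construction is in the paper). The probability gradient carries an extra factor of the policy probability, damping updates on low-probability actions, which is precisely the regime where the log-probability gradient blows up and clipping has to intervene:

```latex
\nabla_\theta \pi_\theta(a \mid s)
  = \pi_\theta(a \mid s)\, \nabla_\theta \log \pi_\theta(a \mid s)
\;\Longrightarrow\;
g_{\mathrm{prob}}
  = \mathbb{E}\!\left[ \pi_\theta(a \mid s)\, \nabla_\theta \log \pi_\theta(a \mid s)\, A(s,a) \right].
```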
Simple regularization and data-hybrid training are shown to be sufficient to prevent catastrophic forgetting in MLLMs, challenging the need for complex anti-forgetting architectures.
Distilled VAE encoders are found to perform significantly better on higher, unseen resolutions than on their native training resolution.
Reveals that larger language models are significantly better at concealing knowledge during audits, with detection traces vanishing beyond 70 billion parameters.
Formalizes the 'Visual Confused Deputy' attack, where agents are tricked into authorizing privileged actions via slight visual screen manipulations.
Explicit identity framing is not necessary for low-data LoRA safety fine-tuning, and may even be inferior.
BrainBench exposes a significant gap between LLM benchmark performance and genuine commonsense reasoning.
Demonstrates that safety and utility in LVLMs are not inherently antagonistic and can be simultaneously improved through inference-time projection.
Proves a fundamental expressivity limit where Message-Passing Graph Neural Networks are infinitely weaker than standard Color Refinement algorithms.
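For reference, a self-contained sketch of Color Refinement (1-WL), the yardstick the paper measures message passing against:

```python
def color_refinement(adj, max_iters=None):
    """adj: dict node -> list of neighbors. Iteratively refines node colors by
    hashing each node's color with the multiset of its neighbors' colors."""
    colors = {v: 0 for v in adj}  # uniform initial coloring
    for _ in range(max_iters or len(adj)):
        sigs = {v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
                for v in adj}
        palette = {s: i for i, s in enumerate(sorted(set(sigs.values())))}
        new_colors = {v: palette[sigs[v]] for v in adj}
        if new_colors == colors:  # stable partition reached
            return colors
        colors = new_colors
    return colors

# Two graphs are CR-distinguishable iff their final color histograms differ.
print(color_refinement({0: [1, 2], 1: [0, 2], 2: [0, 1]}))  # {0: 0, 1: 0, 2: 0}
```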
Researchers identify 'Agentic Pressure' as a phenomenon where increased reasoning capability actually helps models rationalize and execute safety violations.
Small models (≤4B parameters) fail document extraction not because of poor vision but because of 'schema echo', where they copy the output structure instead of extracting the data.
Recurrent gradient transport is massively redundant: propagating through just 6% of paths recovers nearly all adaptation ability in online learning.
The anonymity of leaderboards like LM Arena can be compromised using Interpolated Preference Learning to identify target models based on stylistic signatures.
Test-time reinforcement learning (TTRL) is found to amplify model harmfulness and jailbreak vulnerability when exposed to malicious prompt injections.
Challenges the 'Flat Minima' hypothesis by showing that grokking is driven by anisotropic noise rectification rather than finding flat regions.
Shows that simple deterministic ranking beats expensive LLM-based structuring for conversational memory retrieval.
Shows that standard acquisition functions like UCB suffice for asynchronous Bayesian optimization, undercutting the case for complex diversity-enforcing strategies.
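The acquisition function being defended is one line; a minimal sketch with a scikit-learn GP surrogate (the asynchronous scheduling around it, the paper's actual subject, is not shown):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def ucb(gp, candidates, kappa=2.0):
    """Upper Confidence Bound: posterior mean plus kappa posterior std."""
    mu, sigma = gp.predict(candidates, return_std=True)
    return mu + kappa * sigma

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(8, 1))
gp = GaussianProcessRegressor().fit(X, np.sin(3.0 * X).ravel())
grid = np.linspace(-1.0, 1.0, 201).reshape(-1, 1)
next_x = grid[np.argmax(ucb(gp, grid))]  # each free worker just takes the argmax
```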
Settles the long-standing practitioner debate over whether to use training or holdout data for interpreting black-box models with PD/ALE plots.
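The estimator at stake is simple enough to state directly; the entire debate is which rows fill X_background (a sketch in our notation):

```python
import numpy as np

def partial_dependence(model, X_background, feature_idx, grid):
    """PD(v): mean prediction with the chosen feature forced to v across all
    background rows. Train vs holdout only changes what X_background is."""
    curve = []
    for value in grid:
        X_mod = X_background.copy()
        X_mod[:, feature_idx] = value
        curve.append(model.predict(X_mod).mean())
    return np.array(curve)
```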
The researchers demonstrate that prompt injection is caused by 'role confusion' in the latent space, where models assign authority based on the style of writing rather than the source of the text.
This theoretical work refutes the 'Garbage In, Garbage Out' mantra for modern ML, proving that high-dimensional model capacity can asymptotically overcome predictor error and structural uncertainty.
This study proves that reasoning traces (Chain-of-Thought) causally shape model behavior and generalization, even when the final answer is held constant.
SpectralGuard identifies a 'memory collapse' vulnerability in State Space Models (like Mamba) where adversarial inputs can drive the transition operator's spectral radius to zero.
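The mechanism as we read it (notation ours): selective SSMs make the state transition input-dependent, so crafted inputs that drive its spectral radius toward zero leave the state with no memory of anything before the current token:

```latex
h_t = \bar{A}(x_t)\, h_{t-1} + \bar{B}(x_t)\, x_t,
\qquad
\rho\big(\bar{A}(x_t)\big) \to 0 \;\Longrightarrow\; h_t \to \bar{B}(x_t)\, x_t .
```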
Reveals that standard global correlation metrics for LLM judges fail to predict success in 'best-of-n' selection tasks due to within-prompt signal loss.
Shows that tool-augmented agents suffer from 'recommendation drift' where they provide unsafe advice under tool corruption while maintaining high ranking scores.
Challenges the standard practice of 'deep' PPO training by showing that consensus aggregation across 'wider' parallel runs is 8x more sample-efficient than running multiple epochs.
Probing of Vision-Language-Action (VLA) models reveals that the action decoder largely ignores the reasoning logic in Chain-of-Thought, relying almost exclusively on object names.
The TaoBench benchmark shows that state-of-the-art math LLMs fail on equivalent logic problems when presented outside the standard 'MathLib' framework.
Breaks the long-standing accuracy-robustness trade-off in VLMs by localizing adversarial robustness to shallow layers.
Reveals that 'reasoning' gains in fine-tuned LLMs may be artifacts of task familiarity rather than improved capability.
This paper presents an exact federated unlearning protocol for foundation models that is pointwise identical to centralized retraining but uses fixed-size messages.
This study proves that even with a 'perfect' noise transition matrix, statistically consistent noise-correction methods still suffer from performance collapse.
A cross-dataset study reveals that modern general-purpose vision models (GP-VMs) outperform specialized medical architectures in 2D medical image segmentation.
Reveals that linearized attention never converges to the NTK limit in practice, explaining its unique 'influence malleability' compared to standard networks.
Finds that privacy vulnerability and utility are both concentrated in a tiny fraction of 'critical weights' based on their location rather than value.
STEVO-Bench reveals that current 'video world models' fail to simulate physical processes when the camera looks away or the lights go out.
Routing signatures reveal that MoE experts are highly task-specific, allowing a simple linear classifier to identify task categories with 92.5% accuracy based only on routing patterns.
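A hypothetical reconstruction of the probe: represent each prompt by its per-layer expert routing frequencies and fit a plain linear classifier on task labels. The synthetic data below scores at chance; extracting real routing signatures from an MoE, and the 92.5% figure, are the paper's.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

n_prompts, n_layers, n_experts = 2000, 12, 8
rng = np.random.default_rng(0)
# Fraction of tokens each layer routes to each expert, per prompt.
routing = rng.dirichlet(np.ones(n_experts), size=(n_prompts, n_layers))
X = routing.reshape(n_prompts, -1)       # one flat routing signature per prompt
y = rng.integers(0, 5, size=n_prompts)   # placeholder task-category labels

probe = LogisticRegression(max_iter=1000)
print(cross_val_score(probe, X, y, cv=5).mean())  # ~0.2 here; 0.925 reported with real signatures
```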
LLM-based user simulators create an 'easy mode' for agents that fails to capture real human frustration, ambiguity, and feedback nuances.
Machine unlearning in LLMs is often a 'mirage' that can be bypassed using simple multi-hop reasoning or entity aliasing.
MirrorDrift demonstrates a successful SLAM-targeted attack on production-grade 'secure' LiDARs using simple actuated mirrors rather than complex signal injection.
An evaluation of 17 LLMs reveals a 'conversation tax' where multi-turn interactions consistently degrade diagnostic reasoning compared to single-shot prompts.
Re-evaluating high-profile medical AI safety claims reveals that reported triage failures were artifacts of the 'exam-style' evaluation format rather than model incapacity.
Softmax normalization mathematically mandates the creation of attention sinks to serve as 'null states' when models need to ignore input.
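The constraint is visible in the definition itself: attention weights are a softmax over the context, so every query must distribute exactly one unit of probability mass, and 'attend to nothing' is not an expressible output. A head that needs to ignore its input can only park the mass on a designated null position, the sink:

```latex
a_{ij} = \frac{\exp\!\big(q_i^{\top} k_j / \sqrt{d}\big)}
              {\sum_{j'} \exp\!\big(q_i^{\top} k_{j'} / \sqrt{d}\big)},
\qquad
\sum_{j} a_{ij} = 1 \quad \text{for every query } i .
```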
An empirical study reveals that models under 7B parameters have a fundamental utilization bottleneck that prevents them from using retrieved context effectively.
The discovery of 'Helicoid Dynamics' identifies a critical safety failure where frontier LLMs accurately name their reasoning errors but are structurally unable to stop repeating them.