Paradigm Shift

A geometric fix for Rotary Positional Embeddings (RoPE) allows Transformers to generalize to long inputs out-of-the-box by preserving 'sink token' functionality.

A synthesizable RTL implementation of Predictive Coding allows for fully distributed, non-backprop learning directly in hardware.

Dynamic constraints using an 'online refiner' resolve the conflict between stability and performance in Reinforcement Learning Fine-Tuning (RFT).

Uses Pearl's do-operator to automatically discover and mask irrelevant state dimensions in Reinforcement Learning.

Fine-tunes Vision-Language Models on raw images alone, with a text-to-image model supplying a cycle-consistency reward.

PowerFlow uses GFlowNets to replace heuristic rewards in unsupervised fine-tuning, allowing practitioners to explicitly tune models for either logic or creativity.

AS2 achieves a fully differentiable neuro-symbolic bridge by replacing discrete solvers with a soft, continuous approximation of the Answer Set Programming operator.

Standard decoding strategies (top-k, nucleus) create a 'truncation blind spot' by systematically excluding human-like, low-probability token choices.
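For readers unfamiliar with the two strategies named above, here is a minimal sketch of standard top-k and nucleus (top-p) truncation. It shows only how both methods zero out the low-probability tail before sampling. It is not the paper's analysis, and the five-token distribution is a made-up illustration.

```python
def top_k_filter(probs, k):
    """Keep the k most probable tokens and renormalize; all others get probability 0."""
    keep = set(sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k])
    total = sum(probs[i] for i in keep)
    return [probs[i] / total if i in keep else 0.0 for i in range(len(probs))]


def nucleus_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative probability reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = set(), 0.0
    for i in order:
        keep.add(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    total = sum(probs[i] for i in keep)
    return [probs[i] / total if i in keep else 0.0 for i in range(len(probs))]


# Hypothetical next-token distribution over a five-token vocabulary.
probs = [0.5, 0.25, 0.125, 0.0625, 0.0625]

top2 = top_k_filter(probs, k=2)         # tokens 2, 3 and 4 can never be sampled
nucleus = nucleus_filter(probs, p=0.8)  # tokens 3 and 4 can never be sampled
```

In both cases the excluded tokens receive exactly zero probability, however plausible they might be in context, which is the exclusion the summary calls a blind spot.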

SINDy-KANs combine Kolmogorov-Arnold Networks with Sparse Identification of Nonlinear Dynamics to create parsimonious, interpretable models.

REST transforms the zero-shot object-navigation problem from simple waypoint selection to a tree-of-paths reasoning process.

A linear-time attention mechanism that is weight-compatible with standard pretrained Transformers, allowing for direct knowledge transfer.

A system where agents autonomously design, refine, and store task-specific skills as 'stateful prompts' to achieve non-parametric continual learning.

Shifts concept unlearning in diffusion models from fragile keyword-based removal to a distributional framework using contextually diverse prompts.

Eliminates the need for expensive process reward models by propagating terminal rewards across state-space graphs to generate dense, state-level rewards for agentic RL.

Introduces 'intentional interventions' and Structural Final Models (SFMs) to detect and infer agent goals within causal frameworks.

Uses Sparse Autoencoders (SAEs) to disentangle and modulate bias-relevant features in Vision-Language Models without retraining.

Incorporates the physics of forward dynamics directly into a GNN architecture for articulated robot control.

Argues that standard ML efficiency metrics (FLOPs, throughput) are poorly correlated with actual robot performance in Vision-Language-Action (VLA) models.

Reframes GPU kernel optimization by benchmarking against hardware 'Speed-of-Light' limits rather than software baselines.

Repurposes pre-trained video diffusion models as 'Latent World Simulators' to give Multimodal LLMs 3D spatial awareness without explicit 3D data.

Introduces Capability-Priced Micro-Markets (CPMM), a micro-economic framework for autonomous AI agent transactions over HTTP 402.

Proposes Modulated Hazard-aware Policy Optimization (MHPO) to solve the instability and mode collapse common in GRPO-based reinforcement learning.

Mathematically proves that the Transformer architecture is functionally equivalent to a Bayesian Network performing loopy belief propagation.

Achieves high-performance online continual learning without the massive memory overhead of traditional experience replay buffers.

A formal, graph-native memory architecture that treats agent memory as a versioned asset, dramatically outperforming Gemini 2.5 Pro on complex recall.

Shifts retrieval from static contrastive vector alignment to dynamic reasoning trajectories using a generative model (T1) and GRPO.

Provides a sheaf-theoretic proof that local causal consistency in generative models does not guarantee global counterfactual coherence.

Unifies large-scale search, recommendation, and reasoning into a single self-contained LLM by treating item IDs as a distinct modality.

Edit-As-Act reframes 3D scene editing as a goal-regressive planning problem using symbolic action languages rather than purely generative pixel manipulation.

A new self-refining surrogate framework enables neural models to simulate complex dynamical systems over arbitrarily long horizons without the standard failure mode of compounding error.

The 'consensus trap' in label-free RL—where models reinforce their own systematic errors—can be broken by co-evolving the model in alternating generator and verifier roles.

LLMs compute and cache confidence scores automatically during answer generation, well before they are prompted to verbalize them.

The distance between human languages can now be measured quantitatively using the attention mechanisms of multilingual transformers.

AgentFactory shifts agent evolution from unreliable textual 'reflections' to a library of verifiable, executable Python subagents.

DAPS++ reinterprets diffusion inverse problems as a decoupled EM-style initialization, significantly increasing restoration speed and stability.

Alternating Reinforcement Learning with Rubric Rewards (ARL-RR) replaces brittle scalar reward aggregation with a semantic meta-class optimization framework.

Atlas introduces 'Compiled Memory,' which rewrites an agent's system prompt with distilled task experience rather than using RAG or fine-tuning.

Transition Flow Matching learns a global transition flow rather than local velocity fields, enabling single-step generation and transfer to arbitrary future time points.

Simulation Distillation (SimDist) enables rapid sim-to-real adaptation by transferring reward and value models directly into a latent world model.

Introduces a privacy-preserving ML framework that achieves strong non-invertibility without the utility loss of Differential Privacy or the cost of Homomorphic Encryption.

Analyses over 10,000 experiments to prove that LLM agents are capable of genuine architectural discovery rather than just hyperparameter tuning.

Introduces per-token adapter routing, allowing a single sequence to dynamically utilize multiple specialized LoRA experts.
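As a rough illustration of what per-token routing could look like, the sketch below sends each token through a frozen base weight plus one low-rank (LoRA-style) update chosen by a router. Everything here is an assumption made for illustration: the argmax router, the rank-1 experts, and the names `route_token` and `lora_forward` are hypothetical and do not come from the paper.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def route_token(x, router):
    """Hypothetical router: pick the expert with the highest linear score."""
    scores = matvec(router, x)
    return max(range(len(scores)), key=lambda e: scores[e])


def lora_forward(tokens, base, experts, router):
    """Apply base weights plus the routed expert's low-rank update, token by token."""
    outputs, choices = [], []
    for x in tokens:
        e = route_token(x, router)
        down, up = experts[e]                # LoRA factors: down-projection, up-projection
        delta = matvec(up, matvec(down, x))  # low-rank update: up @ (down @ x)
        outputs.append([b + d for b, d in zip(matvec(base, x), delta)])
        choices.append(e)
    return outputs, choices


# Toy setup (all values assumed): 2-dimensional tokens, two rank-1 experts.
base = [[1.0, 0.0], [0.0, 1.0]]
experts = [
    ([[1.0, 0.0]], [[1.0], [0.0]]),  # expert 0 adds the first coordinate back in
    ([[0.0, 1.0]], [[0.0], [1.0]]),  # expert 1 adds the second coordinate back in
]
router = [[1.0, 0.0], [0.0, 1.0]]
tokens = [[2.0, 1.0], [1.0, 3.0]]

outputs, choices = lora_forward(tokens, base, experts, router)
```

The two tokens in this one sequence are handled by different experts, which is the behavior the summary describes. A real system would likely use a learned router and might blend several experts per token rather than pick exactly one.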

Finds that filtering knowledge at 'write-time' (ingestion) maintains 100% RAG accuracy under noise levels where standard 'read-time' filtering completely collapses.

Proposes a protocol that replaces complex multi-agent coding frameworks with a simple, interpretable filesystem structure.

Establishes a duality between sequence-axis attention and depth-wise residual connections, treating layer depth as an ordered variable.

Proves that compositional generalization failure in neural networks is an architectural issue and provides a category-theoretic framework to fix it.

Formulates Hierarchical Instruction Following as a Constrained Markov Decision Process to ensure LLMs prioritize system prompts over user instructions.

Introduces modular, composable safety alignment via learnable control tokens rather than static parameter-level tuning.

Decouples perceptual failures from logical errors in Vision-Language reward models to enable more reliable test-time scaling.

Researchers identified a 'critique vector' in the latent space of Large Reasoning Models that can be steered to improve self-correction and test-time scaling.