New Capability
333 papers · Page 3 of 4
A unified reinforcement learning framework that jointly optimizes reasoning (text) and synthesis (image) for interleaved multimodal generation.
AI & ML arxiv | Mar 25
Develops a differentially private RLHF pipeline that decouples private reward learning from policy optimization, achieving strong alignment on Gemma-2B-IT with privacy guarantees.
AI & ML arxiv | Mar 25
Inference-time 'steering' of Code LLMs allows for precise control over programming languages and libraries without prompting or fine-tuning.
AI & ML arxiv | Mar 26
A universal 'one-shot' medical anomaly detector that outperforms specialized models across nine different datasets.
AI & ML arxiv | Mar 26
Sparse Autoencoders (SAEs) can successfully decompose opaque medical vision foundation model embeddings into human-interpretable clinical concepts.
AI & ML arxiv | Mar 26
Symbolic-KANs bridge the gap between scalable deep learning and interpretable symbolic regression by embedding discrete library primitives directly into the network.
AI & ML arxiv | Mar 26
An 'invariant compiler' uses LLMs to translate physics requirements into Neural ODE architectures that satisfy conservation laws by construction.
AI & ML arxiv | Mar 26
POISE demonstrates the first autonomous, evidence-driven discovery of improved policy optimization algorithms for LLMs.
AI & ML arxiv | Mar 26
SDZE enables the training of 10-million-dimensional Physics-Informed Neural Networks (PINNs) on a single GPU.
AI & ML arxiv | Mar 26
Solves the 'vanishing gradient' problem in 3D Gaussian Splatting (3DGS) tracking by optimizing in the frequency domain using spectral moments.
AI & ML arxiv | Mar 26
Restores editable, semantically layered structures from flattened vector graphics (SVGs/icons) by using generative completion to recover occluded geometries.
AI & ML arxiv | Mar 26
Identifies that 'attention imbalance' across modalities and tokens drives object hallucinations and proposes a decoding-time rectification (AIR) to fix it.
AI & ML arxiv | Mar 26
SOMA provides a plug-and-play memory and orchestration system that increases Vision-Language-Action (VLA) robot success rates by over 50% without fine-tuning.
AI & ML arxiv | Mar 26
Breaks the resolution and aspect ratio barriers of image diffusion models, enabling the generation of consistent 32K resolution images.
AI & ML arxiv | Mar 26
Applies reinforcement learning with a cycle-consistency reward to drastically improve natural language to Lean4 autoformalization.
AI & ML arxiv | Mar 26
Reformulates molecular discovery as an autonomous MCTS planning problem over executable chemical operations rather than just similarity-based prediction.
AI & ML arxiv | Mar 26
An autonomous agentic pipeline discovered novel white-box adversarial attacks that outperform existing methods by up to 300%.
AI & ML arxiv | Mar 26
UI-Voyager achieves an 81.0% success rate on AndroidWorld, exceeding human-level performance in mobile GUI automation.
AI & ML arxiv | Mar 26
Wasserstein Parallel Transport provides a formal framework for counterfactual prediction in evolving probability distributions.
AI & ML arxiv | Mar 26
Moves medical AI from simplified 2D image classification to agents navigating full 3D clinical studies.
AI & ML arxiv | Mar 27
Enables semantically precise model editing directly in the weight space without any training data.
AI & ML arxiv | Mar 27
Estimates lab-grade 3D musculoskeletal forces from a single smartphone video.
AI & ML arxiv | Mar 27
Quantifies near-verbatim data extraction risk in LLMs at 1/5000th the computational cost of standard Monte Carlo methods.
AI & ML arxiv | Mar 27
Enables graph-based retrieval and reranking for RAG without the maintenance overhead of a knowledge graph.
AI & ML arxiv | Mar 27
GeoNDC introduces a queryable neural data cube that compresses 20 years of planetary satellite data by 95x while allowing on-demand continuous-time reconstruction.
AI & ML arxiv | Mar 27
Intern-S1-Pro is the first trillion-parameter scientific multimodal foundation model, outperforming proprietary models on specialized scientific reasoning.
AI & ML arxiv | Mar 27
AirVLA successfully transfers manipulation-trained Vision-Language-Action (VLA) models to underactuated aerial robots using a payload-aware guidance mechanism.
AI & ML arxiv | Mar 27
Z-Erase introduces the first concept erasure method for single-stream diffusion transformers, preventing generation collapse in new unified architectures.
AI & ML arxiv | Mar 27
SEVerA enables the synthesis of self-evolving agents with formal guarantees by combining LLM planning with first-order logic rejection samplers.
AI & ML arxiv | Mar 27
Trace2Skill distills lessons from across a 'parallel fleet' of execution trajectories into a unified, conflict-free skill directory for LLM agents.
AI & ML arxiv | Mar 27
Enables long video generation from short-video diffusion models without any additional training or fine-tuning.
AI & ML arxiv | Mar 27
Training-free 6D pose estimation for unseen surgical instruments using only a CAD model as prior knowledge.
AI & ML arxiv | Mar 27
Offline Decision Transformers can now synthesize strategies that surpass the classical heuristics they were trained on for the Traveling Salesman Problem.
AI & ML arxiv | Mar 27
A foundation model for gait transforms 3D skeletal motion into a systemic biosignal for multi-system health monitoring.
AI & ML arxiv | Mar 27
LLMs can be fine-tuned to act as their own 'Z-token' compressors, achieving 18x text reduction without losing reconstruction fidelity.
AI & ML arxiv | Mar 27
Defines 'Reasoning Safety' as a new security dimension and introduces a real-time monitor to detect logic-chain hijackings.
AI & ML arxiv | Mar 27
Introduces a training-free pipeline for pixel-level video anomaly detection that achieves a 5x improvement in object-level accuracy.
AI & ML arxiv | Mar 27
A model-agnostic framework to extract the model-implied causal structure from any trained temporal predictor.
AI & ML arxiv | Mar 27
Detects when object detectors fail to see safety-critical objects by measuring semantic misalignment with foundation model embeddings.
AI & ML arxiv | Mar 27
Translates a single natural language sentence into a validated, hardware-specific computational imaging system design.
AI & ML arxiv | Mar 27
A training-free decoding framework that mitigates multimodal hallucinations by re-ranking tokens based on spatial attention entropy.
AI & ML arxiv | Mar 27
Introduces a 'Hybrid Memory' architecture that maintains the identity and motion of dynamic subjects even when they hide out of view.
AI & ML arxiv | Mar 27
A decentralized system that automates ML research and trains domain-expert 1.58-bit ternary models for CPU-native inference.
AI & ML arxiv | Mar 30
Modulates LLM hidden states with eye-gaze data to outperform GPT-4o by 10.5 points on streaming video understanding.
AI & ML arxiv | Mar 30
Fixes physically impossible video generation by disentangling semantic prompts from physical dynamics during training.
AI & ML arxiv | Mar 30
Integrates radiologist gaze data as a probabilistic prior to align vision-language models with actual human clinical reasoning workflows.
AI & ML arxiv | Mar 30
Introduces ReinPatch, the first framework to jointly optimize sequence tokenization and backbone models using reinforcement learning.
AI & ML arxiv | Mar 30
Moves coding agents from passive execution to proactive collaboration by teaching them when to ask for clarification on underspecified tasks.
AI & ML arxiv | Mar 30
Provides mechanistic evidence that LLMs internalize 'vibes' (informal registers like slang) as language-agnostic abstractions that can be causally steered.
AI & ML arxiv | Mar 30
Enables GUI agents to overcome domain bias by autonomously 'watching' web tutorial videos to learn specific software workflows without retraining.
AI & ML arxiv | Mar 30
Introduces a label-free, output-agnostic method for merging LoRA modules across heterogeneous tasks like classification and regression.
AI & ML arxiv | Mar 30
Enables verification of claimed text-to-image models through boundary-aware prompts that trigger model-specific instability.
AI & ML arxiv | Mar 30
Boosts multimodal reasoning by teaching models to autonomously verify their own long-form generations against image evidence using information gain.
AI & ML arxiv | Mar 30
Enables high-quality, spatio-temporally consistent 4D reconstruction using sparse, uncalibrated camera inputs instead of expensive synchronized arrays.
AI & ML arxiv | Mar 30
Builds an autonomous AI research agent that significantly surpasses previous benchmark results by using asynchronous multi-GPU scaling and a hidden, consistent evaluation protocol.
AI & ML arxiv | Mar 30
A model-agnostic framework that uses synthetic sampling to provide statistically valid uncertainty quantification and hallucination detection for multimodal models.
AI & ML arxiv | Mar 30
Shifts multimodal LLMs from static image prefixes to an active, sequential 'Visual Chain-of-Thought' that explores images based on saliency.
AI & ML arxiv | Mar 31
The first training-free framework for high-fidelity appearance transfer specifically designed for Diffusion Transformers (DiTs).
AI & ML arxiv | Mar 31
LLMs used for financial forecasting often 'cheat' by memorizing training data; this framework detects and filters out that bias, improving Sharpe ratios by 49%.
AI & ML arxiv | Mar 31
A unified L0-gating mechanism that enables comparable sparsification and pruning across graphs, text, and tabular data.
AI & ML arxiv | Mar 31
Enables vision models to learn online from human corrections at inference time, reducing redundant manual effort in video segmentation by up to 34%.
AI & ML arxiv | Mar 31
Enables zero-shot monocular metric depth estimation across any camera type (fisheye, 360, ERP) using a single unified model.
AI & ML arxiv | Mar 31
Reframes LLM-assisted research as a scientific forecasting problem, training models to generate proposals that align with future (held-out) research directions.
AI & ML arxiv | Mar 31
Enables precise, physically plausible control over light position, color, and intensity in single images without a 3D model.
AI & ML arxiv | Mar 31
IP-SAM allows the Segment Anything Model (SAM) to perform automatic, prompt-free segmentation by generating its own 'intrinsic prompts'.
AI & ML arxiv | Mar 31
Moves autonomous driving from 'predict-then-plan' to an interleaved VLA model where future frames and ego-actions are generated step-by-step.
AI & ML arxiv | Mar 31
A non-Turing-complete DSL that compiles high-level LLM routing and agent policies directly into verified infrastructure artifacts like Kubernetes NetworkPolicies.
AI & ML arxiv | Mar 31
A production-grade framework that converts LLM/RAG evaluation into a deployment decision workflow using Pareto frontiers and CI gates.
AI & ML arxiv | Mar 31
Enables Active Learning for tabular data without model retraining by iteratively optimizing the 'labeled context' of foundation models.
AI & ML arxiv | Mar 31
Lie Generator Networks enable linear system identification with guaranteed physical stability and dissipation by construction rather than through loss penalties.
AI & ML arxiv | Mar 31
Achieves high-quality 3D reconstruction and camera pose estimation from sparse views without any pre-trained priors or ground-truth annotations.
AI & ML arxiv | Mar 31
Introduces 'Hidden Ads,' a new class of semantic backdoor attacks that inject promotional content into VLM responses based on natural user behavior.
AI & ML arxiv | Mar 31
Achieves zero-shot, prompt-free object removal in diffusion models purely through self-attention manipulation.
AI & ML arxiv | Mar 31
VoxAnchor uses mmWave radar to authenticate speech by matching acoustics to physical throat vibrations.
AI & ML arxiv | Mar 31
RAGent enables training-free, deployment-time human activity recognition for mmWave radar using agentic reasoning.
AI & ML arxiv | Mar 31
Bridges the gap between free-form natural language and safety-critical UAV navigation using Signal Temporal Logic (STL) translation and repair.
AI & ML arxiv | Mar 31
TianJi is the first 'AI meteorologist' system capable of autonomously driving complex numerical models to verify physical hypotheses in atmospheric science.
AI & ML arxiv | Mar 31
Heracles uses a state-conditioned diffusion middleware to bridge precise motion tracking with generative recovery for humanoid robots.
AI & ML arxiv | Mar 31
Sortify is the first fully autonomous LLM agent deployed in production for closed-loop recommendation ranking optimization.
AI & ML arxiv | Mar 31
AutoStan demonstrates a CLI coding agent that autonomously builds and iteratively improves interpretable Bayesian models in Stan.
AI & ML arxiv | Mar 31
Introduces SCOUT, a routing framework that intelligently selects which Image-to-3D reconstruction model to use based on input difficulty and cost constraints.
AI & ML arxiv | Mar 31
GraySense enables geospatial object tracking using only encrypted network packet sizes without any access to raw video streams.
AI & ML arxiv | Mar 31
Wan-R1 successfully applies Group Relative Policy Optimization (GRPO) to flow-based video models to enable verifiable spatial reasoning.
AI & ML arxiv | Mar 31
Poppy provides a training-free way to refine monocular surface normals using single-shot polarization measurements at test time.
AI & ML arxiv | Mar 31
ATLAS-RTC introduces token-level runtime control that detects and corrects LLM drift from structured output contracts during the forward pass.
AI & ML arxiv | Mar 31
Guardrails successfully implements and flight-tests Control Barrier Functions on an F-16 fighter jet to enforce safety limits in real time.
AI & ML arxiv | Mar 31
Iterative Motion Imitation enables bicycle robots to perform unassisted front-flips by learning from initially 'impossible' reference motions.
AI & ML arxiv | Mar 31
Proteina-Complexa unifies generative flow-based modeling with structure-based 'hallucination' to set a new SOTA in atomistic protein binder design.
AI & ML arxiv | Mar 31
The first framework for bit-identical deep learning training that produces MD5-verified identical weights across independent runs.
AI & ML arxiv | Mar 31
Meta-Harness automates the engineering of the 'code' surrounding LLMs, improving RAG and agent performance by optimizing retrieval and context management logic.
AI & ML arxiv | Mar 31
A training-free metacognitive framework that gives LLMs explicit control over expanding, pruning, and repairing reasoning trajectories during inference.
AI & ML arxiv | Mar 31
Presents PReD, the first foundation model and 1.3M-sample dataset specifically for electromagnetic signal perception and decision-making.
AI & ML arxiv | Mar 31
Transitions reasoning model optimization from coarse sequence-level advantages to fine-grained token dynamics.
AI & ML arxiv | Mar 31
Enhances Kolmogorov-Arnold Networks (KAN) with fractal interpolation to approximate non-smooth and rough functions.
AI & ML arxiv | Mar 31
Researchers have used LLMs to evolve entirely new reinforcement learning update rules from scratch that compete with human-designed baselines like PPO and SAC.
AI & ML arxiv | Mar 31
The TAG glove system provides high-resolution tactile feedback and precise 21-DoF motion capture for under $1000.
AI & ML arxiv | Mar 31
SPINNER is a tri-rotor UAV that uses continuous self-rotation to expand the field of view of its sensors without adding extra hardware.
AI & ML arxiv | Mar 31
Medical AI Scientist is the first autonomous framework for clinically grounded research ideation and manuscript drafting.
AI & ML arxiv | Mar 31
Vision-Language Models (VLMs) can outperform specialized learning-based placers in chip floorplanning through visual evolutionary optimization.
AI & ML arxiv | Mar 31
DreamLite enables sub-second 1024x1024 image generation and editing on mobile devices using a unified 0.39B parameter model.
AI & ML arxiv | Mar 31