New Capability

333 papers

Papers where something becomes possible that previously was not. New techniques, new instruments, new model behaviors, new measurements at a frontier.


RAGent enables training-free, deployment-time human activity recognition for mmWave radar using agentic reasoning.

Bridges the gap between free-form natural language and safety-critical UAV navigation using Signal Temporal Logic (STL) translation and repair.

TianJi is the first 'AI meteorologist' system capable of autonomously driving complex numerical models to verify physical hypotheses in atmospheric science.

Heracles uses a state-conditioned diffusion middleware to bridge precise motion tracking with generative recovery for humanoid robots.

Sortify is the first fully autonomous LLM agent deployed in production for closed-loop recommendation ranking optimization.

AutoStan demonstrates a CLI coding agent that autonomously builds and iteratively improves interpretable Bayesian models in Stan.

Introduces SCOUT, a routing framework that selects which Image-to-3D reconstruction model to use based on input difficulty and cost constraints.

GraySense enables geospatial object tracking using only encrypted network packet sizes without any access to raw video streams.

Wan-R1 successfully applies Group Relative Policy Optimization (GRPO) to flow-based video models to enable verifiable spatial reasoning.

Poppy provides a training-free way to refine monocular surface normals using single-shot polarization measurements at test time.

ATLAS-RTC introduces token-level runtime control that detects and corrects LLM drift from structured output contracts during the forward pass.

Guardrails implements and flight-tests Control Barrier Functions on an F-16 fighter jet to enforce safety limits in real time.

Iterative Motion Imitation enables bicycle robots to perform unassisted front-flips by learning from initially 'impossible' reference motions.

Proteina-Complexa unifies generative flow-based modeling with structure-based 'hallucination' to set a new state of the art in atomistic protein binder design.

The first framework for bit-identical deep learning training that produces MD5-verified identical weights across independent runs.

Meta-Harness automates the engineering of the 'code' surrounding LLMs, improving RAG and agent performance by optimizing retrieval and context management logic.

A training-free metacognitive framework that gives LLMs explicit control over expanding, pruning, and repairing reasoning trajectories during inference.

Presents PReD, the first foundation model and 1.3M-sample dataset specifically for electromagnetic signal perception and decision-making.

Transitions reasoning model optimization from coarse sequence-level advantages to fine-grained token dynamics.

Enhances Kolmogorov-Arnold Networks (KAN) with fractal interpolation to approximate non-smooth and rough functions.

Uses LLMs to evolve entirely new reinforcement learning update rules from scratch that compete with human-designed baselines like PPO and SAC.

The TAG glove system provides high-resolution tactile feedback and precise 21-DoF motion capture for under $1,000.

SPINNER is a tri-rotor UAV that uses continuous self-rotation to expand the field of view of its sensors without adding extra hardware.

Medical AI Scientist is the first autonomous framework for clinically grounded research ideation and manuscript drafting.

Vision-Language Models (VLMs) can outperform specialized learning-based placers in chip floorplanning through visual evolutionary optimization.

DreamLite enables sub-second 1024x1024 image generation and editing on mobile devices using a unified 0.39B-parameter model.

A decentralized system that automates ML research and trains domain-expert 1.58-bit ternary models for CPU-native inference.

Modulates LLM hidden states with eye-gaze data to outperform GPT-4o by 10.5 points on streaming video understanding.

Fixes physically impossible video generation by disentangling semantic prompts from physical dynamics during training.

Integrates radiologist gaze data as a probabilistic prior to align vision-language models with actual human clinical reasoning workflows.

Introduces ReinPatch, the first framework to jointly optimize sequence tokenization and backbone models using reinforcement learning.

Moves coding agents from passive execution to proactive collaboration by teaching them when to ask for clarification on underspecified tasks.

Provides mechanistic evidence that LLMs internalize 'vibes' (informal registers like slang) as language-agnostic abstractions that can be causally steered.

Enables GUI agents to overcome domain bias by autonomously 'watching' web tutorial videos to learn specific software workflows without retraining.

Introduces a label-free, output-agnostic method for merging LoRA modules across heterogeneous tasks like classification and regression.

Enables verification of claimed text-to-image models through boundary-aware prompts that trigger model-specific instability.

Boosts multimodal reasoning by teaching models to autonomously verify their own long-form generations against image evidence using information gain.

Enables high-quality, spatio-temporally consistent 4D reconstruction using sparse, uncalibrated camera inputs instead of expensive synchronized arrays.

Builds an autonomous AI research agent that significantly surpasses previous benchmark results by using asynchronous multi-GPU scaling and a hidden, consistent evaluation protocol.

A model-agnostic framework that uses synthetic sampling to provide statistically valid uncertainty quantification and hallucination detection for multimodal models.

Moves medical AI from simplified 2D image classification to agents navigating full 3D clinical studies.

Enables semantically precise model editing directly in the weight space without any training data.

Estimates lab-grade 3D musculoskeletal forces from a single smartphone video.

Quantifies near-verbatim data extraction risk in LLMs at 1/5000th the computational cost of standard Monte Carlo methods.

Enables graph-based retrieval and reranking for RAG without the maintenance overhead of a knowledge graph.

GeoNDC introduces a queryable neural data cube that compresses 20 years of planetary satellite data by 95x while allowing on-demand continuous-time reconstruction.

Intern-S1-Pro is the first trillion-parameter scientific multimodal foundation model, outperforming proprietary models on specialized scientific reasoning.

AirVLA successfully transfers manipulation-trained Vision-Language-Action (VLA) models to underactuated aerial robots using a payload-aware guidance mechanism.

Z-Erase introduces the first concept erasure method for single-stream diffusion transformers, preventing generation collapse in new unified architectures.

SEVerA enables the synthesis of self-evolving agents with formal guarantees by combining LLM planning with first-order logic rejection samplers.