New Capability

333 papers

Papers in which something that was previously impossible becomes possible: new techniques, new instruments, new model behaviors, and new measurements at a frontier.

Trace2Skill distills lessons from across a 'parallel fleet' of execution trajectories into a unified, conflict-free skill directory for LLM agents.

Enables long video generation from short-video diffusion models without additional training or fine-tuning.

Training-free 6D pose estimation for unseen surgical instruments using only a CAD model as prior knowledge.

For the Traveling Salesman Problem, offline Decision Transformers can now synthesize strategies that surpass the classical heuristics they were trained on.

A gait foundation model turns 3D skeletal motion into a systemic biosignal for multi-system health monitoring.

LLMs can be fine-tuned to act as their own 'Z-token' compressors, achieving 18x text reduction without losing reconstruction fidelity.

Defines 'Reasoning Safety' as a new security dimension and introduces a real-time monitor to detect logic-chain hijackings.

Introduces a training-free pipeline for pixel-level video anomaly detection that achieves a 5x improvement in object-level accuracy.

A model-agnostic framework to extract the model-implied causal structure from any trained temporal predictor.

Detects when object detectors fail to see safety-critical objects by measuring semantic misalignment with foundation model embeddings.

Translates a single natural language sentence into a validated, hardware-specific computational imaging system design.

A training-free decoding framework that mitigates multimodal hallucinations by re-ranking tokens based on spatial attention entropy.

Introduces a 'Hybrid Memory' architecture that preserves the identity and motion of dynamic subjects even while they are out of view.

Inference-time 'steering' of Code LLMs enables precise control over the target programming language and libraries without prompting or fine-tuning.

A universal 'one-shot' medical anomaly detector that outperforms specialized models across nine different datasets.

Sparse Autoencoders (SAEs) can successfully decompose opaque medical vision foundation model embeddings into human-interpretable clinical concepts.

Symbolic-KANs bridge the gap between scalable deep learning and interpretable symbolic regression by embedding discrete library primitives directly into the network.

An 'invariant compiler' uses LLMs to translate physics requirements into Neural ODE architectures that satisfy conservation laws by construction.

POISE demonstrates the first autonomous, evidence-driven discovery of improved policy optimization algorithms for LLMs.

SDZE enables the training of 10-million-dimensional Physics-Informed Neural Networks (PINNs) on a single GPU.

Solves the 'vanishing gradient' problem in 3D Gaussian Splatting (3DGS) tracking by optimizing in the frequency domain using spectral moments.

Restores editable, semantically layered structures from flattened vector graphics (SVGs/icons) by using generative completion to recover occluded geometries.

Identifies that 'attention imbalance' across modalities and tokens drives object hallucinations and proposes a decoding-time rectification (AIR) to fix it.

SOMA provides a plug-and-play memory and orchestration system that increases Vision-Language-Action (VLA) robot success rates by over 50% without fine-tuning.

Breaks the resolution and aspect-ratio barriers of image diffusion models, enabling the generation of consistent 32K-resolution images.

Applies reinforcement learning with a cycle-consistency reward to drastically improve natural language to Lean4 autoformalization.

Reformulates molecular discovery as an autonomous MCTS planning problem over executable chemical operations rather than just similarity-based prediction.

An autonomous agentic pipeline discovered novel white-box adversarial attacks that outperform existing methods by up to 300%.

UI-Voyager achieves an 81.0% success rate on AndroidWorld, exceeding human-level performance in mobile GUI automation.

Wasserstein Parallel Transport provides a formal framework for counterfactual prediction in evolving probability distributions.

Small adapters can provide frozen decoder-only LLMs with persistent latent-space memory that survives across separate sessions.

Introduces a framework for LLMs to self-improve reasoning in specific domains by autonomously mining and constructing training environments directly from the open web.

Leverages unstructured clinical notes during training to boost the performance of models that are deployed using only structured EHR data.

CanViT is the first task-agnostic active-vision foundation model that reconstructs scenes using low-resolution 'glimpses' with 19.5x fewer FLOPs than existing models.

CAM3R is a camera-agnostic 3D reconstruction model that handles fisheye, panoramic, and pinhole imagery without requiring prior calibration.

A new statistical test that reliably detects whether a dataset was NOT used in an LLM's training corpus.

ABSTRAL automates the design of multi-agent systems by treating architectures as evolving, inspectable natural-language documents.

UniQueR reconstructs full 3D scenes (including occluded areas) from unposed images in a single forward pass.

Deep semi-parametric models allow training data to be deleted from a model instantly, without retraining or parameter updates.

WorldMesh generates consistent, large-scale 3D worlds by populating a geometric mesh scaffold with image diffusion-derived content.

Identifies that MLLMs fail to perceive visual illusions due to a high-frequency attention bias and provides a plug-and-play fix that boosts accuracy from 13% to 84%.

Polaris introduces a 'Gödel Agent' framework that allows 7B-parameter models to recursively improve their own policies through auditable code patches.

Develops a collaborative memory framework that distills agent-agnostic reasoning trajectories, allowing different LLMs to share a single memory system.

Identifies functionally complete safety circuits in LLMs via differentiable binary masks, allowing for near-surgical removal of backdoors and jailbreaks.

Uses Sparse Autoencoders (SAEs) to identify and steer cultural representations in LLMs, eliciting rare cultural concepts that prompting alone misses.

A unified framework that decomposes monolithic 3D meshes into 'sim-ready' interactive articulated assets using a sparse 3D VQ-VAE.

A generative framework for graphs that closes the fidelity gap between energy-based models and discrete diffusion.

A bilevel framework where an outer LLM loop meta-optimizes an inner autoresearch loop by autonomously generating and injecting Python code at runtime.

Integrates tactile perception into video-action models to enable high-fidelity force modulation in contact-rich robotic tasks.

A unified reinforcement learning framework that jointly optimizes reasoning (text) and synthesis (image) for interleaved multimodal generation.