AI & ML

1625 papers

Introduces the first billion-scale SAR (synthetic aperture radar) vision foundation model and a large unified benchmark for all-weather geospatial semantic segmentation.

Open Release arxiv | Mar 13

Demonstrates that simply using XML tags during translation outperforms complex pipelines for cross-lingual label projection, while also improving translation quality.

Breaks Assumption arxiv | Mar 13
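A minimal sketch of what tag-based label projection can look like, assuming labeled spans are wrapped in hypothetical XML-style tags named after their labels (e.g. `<PER>...</PER>`) and that the translation system keeps the tags intact. The helper names `wrap_spans` and `extract_spans` are illustrative, the translation step itself is left out, and the paper's actual tag format may differ.

```python
import re

# Matches one non-nested <label>text</label> pair; \1 forces the closing tag to match.
TAG_RE = re.compile(r"<(\w+)>(.*?)</\1>")


def wrap_spans(text, spans):
    """Wrap labeled character spans (start, end, label), end exclusive, in XML-style tags."""
    pieces, cursor = [], 0
    for start, end, label in sorted(spans):
        pieces.append(text[cursor:start])
        pieces.append(f"<{label}>{text[start:end]}</{label}>")
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


def extract_spans(tagged):
    """Strip tags from (translated) text and return the clean text plus recovered spans."""
    clean, spans, cursor = [], [], 0
    for m in TAG_RE.finditer(tagged):
        clean.append(tagged[cursor:m.start()])
        offset = sum(len(p) for p in clean)  # position of the span in the clean string
        clean.append(m.group(2))
        spans.append((offset, offset + len(m.group(2)), m.group(1)))
        cursor = m.end()
    clean.append(tagged[cursor:])
    return "".join(clean), spans
```

In use, the output of `wrap_spans` would be sent to the translator and `extract_spans` applied to its output, so the labels travel with the text instead of being aligned afterwards. For example, a French output of `<PER>Marie</PER> habite à <LOC>Paris</LOC>.` yields the clean sentence plus the spans `(0, 5, "PER")` and `(15, 20, "LOC")`.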

Achieves up to 14.4x higher decoding throughput in long-context LLMs via a training-free framework that reuses sparse memory at semantic boundaries.

Efficiency Breakthrough arxiv | Mar 13

Enables multimodal agents to improve continually from experience and skills, without any parameter updates, via a dual-stream visual grounding framework.

New Capability arxiv | Mar 13

A 3D vision-language pipeline that grounds medical diagnosis in longitudinal brain MRI via regional volumetric assessments to eliminate VLM hallucinations.

New Capability arxiv | Mar 13

Integrates Neural ODEs with NeRFs to enable continuous-time scene dynamics that can extrapolate far beyond the original training sequence.

New Capability arxiv | Mar 13

Proposes a unified image tokenizer that reconciles the conflicting requirements of visual understanding and generation using a residual evolution process.

Paradigm Shift arxiv | Mar 13

Identifies and solves the 'information self-locking' failure mode where RL-trained agents stop asking informative questions in active reasoning tasks.

Breaks Assumption arxiv | Mar 13

A specialized distributed serving system for 'Any-to-Any' multimodal models that achieves 5.79x lower tail latency via component disaggregation.

Efficiency Breakthrough arxiv | Mar 13

Shows that LLM self-correction fails primarily because of 'session context' (the review shares the session in which the answer was produced), and that moving the review to a fresh, independent session improves it significantly.

Breaks Assumption arxiv | Mar 13

Automates the generation of GPU-parallelized RL environments from text/code specifications, achieving speedups of up to 22,000x at a cost of less than $10.

Efficiency Breakthrough arxiv | Mar 13

Establishes scaling laws for sampling compute in LLM reinforcement learning, providing a playbook for optimal parallel rollout and batch allocation.

Scaling Insight arxiv | Mar 13

Selects high-quality synthetic code data using 'Reverse Mutual Information' to achieve full-dataset performance with 75% less data.

Efficiency Breakthrough arxiv | Mar 13

Speeds up sparse attention by 75% through reusing lightning indexer decisions across layers, tackling a hidden bottleneck in production-grade LLMs.

Efficiency Breakthrough arxiv | Mar 13

Finds that task-specific experts are so densely packed around pretrained weights that random parameter perturbations can compete with complex RL methods such as PPO.

Breaks Assumption arxiv | Mar 13
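To make the claim concrete, here is a minimal sketch of the simplest such search, assuming a black-box `reward_fn` and isotropic Gaussian noise. The function name, `sigma`, and sample count are hypothetical; this is plain random search around the starting weights, not necessarily the paper's procedure.

```python
import random


def random_perturbation_search(weights, reward_fn, num_samples=64, sigma=0.01, seed=0):
    """Sample Gaussian perturbations of `weights`, score each with `reward_fn`,
    and return the best candidate and its reward.

    The unperturbed weights are scored first, so the result is never worse
    than the starting point under `reward_fn`.
    """
    rng = random.Random(seed)
    best_w, best_r = list(weights), reward_fn(weights)
    for _ in range(num_samples):
        # Perturb every parameter independently with N(0, sigma^2) noise.
        candidate = [w + rng.gauss(0.0, sigma) for w in weights]
        r = reward_fn(candidate)
        if r > best_r:
            best_w, best_r = candidate, r
    return best_w, best_r
```

As a toy usage, take "pretrained" weights `[0.0, 0.0, 0.0]` and a reward of negative squared distance to a nearby target such as `[0.05, -0.02, 0.01]`. If good task-specific solutions lie this close to the starting point, even blind sampling like this can find improvements, which is the intuition the blurb describes. A policy-gradient method like PPO would instead follow gradient estimates.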

Reveals that training against 'Reasoning LLMs-as-Judges' can produce policies that generate highly effective adversarial outputs, which deceive other judges and inflate benchmark scores.

Breaks Assumption arxiv | Mar 13

Introduces a feature-matching objective for LLM fine-tuning that targets sequence-level statistics without requiring reward models or ground-truth verifiers.

Paradigm Shift arxiv | Mar 13

Integrates Chain-of-Thought reasoning directly into the Diffusion Transformer denoising process to solve complex spatial and logical tasks.

New Capability arxiv | Mar 13

Reduces visual tokens by up to 100x using an autoregressive gazing module, enabling 19x faster understanding of 4K-resolution, 1000-frame videos.

Efficiency Breakthrough arxiv | Mar 13

Uncovers an emergent Hue-Saturation-Lightness (HSL) subspace in FLUX.1's VAE latent space, allowing for precise, training-free color control.

Breaks Assumption arxiv | Mar 13

Enables VideoLLMs to perform complex logical reasoning while the video is still playing, without incurring the latency of standard test-time scaling.

New Capability arxiv | Mar 13

An open foundation model for humanoid robots that achieves high performance using only 30 hours of real-world robot data by pre-training on egocentric human videos.

Open Release arxiv | Mar 13

A unified streaming visual backbone that performs perception, 3D reconstruction, and robotic action simultaneously from a continuous video stream.

New Capability arxiv | Mar 13

Introduces adaptive video tokenization that allocates tokens based on scene complexity, reducing token usage by 24% while improving reconstruction quality.

Efficiency Breakthrough arxiv | Mar 13

Demonstrates that the stochasticity in standard regularized model training (such as cross-validation) can serve as a 'free' and effective exploration strategy for contextual bandits.

Paradigm Shift arxiv | Mar 13