Papers on machine learning, AI systems, alignment, interpretability, agents, foundation models, and applied AI where the core contribution is computational intelligence.
New Capability
CanViT is the first task-agnostic active-vision foundation model that reconstructs scenes using low-resolution 'glimpses' with 19.5x fewer FLOPs than existing models.
Breaks Assumption
A large-scale study of 12 reasoning models reveals that internal 'thinking' processes frequently recognize deceptive hints while the final output remains sycophantic.
Paradigm Shift
Instead of using top-activating examples, this method steers Sparse Autoencoder (SAE) features in Vision-Language Models to let the model describe its own internal visual features.
Paradigm Shift
DeIllusionLLM introduces task-level autoregressive reasoning to prevent LLMs from hallucinating answers to ill-posed or faulty scientific questions.
New Capability
CAM3R is a camera-agnostic 3D reconstruction model that handles fisheye, panoramic, and pinhole imagery without requiring prior calibration.
Paradigm Shift
Inter-Layer Structural Encoders (ILSE) use Cayley graphs to aggregate features from all internal LLM layers, improving accuracy by up to 44% over final-layer-only predictions.
Open Release
Introduces the first high-performing open-source metric for per-sample AI music quality evaluation.
Open Release
Provides a 2.5M-sample image-to-TikZ dataset and the first instruction-augmented dataset for geometric visual reasoning.
New Capability
A new statistical test that reliably detects whether a dataset was NOT used in an LLM's training corpus.
Paradigm Shift
Introduces Dual Q-DM, the first non-adversarial imitation learning method theoretically guaranteed to eliminate compounding errors.
Scaling Insight
A quantitative model that predicts the performance gain of merging independent LLM specialists before committing compute.
Breaks Assumption
Proves that logic and lookup-table (LUT) based neural networks are structurally more resilient to hardware bit-flips than standard architectures.
Scaling Insight
Identifies the 'Caterpillar Tree' as the theoretically optimal structure for test-time computation and backtracking in LLMs.
New Capability
ABSTRAL automates the design of multi-agent systems by treating architectures as evolving, inspectable natural-language documents.
Breaks Assumption
Frontier models' reasoning steps are largely 'decorative' and do not causally determine the final answer in most tasks.
Paradigm Shift
Moving beyond coarse reward signals, this paper introduces token-level policy optimization for multimodal reasoning.
New Capability
UniQueR reconstructs full 3D scenes (including occluded areas) from unposed images in a single forward pass.
Scaling Insight
Persistent structural memory in neural networks is fundamentally limited by the instability of jointly-learned coordinate systems.
New Capability
Deep semi-parametric models allow for the instant deletion of training data from a model without retraining or parameter updates.
Efficiency Breakthrough
A 0.26M parameter model using continuous dynamics outperforms 27M parameter recursive models on complex logic tasks like Sudoku-Extreme.
Breaks Assumption
Standard confidence calibration is structurally biased when ground truth labels are ambiguous or annotators disagree.
Efficiency Breakthrough
Agile-VLA enables high-frequency robot control on edge devices by decoupling perception from action through implicit affordance anchoring.
Efficiency Breakthrough
EchoKV introduces a reversible KV cache compression scheme that allows LLMs to switch back to full-precision inference on-demand.
Efficiency Breakthrough
ForestPrune achieves up to 90% token reduction in video MLLMs with minimal accuracy loss using a training-free spatial-temporal forest modeling approach.
Scaling Insight
Theoretical analysis reveals that the efficiency benefits of low-dimensional data structures for diffusion models diminish significantly when the data manifold is non-linear.
Paradigm Shift
This paper moves LLMs from point predictions to set-valued predictions with rigorous statistical coverage guarantees.
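The paper's mechanism is not detailed in this blurb, but set-valued prediction with rigorous marginal coverage is typically obtained via split conformal prediction; a minimal sketch under that assumption (all scores and candidates below are hypothetical):

```python
import numpy as np

def conformal_threshold(cal_scores, alpha=0.1):
    """Split conformal: finite-sample-corrected quantile of held-out
    nonconformity scores, e.g. cal_scores[i] = 1 - p_model(truth | prompt i)."""
    n = len(cal_scores)
    q = np.ceil((n + 1) * (1 - alpha)) / n
    return np.quantile(cal_scores, min(q, 1.0), method="higher")

def prediction_set(candidate_probs, threshold):
    """Keep every candidate whose nonconformity is within the threshold;
    marginally, P(true answer in set) >= 1 - alpha."""
    return [a for a, p in candidate_probs.items() if 1 - p <= threshold]

# Hypothetical calibration scores from a held-out split
cal = np.array([0.2, 0.5, 0.1, 0.4, 0.3, 0.6, 0.25, 0.35, 0.15, 0.45])
t = conformal_threshold(cal, alpha=0.2)
print(prediction_set({"Paris": 0.7, "Lyon": 0.2, "Nice": 0.05}, t))
```

The set shrinks as model confidence rises and widens under uncertainty, which is exactly the behavior point predictions cannot express.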
New Capability
WorldMesh generates consistent, large-scale 3D worlds by populating a geometric mesh scaffold with image diffusion-derived content.
Breaks Assumption
Graph Foundation Models (GFMs) are shown to fail with fixed architectural backbones, motivating inference-time architecture adaptation instead.
Scaling Insight
Access to conversational memory allows an 8B model to outperform a 235B model on user-specific queries while reducing inference costs by 96%.
Breaks Assumption
A rigorous evaluation shows that simple Probabilistic Circuits often outperform complex diffusion-based models for tabular data generation at a fraction of the cost.
Efficiency Breakthrough
Optimizing autoregressive image models with Group Relative Policy Optimization (GRPO) achieves competitive quality without the 2x inference cost of Classifier-Free Guidance.
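GRPO itself is a standard technique: it drops the learned value baseline and instead z-scores each sampled generation's reward against its own group of rollouts. A minimal sketch of that advantage computation (the paper's image-model specifics are not given here; rewards below are made up):

```python
import numpy as np

def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantage: normalize each rollout's reward by the
    mean and std of all rollouts sampled for the same prompt."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

# One prompt, four sampled generations with hypothetical reward scores
adv = grpo_advantages([1.0, 0.0, 0.5, 0.5])
print(adv)
```

Above-mean samples receive positive advantage and below-mean samples negative, so the policy gradient needs no separate critic network.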
New Capability
Identifies that MLLMs fail to perceive visual illusions due to a high-frequency attention bias and provides a plug-and-play fix that boosts accuracy from 13% to 84%.
New Capability
Polaris introduces a 'Gödel Agent' framework that allows 7B-parameter models to recursively improve their own policies through auditable code patches.
Efficiency Breakthrough
DILLO enables 14x faster safety-critical agent steering by predicting action consequences from latent states instead of heavy visual simulations.
Breaks Assumption
Exposes a major flaw in medical super-resolution research where models trained on downsampled data fail to recover actual lost structures in real low-resolution scans.
Paradigm Shift
Connects stochastic optimal control to the Schrödinger equation, enabling analytic solutions for long-horizon problems that previously scaled exponentially with difficulty.
Efficiency Breakthrough
ImplicitRM enables unbiased reward modeling from 'messy' implicit feedback (clicks/copies), drastically reducing the cost of RLHF data collection.
Efficiency Breakthrough
Introduces custom CUDA kernels and a sparse packing format that enables Transformers to maintain performance with over 99% feedforward sparsity.
Paradigm Shift
Enables 3D medical image segmentation pre-training using only mathematical formulas and implicit functions, requiring zero real-world data or expert annotations.
New Capability
Develops a collaborative memory framework that distills agent-agnostic reasoning trajectories, allowing different LLM models to share a single memory system.
New Capability
Identifies functionally complete safety circuits in LLMs via differentiable binary masks, allowing for near-surgical removal of backdoors and jailbreaks.
New Capability
Uses Sparse Autoencoders (SAEs) to identify and steer cultural representations in LLMs, eliciting rare cultural concepts that prompting alone misses.
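SAE steering conventionally means adding a scaled copy of a feature's decoder direction to the model's hidden state; a toy sketch under that assumption (the decoder weights here are random stand-ins, not a trained SAE):

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_features = 64, 512

# Stand-in for a trained SAE decoder: one unit-norm direction per feature
W_dec = rng.standard_normal((n_features, d_model))
W_dec /= np.linalg.norm(W_dec, axis=1, keepdims=True)

def steer(hidden, feature_id, alpha):
    """Nudge a hidden state along one SAE feature's decoder direction,
    amplifying whatever concept that feature encodes."""
    return hidden + alpha * W_dec[feature_id]

h = rng.standard_normal(d_model)
h_steered = steer(h, feature_id=42, alpha=5.0)
# Projection onto the steered feature direction increases by exactly alpha
print(float((h_steered - h) @ W_dec[42]))
```

Because the direction is unit-norm, the projection onto feature 42 rises by exactly alpha, which is how steering can surface concepts that prompting never activates.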
Efficiency Breakthrough
Upgrades video Diffusion Transformers to ultra-high-resolution synthesis using a two-stage 'Relay LoRA' adaptation on pure images.
Paradigm Shift
A dual-path architecture that combines speculative speech-to-speech prefixes with cascaded LLM continuations for zero-latency, high-quality dialogue.
Efficiency Breakthrough
Challenges the dominance of on-policy RL for LLMs by introducing a practical off-policy value-based framework that enables data reuse.
Paradigm Shift
A biology-native transformer architecture that mirrors cellular transcription and translation, enabling interpretable predictions across DNA, RNA, and protein.
New Capability
A unified framework that decomposes monolithic 3D meshes into 'sim-ready' interactive articulated assets using a sparse 3D VQ-VAE.
Breaks Assumption
Exposes 'shortcut learning' in differentiable simulators where models non-causally exploit future information to 'regret' past mistakes rather than learning to recover.
New Capability
A generative framework for graphs that closes the fidelity gap between energy-based models and discrete diffusion.
Paradigm Shift
Introduces a 'geospatial model foundry' that learns unified representations from the weights of existing models rather than raw data.