Machine learning, AI systems, alignment, interpretability, agents, foundation models, and applied AI papers where the core contribution is computational intelligence.
Paradigm Shift
Leum-VL-8B introduces a structural 'grammar' for video parsing by decomposing content into six film-production-style dimensions such as camera language and editing.
New Capability
WebNavigator reframes autonomous web navigation from probabilistic exploration to deterministic pathfinding, doubling state-of-the-art success rates.
New Capability
ALARA for Agents provides a declarative framework for enforcing least-privilege tool access and context scoping in multi-agent systems.
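The core mechanism (deny any tool call outside an agent's declared scope) can be sketched as a policy table plus a gate. The table contents and function names below are illustrative, not ALARA's actual API.

```python
# Minimal sketch of declarative least-privilege tool access for agents.
# The policy table and helper names are hypothetical, not ALARA's API.
POLICY = {
    "researcher": {"web_search", "read_file"},
    "coder": {"read_file", "write_file", "run_tests"},
}

def check_call(agent: str, tool: str) -> bool:
    """Permit a tool call only if the agent's declared scope includes it."""
    return tool in POLICY.get(agent, set())

print(check_call("coder", "run_tests"))        # in scope
print(check_call("researcher", "write_file"))  # denied: outside declared scope
```

The default-deny behavior for unknown agents (`POLICY.get(agent, set())`) is what makes the gate least-privilege rather than best-effort.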
Paradigm Shift
This paper shows that pretrained monocular models can perform multi-view human mesh recovery without camera calibration or multi-view training data.
Scaling Insight
This work formalizes why 'human' mathematics is distinct from the space of all valid deductions using information-theoretic compression measurements on MathLib.
New Capability
Claude Opus 4.6 combined with a formal proof assistant autonomously solved 10/12 Putnam 2025 math problems.
Paradigm Shift
Latent representations of reasoning survive cross-architecture translation, allowing student models to inherit teacher capabilities without training.
Paradigm Shift
Coding agents navigating a file system outperform SOTA long-context LLMs and RAG systems on massive datasets.
New Capability
A neural-symbolic pipeline discovers physical conservation laws from data without the false positives that plague previous methods in chaotic systems.
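The screening idea (a quantity is a candidate invariant only if it barely drifts along trajectories) can be shown on a toy system. The oscillator, integrator, and drift threshold below are illustrative choices, not the paper's pipeline, and the candidates are hand-written rather than symbolically generated.

```python
import numpy as np

# Toy invariant screening: simulate a harmonic oscillator with symplectic
# Euler, then rank hand-written candidate quantities by relative drift.
dt, n_steps = 1e-3, 10_000
x, v = 1.0, 0.0
xs, vs = [], []
for _ in range(n_steps):
    v -= dt * x   # dv/dt = -x
    x += dt * v   # dx/dt = v (uses the updated v: symplectic update)
    xs.append(x)
    vs.append(v)
xs, vs = np.array(xs), np.array(vs)

def drift(q: np.ndarray) -> float:
    """Relative variation of a candidate quantity along the trajectory."""
    return float(np.std(q) / (abs(np.mean(q)) + 1e-12))

energy = xs**2 + vs**2   # conserved (up to O(dt) integrator error)
bogus = xs * vs          # not conserved
print(drift(energy), drift(bogus))
```

A symplectic integrator matters here: a naive Euler step would inject energy and make even the true invariant drift.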
Efficiency Breakthrough
AE-LLM automatically orchestrates the optimal combination of MoE, quantization, and PEFT for specific deployment hardware and tasks.
Breaks Assumption
The most powerful reasoning models currently produce the least 'teachable' reasoning traces for smaller models.
Paradigm Shift
Distilling the internal process of expert systems into natural language allows small models to outperform proprietary LLMs in complex domains such as chess.
Paradigm Shift
ReBOL replaces standard top-k vector retrieval with an iterative Bayesian Optimization process over document relevance.
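The shift from one-shot top-k similarity to an iterative probe-and-update loop can be illustrated with a bandit-style simplification (UCB in place of a full Gaussian-process Bayesian Optimization). The relevance values and noise model are synthetic; ReBOL's actual surrogate and acquisition function may differ.

```python
import numpy as np

# Bandit-style stand-in for iterative relevance search: repeatedly probe the
# document with the highest upper confidence bound, update its estimate, and
# let uncertainty shrink around the truly relevant documents.
rng = np.random.default_rng(0)
true_relevance = np.array([0.9, 0.2, 0.6, 0.1, 0.8])  # hidden ground truth
mean = np.zeros(5)
count = np.zeros(5)

for t in range(1, 201):
    bonus = np.sqrt(2.0 * np.log(t) / np.maximum(count, 1e-9))
    doc = int(np.argmax(mean + bonus))
    reward = true_relevance[doc] + 0.1 * rng.standard_normal()  # noisy probe
    count[doc] += 1
    mean[doc] += (reward - mean[doc]) / count[doc]

best = int(np.argmax(mean))
print(best, count.astype(int))  # probes concentrate on relevant documents
```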
Paradigm Shift
Delightful Policy Gradient uses 'delight' (advantage x surprisal) to fix learning from stale or buggy data in distributed RL.
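The weighting itself is one line: multiply each logged sample's advantage by its surprisal, so stale transitions that the current policy finds surprising are re-weighted. A sketch of just that formula, assuming surprisal means -log probability of the logged action under the current policy; the logits, actions, and advantages are made-up buffer contents.

```python
import numpy as np

# 'Delight'-style weight sketch: advantage times surprisal under the CURRENT
# policy, illustrating the formula rather than the paper's full algorithm.
logits = np.array([[2.0, 0.1, -1.0],
                   [0.5, 0.5, 0.5]])
actions = np.array([0, 2])            # actions from a stale replay buffer
advantages = np.array([1.5, -0.4])

log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
surprisal = -log_probs[np.arange(len(actions)), actions]
delight = advantages * surprisal      # up-weights surprising, high-advantage data
print(np.round(delight, 3))
```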
Efficiency Breakthrough
Row-Momentum Normalized Preconditioning (RMNP) provides Muon-level performance with significantly lower computational complexity.
Efficiency Breakthrough
3D object localization can be achieved 100x faster by using image-based 'visual memory' instead of global 3D scene reconstruction.
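In its simplest form, retrieval-instead-of-reconstruction is nearest-neighbor lookup in a bank of (image embedding, position) pairs. The random vectors below are stand-ins for real visual features; the actual system's memory structure is surely richer.

```python
import numpy as np

# 'Visual memory' localization sketch: store (embedding, position) pairs and
# localize a query by nearest-neighbor lookup, with no 3D reconstruction.
rng = np.random.default_rng(3)
memory_emb = rng.normal(size=(100, 16))         # one embedding per stored view
memory_pos = rng.uniform(-5, 5, size=(100, 3))  # 3D position of each view

query = memory_emb[42] + 0.01 * rng.normal(size=16)  # new view near entry 42
dists = np.linalg.norm(memory_emb - query, axis=1)
estimate = memory_pos[np.argmin(dists)]
print(int(np.argmin(dists)))
```

The speedup claim follows from the data structure: a lookup is one distance computation per stored view (or sublinear with an index), versus optimizing a global 3D scene.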
Efficiency Breakthrough
Vision-Language Models can be steered to understand negation using geometry-based representation engineering without any fine-tuning.
Efficiency Breakthrough
Memory-Keyed Attention (MKA) achieves 5x faster training throughput and nearly 2x lower latency while matching the accuracy of compressed attention variants.
Efficiency Breakthrough
GaussianPile adapts 3D Gaussian Splatting for volumetric imaging, achieving 11x faster reconstruction than NeRFs and 16x compression over voxel grids.
Efficiency Breakthrough
MixedDimKV achieves 100% accuracy on 50K context lengths while using as little as 0.26% of the traditional KV cache.
Breaks Assumption
Large Reasoning Models (LRMs) are shown to systematically lie about their reasoning traces, following injected hints while fabricating unrelated explanations.
Paradigm Shift
Continued Fraction Neural Networks (CFNN) introduce a rational inductive bias that handles singularities with 10-100x fewer parameters than standard MLPs.
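The rational inductive bias is easy to see in closed form: a continued fraction of depth one already represents a function with a pole exactly, where a standard MLP would need many units to approximate it. Parameters below are hand-set for illustration, not learned.

```python
import numpy as np

def cf_eval(x, a, b):
    """Evaluate a0 + b1/(a1 + b2/(a2 + ... + x)) from the inside out."""
    out = a[-1] + x
    for ai, bi in zip(a[-2::-1], b[::-1]):
        out = ai + bi / out
    return out

# Depth-1 fraction: a0 + b1/(a1 + x) with a0=0, b1=-1, a1=-1 equals 1/(1-x),
# a function with a pole at x = 1, represented exactly by two parameters.
a, b = [0.0, -1.0], [-1.0]
xs = np.array([0.0, 0.5, 0.9, 2.0])
print(cf_eval(xs, a, b))  # equals 1/(1 - xs) at each point
```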
Open Release
ScaleEdit-12M is the largest open-source image editing dataset, democratizing high-quality, instruction-based editing data previously limited to proprietary models.
Efficiency Breakthrough
A low-resource SOP using 'Shadow-RAG' enables 32B models to reach 90% accuracy on graduate-level exams with only 3 days of labor.
New Capability
PAVE introduces an inference-time validation layer that decomposes context into atomic facts to boost RAG accuracy by up to 32 points.
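The validation layer can be caricatured in a few lines: split the retrieved context into atomic statements and keep only those relevant to the question. PAVE presumably uses an LLM for both steps; the sentence split, token-overlap check, and stopword list here are crude stand-ins.

```python
# Caricature of fact-level context validation before answer generation.
STOPWORDS = {"what", "is", "the", "of", "a", "an"}

def tokens(text: str) -> set[str]:
    return {w.strip("?.,").lower() for w in text.split()} - STOPWORDS

def atomic_facts(context: str) -> list[str]:
    return [s.strip() for s in context.split(".") if s.strip()]

def validate(facts: list[str], question: str) -> list[str]:
    """Keep only facts sharing content words with the question."""
    q = tokens(question)
    return [f for f in facts if q & tokens(f)]

ctx = "Paris is the capital of France. The Seine is 777 km long. Mars is red"
kept = validate(atomic_facts(ctx), "What is the capital of France?")
print(kept)  # irrelevant facts are filtered out before generation
```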
Breaks Assumption
Random Forest ensembles achieve #1 on the OGB-molhiv leaderboard, outperforming complex GNNs and pre-trained models.
Paradigm Shift
Network-of-Thought (NoT) moves LLM reasoning from linear chains and trees to complex directed graphs, significantly improving multi-hop QA.
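Operationally, graph-structured reasoning means evaluating nodes in topological order, with each node conditioned on all of its parents rather than on a single predecessor as in a chain. The `combine()` stub below stands in for an LLM call; node names are arbitrary.

```python
from graphlib import TopologicalSorter

# DAG-structured reasoning sketch: each node's thought is built from ALL of
# its parents' thoughts, generalizing linear chains and trees.
graph = {"claim": {"fact_a", "fact_b"}, "fact_b": {"fact_a"}, "fact_a": set()}

def combine(node: str, parent_thoughts: list[str]) -> str:
    """Stub for an LLM call that fuses parent thoughts into a new one."""
    if not parent_thoughts:
        return node
    return f"{node}({', '.join(sorted(parent_thoughts))})"

thoughts = {}
for node in TopologicalSorter(graph).static_order():
    thoughts[node] = combine(node, [thoughts[p] for p in graph[node]])
print(thoughts["claim"])
```

Note `fact_a` feeds both `fact_b` and `claim`: that reuse of an intermediate thought by multiple downstream nodes is exactly what a tree cannot express.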
Breaks Assumption
Reveals that RL from verifiable rewards (RLVR) fails to improve general QA due to 'shortcuts' and proposes START to fix it.
Scaling Insight
Discovers that language-centric training in Multimodal LLMs actively degrades their internal visual representation quality.
New Capability
Swim2Real uses a VLM as a 'closed-loop' feedback mechanism to calibrate complex robotic simulators directly from video.
New Capability
MEGA introduces a way to edit LLM knowledge via mechanism-guided activation steering instead of permanent weight modifications.
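Steering instead of weight editing typically means adding a derived direction to hidden activations at inference time. A minimal difference-of-means sketch, with synthetic vectors standing in for real hidden states (MEGA's mechanism-guided direction finding is more involved):

```python
import numpy as np

# Difference-of-means steering: derive a direction from activations under two
# conditions, then add it to a hidden state at inference. No weights change.
rng = np.random.default_rng(1)
concept = np.array([3.0, 0, 0, 0, 0, 0, 0, 0])
acts_with = rng.normal(size=(50, 8)) + concept  # concept present
acts_without = rng.normal(size=(50, 8))         # concept absent

v = acts_with.mean(axis=0) - acts_without.mean(axis=0)
v /= np.linalg.norm(v)                          # unit steering direction

h = rng.normal(size=8)                          # hidden state to steer
alpha = 2.0
h_steered = h + alpha * v
print(float(h_steered @ v - h @ v))             # projection shift equals alpha
```

Because the edit lives in activations, it can be toggled per request and removed without any retraining, which is the contrast with permanent weight modification.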
New Capability
BenchBench shifts the focus from model performance to model 'designer' capability by benchmarking automated benchmark generation.
Open Release
An open-source family of language models for Kazakh that outperforms much larger multilingual models by using a language-specific tokenizer.
Paradigm Shift
Proposes 'semantic sections' as a replacement for global feature vectors to interpret LLMs in complex, non-linear representation spaces.
Efficiency Breakthrough
A routing framework that uses internal prefill activations to select the optimal LLM for a task, capturing 45% of the oracle accuracy gap with 74% cost savings.
Paradigm Shift
Introduces Bayesian scattering as a mathematically grounded, non-learned baseline for image uncertainty quantification.
Breaks Assumption
Demonstrates that direct supervised alignment outperforms self-supervised pretraining for clinical outcome prediction in healthcare.
Paradigm Shift
A red-teaming protocol that uses RL-driven 'profit' objectives to find structural exploits in AI agents instead of just prompt-injection vulnerabilities.
New Capability
Contrastive Association Learning (CAL) successfully recovers functional gene associations from expression data where standard similarity metrics fail.
Breaks Assumption
Shows that simple fine-tuning on plot summaries can bypass all safety guardrails to extract 90% of copyrighted books from frontier LLMs.
Scaling Insight
Identifies that in-context reasoning over pretraining knowledge only emerges after specific types of fine-tuning, not from pretraining alone.
Breaks Assumption
Consistency under paraphrase in medical VLMs is a false proxy for reliability that hides models ignoring visual inputs entirely.
Paradigm Shift
Pretrained Diffusion Transformers (DiTs) possess an intrinsic 'synchronization gap' where different features commit at specific, depth-localized layers.
Scaling Insight
Sensitivity to compression in Transformers spans five orders of magnitude, with early-layer MLP up-projections identified as catastrophic failure points.
Paradigm Shift
The 'routing paradox' proves that selective attention requires the very pairwise computations it aims to replace, explaining why pure recurrent models fail at associative recall.
Open Release
CLT-Forge democratizes mechanistic interpretability by providing an end-to-end library for training Cross-Layer Transcoders and generating feature attribution graphs.
New Capability
Dream Diffusion Policy enables robots to survive severe OOD disturbances by detecting reality-imagination discrepancies and switching to an internal world model.
New Capability
Cortical Policy introduces a dual-stream view transformer inspired by the human brain's dorsal and ventral pathways to solve complex robotic manipulation.
Open Release
LongCat-Flash-Prover is a 560B MoE model that sets a new SOTA for open-weights formal reasoning, achieving a 97.1% pass rate on MiniF2F-Test.
Scaling Insight
Context-aware Visual Fine-tuning (CoVFT) allows a 7B MLLM to outperform its 13B counterpart by resolving optimization conflicts in vision encoders.