Learning to Reason from Analogous Cases: Retrieval-Augmented Reinforcement Fine-Tuning (RA-RFT)
Learning to Reason by Analogy via Retrieval-Augmented Reinforcement Fine-Tuning
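- Proposes a new approach that retrieves context by similarity of reasoning pattern rather than by semantic similarity
- Trains the retriever by distillation from gold labels, and uses reinforcement learning to teach the model how to make use of reasoning trajectories
- Outperforms GRPO by up to 7.1 points on AIME 2025, showing an axis of improvement that is independent of reward design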
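To make the first point concrete, here is a minimal sketch of how ranking by reasoning pattern can pick a different example than ranking by topic. It is not the paper's implementation: the toy memory, the two-dimensional vectors, and the names `MEMORY`, `topic_vec`, `pattern_vec`, and `retrieve` are all hypothetical, and the summary above does not say how the retriever actually represents reasoning patterns.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical toy memory of solved problems (assumption, not from the paper).
# topic_vec stands in for a semantic embedding of the question text;
# pattern_vec stands in for an embedding of the reasoning pattern used to solve it.
MEMORY = [
    {"id": "geometry_area", "topic_vec": [1.0, 0.0], "pattern_vec": [0.0, 1.0]},
    {"id": "number_theory_induction", "topic_vec": [0.0, 1.0], "pattern_vec": [1.0, 0.0]},
]


def retrieve(query_vec, field, k=1):
    """Return the ids of the k stored examples most similar to query_vec on `field`."""
    ranked = sorted(MEMORY, key=lambda ex: cosine(query_vec, ex[field]), reverse=True)
    return [ex["id"] for ex in ranked[:k]]


# A query that is about geometry by topic but is best solved by induction.
query = {"topic_vec": [0.9, 0.1], "pattern_vec": [0.9, 0.1]}

by_topic = retrieve(query["topic_vec"], "topic_vec")
by_pattern = retrieve(query["pattern_vec"], "pattern_vec")

print("semantic retrieval:", by_topic)
print("reasoning-pattern retrieval:", by_pattern)
```

In this toy setup the semantic ranking returns the geometry example, while the reasoning-pattern ranking returns the induction example, which is the kind of context the summary says the method aims to retrieve.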
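Each day we summarize one of the latest AI papers published on arXiv in clear, accessible Japanese. A practical digest for everyone from researchers to practitioners who want to apply AI in their work.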
EvoArena: Tracking Memory Evolution for Robust LLM Agents in Dynamic Environments
Learning to Reason by Analogy via Retrieval-Augmented Reinforcement Fine-Tuning
Reroute, Don't Remove: Recoverable Visual Token Routing for Vision-Language Models
When to Align, When to Predict: A Phase Diagram for Multimodal Learning
OmniGameArena: A Unified UE5 Benchmark for VLM Game Agents with Improvement Dynamics
How reliable are LLMs when it comes to playing dice?
TailLoR: Protecting Principal Components in Parameter-Efficient Continual Learning
HANDOFF: Humanoid Agentic Task-Space Whole-Body Control via Distilled Complementary Teachers
STRIDE: Training Data Attribution via Sparse Recovery from Subset Perturbations
Neuron Populations Exhibit Divergent Selectivity with Scale
Mitigating Perceptual Judgment Bias in Multimodal LLM-as-a-Judge via Perceptual Perturbation and Reward Modeling
Lumos-Nexus: Efficient Frequency Bridging with Homogeneous Latent Space for Video Unified Models
Physics Is All You Need? A Case Study in Physicist-Supervised AI Development of Scientific Software
VideoMLA: Low-Rank Latent KV Cache for Minute-Scale Autoregressive Video Diffusion
DynaFLIP: Rethinking Robotics Perception via Tri-Modal-Dynamics Guided Representation
PEFT-Arena: Understanding Parameter-Efficient Finetuning from a Stability-Plasticity Perspective
Algorithmic Monocultures in Hiring
MobileGym: A Verifiable and Highly Parallel Simulation Platform for Mobile GUI Agent Research
SkillOpt: Executive Strategy for Self-Evolving Agent Skills
Tokenisation via Convex Relaxations
Integrable Elasticity via Neural Demand Potentials
Vector Policy Optimization: Training for Diversity Improves Test-Time Search
Variance Reduction for Expectations with Diffusion Teachers
Atoms of Thought: Universal EEG Representation Learning with Microstates
TIDE: Efficient and Lossless MoE Diffusion LLM Inference with I/O-aware Expert Offload
DashAttention: Differentiable and Adaptive Sparse Hierarchical Attention
IVGT: Implicit Visual Geometry Transformer for Neural Scene Representation
ATLAS: Agentic or Latent Visual Reasoning? One Word is Enough for Both
EntityBench: Towards Entity-Consistent Long-Range Multi-Shot Video Generation
RefDecoder: Enhancing Visual Generation with Conditional Video Decoding
AlphaGRPO: Unlocking Self-Reflective Multimodal Generation in UMMs via Decompositional Verifiable Reward
ELF: Embedded Language Flows
LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling
ActCam: Zero-Shot Joint Camera and 3D Motion Control for Video Generation
UniPool: A Globally Shared Expert Pool for Mixture-of-Experts
BAMI: Training-Free Bias Mitigation in GUI Grounding
Taming Outlier Tokens in Diffusion Transformers
A Closed-Form Adaptive-Landmark Kernel for Certified Point-Cloud and Graph Classification