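Paper Radar

Focused on large-model architecture, training, inference, evaluation, and multimodal work, distilling each paper's contributions, limits, and engineering takeaways.

Auto Radar Dashboard

Generated automatically by the crawling and cleaning scripts; each update rebuilds the daily stream, weekly picks, and keyword trends.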

Total entries: 363

Visible window: 120

Source distribution (total): Manual 3 · arXiv 360

Source distribution (window): Manual 0 · arXiv 120

Weekly Picks

Range: 2026-03-14 to 2026-03-21

Computer Vision (Vision)

NavTrust: Benchmarking Trustworthiness for Embodied Navigation

There are two major categories of embodied navigation: Vision-Language Navigation (VLN), where agents navigate by following natural language instructions; and Object-Goal Navigation (OGN), where agents navigate to a specified target object. However, existing work primarily evaluates model performance under nominal conditions, overlooking the potential corruptions that arise in real-world settings. To address this gap, we present NavTrust, a unified benchmark that systematically corrupts input modalities, including RGB, depth, and instructions, in realistic scenarios and evaluates their impact on navigation performance. To the best of our knowledge, NavTrust is the first benchmark that exposes embodied navigation agents to diverse RGB-Depth corruptions and instruction variations in a unified framework. Our extensive evaluation of seven state-of-the-art approaches reveals substantial performance degradation under realistic corruptions, which highlights critical robustness gaps and provides a roadmap toward more trustworthy embodied navigation systems. Furthermore, we systematically evaluate four distinct mitigation strategies to enhance robustness against RGB-Depth and instruction corruptions. Our base models include Uni-NaVid and ETPNav. We deployed them on a real mobile robot and observed improved robustness to corruptions. The project website is: https://navtrust.github.io.

2026 · arXiv

Updated in the last 7 days, topic Vision, composite score 30.0

Natural Language Processing (NLP)

Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Policy Distillation

We introduce Nemotron-Cascade 2, an open 30B MoE model with 3B activated parameters that delivers best-in-class reasoning and strong agentic capabilities. Despite its compact size, its mathematical and coding reasoning performance approaches that of frontier open models. It is the second open-weight LLM, after DeepSeekV3.2-Speciale-671B-A37B, to achieve Gold Medal-level performance in the 2025 International Mathematical Olympiad (IMO), the International Olympiad in Informatics (IOI), and the ICPC World Finals, demonstrating remarkably high intelligence density with 20x fewer parameters. In contrast to Nemotron-Cascade 1, the key technical advancements are as follows. After SFT on a meticulously curated dataset, we substantially expand Cascade RL to cover a much broader spectrum of reasoning and agentic domains. Furthermore, we introduce multi-domain on-policy distillation from the strongest intermediate teacher models for each domain throughout the Cascade RL process, allowing us to efficiently recover benchmark regressions and sustain strong performance gains along the way. We release the collection of model checkpoints and training data.

2026 · arXiv

Updated in the last 7 days, topic NLP, composite score 30.0

Natural Language Processing (NLP)

Meanings and Measurements: Multi-Agent Probabilistic Grounding for Vision-Language Navigation

Robots collaborating with humans must convert natural language goals into actionable, physically grounded decisions. For example, executing a command such as "go two meters to the right of the fridge" requires grounding semantic references, spatial relations, and metric constraints within a 3D scene. While recent vision language models (VLMs) demonstrate strong semantic grounding capabilities, they are not explicitly designed to reason about metric constraints in physically defined spaces. In this work, we empirically demonstrate that state-of-the-art VLM-based grounding approaches struggle with complex metric-semantic language queries. To address this limitation, we propose MAPG (Multi-Agent Probabilistic Grounding), an agentic framework that decomposes language queries into structured subcomponents and queries a VLM to ground each component. MAPG then probabilistically composes these grounded outputs to produce metrically consistent, actionable decisions in 3D space. We evaluate MAPG on the HM-EQA benchmark and show consistent performance improvements over strong baselines. Furthermore, we introduce a new benchmark, MAPG-Bench, specifically designed to evaluate metric-semantic goal grounding, addressing a gap in existing language grounding evaluations. We also present a real-world robot demonstration showing that MAPG transfers beyond simulation when a structured scene representation is available.

2026 · arXiv

Updated in the last 7 days, topic NLP, composite score 30.0

Computer Vision (Vision)

DriveTok: 3D Driving Scene Tokenization for Unified Multi-View Reconstruction and Understanding

With the growing adoption of vision-language-action models and world models in autonomous driving systems, scalable image tokenization becomes crucial as the interface for the visual modality. However, most existing tokenizers are designed for monocular and 2D scenes, leading to inefficiency and inter-view inconsistency when applied to high-resolution multi-view driving scenes. To address this, we propose DriveTok, an efficient 3D driving scene tokenizer for unified multi-view reconstruction and understanding. DriveTok first obtains semantically rich visual features from vision foundation models and then transforms them into the scene tokens with 3D deformable cross-attention. For decoding, we employ a multi-view transformer to reconstruct multi-view features from the scene tokens and use multiple heads to obtain RGB, depth, and semantic reconstructions. We also add a 3D head directly on the scene tokens for 3D semantic occupancy prediction for better spatial awareness. With the multiple training objectives, DriveTok learns unified scene tokens that integrate semantic, geometric, and textural information for efficient multi-view tokenization. Extensive experiments on the widely used nuScenes dataset demonstrate that the scene tokens from DriveTok perform well on image reconstruction, semantic segmentation, depth prediction, and 3D occupancy prediction tasks.

2026 · arXiv

Updated in the last 7 days, topic Vision, composite score 29.7

General Topics (General)

AlignMamba-2: Enhancing Multimodal Fusion and Sentiment Analysis with Modality-Aware Mamba

In the era of large-scale pre-trained models, effectively adapting general knowledge to specific affective computing tasks remains a challenge, particularly regarding computational efficiency and multimodal heterogeneity. While Transformer-based methods have excelled at modeling inter-modal dependencies, their quadratic computational complexity limits their use with long-sequence data. Mamba-based models have emerged as a computationally efficient alternative; however, their inherent sequential scanning mechanism struggles to capture the global, non-sequential relationships that are crucial for effective cross-modal alignment. To address these limitations, we propose AlignMamba-2, an effective and efficient framework for multimodal fusion and sentiment analysis. Our approach introduces a dual alignment strategy that regularizes the model using both Optimal Transport distance and Maximum Mean Discrepancy, promoting geometric and statistical consistency between modalities without incurring any inference-time overhead. More importantly, we design a Modality-Aware Mamba layer, which employs a Mixture-of-Experts architecture with modality-specific and modality-shared experts to explicitly handle data heterogeneity during the fusion process. Extensive experiments on four challenging benchmarks, including dynamic time-series (on the CMU-MOSI and CMU-MOSEI datasets) and static image-related tasks (on the NYU-Depth V2 and MVSA-Single datasets), demonstrate that AlignMamba-2 establishes a new state-of-the-art in both effectiveness and efficiency across diverse pattern recognition tasks, ranging from dynamic time-series analysis to static image-text classification.

2026 · arXiv

Updated in the last 7 days, topic General, composite score 29.5

General Topics (General)

Accurate and Efficient Multi-Channel Time Series Forecasting via Sparse Attention Mechanism

The task of multi-channel time series forecasting is ubiquitous in numerous fields such as finance, supply chain management, and energy planning. It is critical to effectively capture complex dynamic dependencies within and between channels for accurate predictions. However, traditional methods have paid little attention to learning the interactions among channels. This paper proposes Linear-Network (Li-Net), a novel architecture designed for multi-channel time series forecasting that captures the linear and non-linear dependencies among channels. Li-Net dynamically compresses representations across sequence and channel dimensions, processes the information through a configurable non-linear module, and subsequently reconstructs the forecasts. Moreover, Li-Net integrates a sparse Top-K Softmax attention mechanism within a multi-scale projection framework to address these challenges. A core innovation is its ability to seamlessly incorporate and fuse multi-modal embeddings, guiding the sparse attention process to focus on the most informative time steps and feature channels. Experimental results on multiple real-world benchmark datasets demonstrate that Li-Net achieves competitive performance compared to state-of-the-art baseline methods. Furthermore, Li-Net provides a superior balance between prediction accuracy and computational burden, exhibiting significantly lower memory usage and faster inference times. Detailed ablation studies and parameter sensitivity analyses validate the effectiveness of each key component in our proposed architecture.

Keywords: Multivariate Time Series Forecasting, Sparse Attention Mechanism, Multimodal Information Fusion, Non-linear relationship

2026 · arXiv

Updated in the last 7 days, topic General, composite score 29.4
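The Li-Net abstract above names a sparse Top-K Softmax attention mechanism but does not spell it out. As a generic illustration only, and not the paper's implementation, one common reading of Top-K softmax keeps the k largest scores, gives every other position zero weight, and normalizes over the survivors. The function name `top_k_softmax` and the tie-breaking rule are assumptions made for this sketch.

```python
import math

def top_k_softmax(scores, k):
    """Softmax over the k largest scores; all other positions get weight 0.

    Hypothetical helper for illustration; not taken from the Li-Net paper.
    """
    if not 1 <= k <= len(scores):
        raise ValueError("k must be between 1 and len(scores)")
    # Indices of the k largest scores (ties broken by lower index first).
    keep = sorted(range(len(scores)), key=lambda i: (-scores[i], i))[:k]
    # Subtract the largest kept score for numerical stability.
    max_kept = max(scores[i] for i in keep)
    exps = {i: math.exp(scores[i] - max_kept) for i in keep}
    total = sum(exps.values())
    return [exps[i] / total if i in exps else 0.0 for i in range(len(scores))]

weights = top_k_softmax([2.0, 0.5, 1.0, -1.0], k=2)
print(weights)
```

With k = 2, only the first and third positions receive non-zero weight, and those two weights sum to 1.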

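Keyword Trends

Last 7 days vs. prior 23 days (total window: 30 days)

Keywords: 16

Rising: 8

Stable: 3

Falling: 5

Sample window: ranking fallback window · 84/276

Topic clusters: Foundation models 5 · Applications and scenarios 4 · Evaluation and tools 2

Topic-cluster keywords

Visible: 12 / 12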

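Each row lists: keyword · topic cluster · paper counts (last 7 days / prior 23 days) · trend · Δ · R

Computer Vision (Vision) · Applications and scenarios · 39/90 · rising · Δ 0.138 · R 1.42

General Intelligence (General AI) · Research frontier · 29/131 · falling · Δ -0.129 · R 0.73

Agents (Agent) · Applications and scenarios · 8/44 · falling · Δ -0.064 · R 0.60

Multimodal · Applications and scenarios · 12/23 · rising · Δ 0.059 · R 1.71

Robotics · Applications and scenarios · 8/9 · rising · Δ 0.063 · R 2.92

Machine Learning · Foundation models · 33/103 · rising · Δ 0.020 · R 1.05

Benchmarks (Benchmark) · Evaluation and tools · 16/41 · rising · Δ 0.042 · R 1.28

Natural Language Processing (NLP) · Foundation models · 18/49 · rising · Δ 0.037 · R 1.21

Inference Optimization (Inference) · Systems engineering · 12/50 · falling · Δ -0.038 · R 0.79

Human-Computer Interaction (HCI) · Evaluation and tools · 1/11 · falling · Δ -0.028 · R 0.30

Alignment and Safety (Alignment) · Safety and governance · 12/36 · stable · Δ 0.012 · R 1.10

Attention Mechanisms (Attention) · Foundation models · 10/30 · stable · Δ 0.010 · R 1.10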
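The page does not define Δ or R. The figures above are consistent with the following reading, which is an inference from the numbers rather than the dashboard's documented formula: each count is divided by the size of its sample window (84 entries for the last 7 days and 276 for the prior 23 days, per the "84/276" figure), Δ is the difference between the two shares, and R is their ratio. The function name `trend_metrics` and the rounding choices are assumptions of this sketch.

```python
def trend_metrics(recent_count, prior_count, recent_total=84, prior_total=276):
    """Return (delta, ratio) of a keyword's share across two windows.

    Assumed reading of the dashboard's columns, not a documented formula.
    """
    recent_share = recent_count / recent_total
    prior_share = prior_count / prior_total
    delta = recent_share - prior_share
    # Guard against a keyword that never appeared in the prior window.
    ratio = recent_share / prior_share if prior_share > 0 else float("inf")
    return round(delta, 3), round(ratio, 2)

# Counts copied from four of the keyword rows (last 7 days, prior 23 days).
rows = {"Vision": (39, 90), "General AI": (29, 131), "Robotics": (8, 9), "HCI": (1, 11)}
for name, (recent, prior) in rows.items():
    print(name, trend_metrics(recent, prior))
```

For Vision this gives 39/84 − 90/276 ≈ 0.138 and (39/84) / (90/276) ≈ 1.42, matching the row. The same reading reproduces every visible R value and all but one Δ value. The Multimodal row is the exception: it computes to about 0.0595, which the page shows as 0.059, possibly because of a different rounding step. The rule that maps Δ to rising, stable, or falling is not stated on the page and is not modeled here.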

Site Paper List

Filter by keyword, year, and topic, with sortable browsing.

Mamba: Linear-Time Sequence Modeling with Selective State Spaces

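Challenges the Transformer's dominance in long-sequence modeling with selective state space models, emphasizing linear complexity, content-based selection, and a hardware-friendly implementation.

2023 · arXiv · 2 authors

Mamba · State space models · Long sequences · Non-Transformer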

Efficient Memory Management for Large Language Model Serving with PagedAttention

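Brings the operating-system idea of paging to KV cache management, substantially improving LLM serving throughput; a key representative among modern inference-systems papers.

2023 · arXiv · 6 authors

vLLM · PagedAttention · Inference systems · Continuous batching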

Lost in the Middle

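Studies how well long-context models use evidence at different positions, showing that a long context does not mean information in the middle is used effectively.

2023 · TACL · 3 authors

Long context · Evaluation · Retrieval ability · RAG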

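LLM Evaluation Methodology: From MMLU to MT-Bench

Puts knowledge QA, coding ability, dialogue quality, and LLM-as-a-Judge into a single picture, helping readers understand how a claim that "the model is stronger" should actually be verified.

2023 · arXiv · 5 authors

Evaluation · MMLU · HumanEval · MT-Bench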

Direct Preference Optimization: Your Language Model is Secretly a Reward Model

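Compresses preference alignment from "reward model + PPO" into a more direct optimization objective, significantly reducing the complexity of the RLHF pipeline.

2023 · arXiv · 6 authors

DPO · Preference learning · Alignment · RLHF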

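LLaVA: Visual Instruction Tuning That Lets Large Models Truly Chat About Images

Combines a vision encoder with a language model, then uses visual instruction tuning to go from "can see images" to "can hold a conversation about an image", becoming a key milestone for open-source multimodal LLMs.

2023 · NeurIPS Workshop · 4 authors

LLaVA · Multimodal · Visual instruction tuning · Open-source models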

LLaMA: Open and Efficient Foundation Language Models

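With a more restrained parameter count, more training tokens, and a set of careful architectural changes, shows that open foundation models can approach the capabilities of closed large models.

2023 · arXiv · 5 authors

LLaMA · Open-source models · Training strategy · Transformer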

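Toolformer: How Language Models Teach Themselves to Use Tools

Lets a language model learn, in a self-supervised way, when to call external tools such as search, a calculator, or translation, moving from "can talk" toward "can lean on external systems at the right moment".

2023 · NeurIPS · 5 authors

Toolformer · Tool calling · Agent · Prompting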

Constitutional AI: Harmlessness from AI Feedback

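Restructures the alignment pipeline around "a list of principles + self-critique + AI feedback", aiming to reduce reliance on large-scale human preference labeling.

2022 · arXiv · 5 authors

Constitutional AI · Safety alignment · RLAIF · Alignment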

Fast Inference from Transformers via Speculative Decoding

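Uses two-stage decoding, in which a draft model guesses first and the large model then verifies in batch, to amortize part of the serial bottleneck in autoregressive inference; one of the important directions for modern low-latency inference.

2023 · ICML · 3 authors

Speculative Decoding · Inference optimization · Decoding · System performance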

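ReAct: Letting Language Models Alternate Between Reasoning and Acting

Combines explicit reasoning traces with external actions, so the model does not only "think" but also looks things up, calls tools, and then continues reasoning when needed; a classic starting point for agent design.

2022 · ICLR · 5 authors

ReAct · Agent · Prompting · Tool calling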

FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness

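Shifts the focus of attention optimization from FLOPs to IO and makes large speedups for exact attention a reality; one of the key building blocks of modern training and inference systems.

2022 · arXiv · 5 authors

FlashAttention · Attention · Training optimization · Inference optimization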

Training Language Models to Follow Instructions with Human Feedback

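Builds an RLHF loop from SFT, a reward model, and PPO, moving language models from "can continue text" toward "better at answering according to human intent".

2022 · arXiv · 5 authors

InstructGPT · RLHF · Alignment · PPO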

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models

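Prompting the model to write intermediate reasoning steps first significantly improves performance on complex reasoning tasks, turning the prompt from an input template into a trigger for reasoning.

2022 · NeurIPS · 5 authors

Chain-of-Thought · Prompting · Reasoning · Few-shot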

LoRA: Low-Rank Adaptation of Large Language Models

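Approximates the full weight update with a low-rank decomposition, turning LLM fine-tuning from a high-barrier GPU task into a more reproducible and scalable engineering practice.

2021 · ICLR · 5 authors

LoRA · Efficient fine-tuning · Fine-tuning · Adapter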
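As a rough illustration of the low-rank idea in the LoRA entry above, the sketch below adds a rank-r product B·A to a frozen weight matrix W and compares parameter counts. It is a toy in pure Python under stated assumptions: the dimensions and values are made up, the helper names `matmul` and `lora_update` are hypothetical, and the scaling factor used in practice is omitted.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_update(w, b, a):
    """Return W + B @ A, where B is (d x r) and A is (r x k)."""
    delta = matmul(b, a)
    return [[w_ij + d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]

d, k, r = 4, 4, 1  # toy sizes, chosen only for illustration
w = [[1.0 if i == j else 0.0 for j in range(k)] for i in range(d)]  # frozen weights
b = [[0.1] for _ in range(d)]   # trainable factor, d x r
a = [[1.0, 2.0, 3.0, 4.0]]      # trainable factor, r x k
w_adapted = lora_update(w, b, a)

full_params = d * k          # entries in a full weight update
lora_params = r * (d + k)    # entries in the two low-rank factors
print(full_params, lora_params)
```

Only `b` and `a` would be trained, so the trainable parameter count grows with r·(d + k) instead of d·k; the gap widens as the matrices get larger relative to r.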

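CLIP: Unifying Visual and Text Representations with Natural Language Supervision

Applies contrastive learning to massive numbers of image-text pairs so that vision models learn to "understand images through language", becoming important infrastructure for later multimodal large models.

2021 · ICML · 5 authors

CLIP · Multimodal · Vision-language · Contrastive learning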

Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity

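Uses simpler single-expert routing to move Mixture of Experts from a hard-to-train research direction to a scalable architecture choice for large models.

2021 · arXiv · 3 authors

MoE · Switch Transformer · Sparse activation · Routing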

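The GPT Series Evolution (GPT → GPT-2 → GPT-3)

From GPT's generative pre-training, to GPT-2's unsupervised multitask ability, to GPT-3's emergent few-shot behavior, this line of work defined the main stage for modern general-purpose large models.

2020 · NeurIPS · 5 authors

GPT · Pre-training · Few-shot · Autoregressive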

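RAG Survey: The System-Level Thread of Retrieval-Augmented Generation

Starting from the classic RAG paper, traces the full pipeline from retrieval and reranking through context assembly to generation, explaining why simply "stuffing documents into the model" is far from enough.

2020 · NeurIPS · 5 authors

RAG · Retrieval augmentation · Reranking · Vector search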

Scaling Laws for Neural Language Models

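Systematically studies the power-law relationships between loss and parameter count, data size, and compute, turning "how should large models be scaled" from a judgment call into an estimable question.

2020 · arXiv · 3 authors

Scaling Laws · Training strategy · Compute budget · Data mix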

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

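Uses MLM and NSP to bring bidirectional Transformer pre-training into the mainstream, reshaping NLP's default paradigm of pre-training followed by downstream fine-tuning.

2018 · NAACL · 4 authors

BERT · Pre-training · Transformer · NLP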

Attention Is All You Need

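Proposes the Transformer architecture, replacing RNNs/CNNs with a purely attention-based mechanism and rewriting the engineering paradigm and research direction of sequence modeling.

2017 · NeurIPS · 4 authors

Transformer · Attention · Architecture · Positional encoding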