Paper Radar

Focused on large-model architecture, training, inference, evaluation, and multimodality; we distill each paper's contributions, limitations, and engineering takeaways.

Automated Radar Board

Automatically generated by the crawling and cleaning scripts; each update rebuilds the daily stream, the weekly picks, and the keyword trends.

Total entries: 363
Visible window: 120
Source breakdown (total): Manual 3 · arXiv 360
Source breakdown (window): Manual 0 · arXiv 120
Last build: 2026/03/21 19:13

Weekly Picks

Coverage: 2026-03-14 to 2026-03-21

Computer Vision

NavTrust: Benchmarking Trustworthiness for Embodied Navigation

There are two major categories of embodied navigation: Vision-Language Navigation (VLN), where agents navigate by following natural language instructions, and Object-Goal Navigation (OGN), where agents navigate to a specified target object. However, existing work primarily evaluates model performance under nominal conditions, overlooking the corruptions that arise in real-world settings. To address this gap, we present NavTrust, a unified benchmark that systematically corrupts input modalities, including RGB, depth, and instructions, in realistic scenarios and evaluates their impact on navigation performance. To the best of our knowledge, NavTrust is the first benchmark that exposes embodied navigation agents to diverse RGB-Depth corruptions and instruction variations in a unified framework. Our extensive evaluation of seven state-of-the-art approaches reveals substantial performance degradation under realistic corruptions, which highlights critical robustness gaps and provides a roadmap toward more trustworthy embodied navigation systems. Furthermore, we systematically evaluate four distinct mitigation strategies to enhance robustness against RGB-Depth and instruction corruptions. Using Uni-NaVid and ETPNav as base models, we deployed them on a real mobile robot and observed improved robustness to corruptions. The project website is: https://navtrust.github.io.

2026 · arXiv

Updated in the last 7 days · Topic: Vision · Composite score 30.0
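NavTrust's actual corruption taxonomy is defined by the benchmark itself; as a rough illustration of what "systematically corrupting input modalities" can look like, here is a minimal sketch (the function names and severity parameters are hypothetical, not NavTrust's API):

```python
import numpy as np

def corrupt_rgb(rgb, severity=0.1, rng=None):
    """Additive Gaussian noise on an RGB image with values in [0, 1]."""
    rng = rng or np.random.default_rng(0)
    noisy = rgb + rng.normal(0.0, severity, size=rgb.shape)
    return np.clip(noisy, 0.0, 1.0)

def corrupt_depth(depth, drop_prob=0.2, rng=None):
    """Randomly zero out depth readings, mimicking sensor dropout."""
    rng = rng or np.random.default_rng(1)
    keep = rng.random(depth.shape) >= drop_prob
    return depth * keep

# Example: corrupt a dummy 4x4 RGB frame and depth map.
rgb = np.full((4, 4, 3), 0.5)
depth = np.ones((4, 4))
rgb_c = corrupt_rgb(rgb)
depth_c = corrupt_depth(depth)
```

A benchmark in this style would sweep the severity parameters and re-run each agent on the corrupted observations.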

Natural Language Processing

Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Policy Distillation

We introduce Nemotron-Cascade 2, an open 30B MoE model with 3B activated parameters that delivers best-in-class reasoning and strong agentic capabilities. Despite its compact size, its mathematical and coding reasoning performance approaches that of frontier open models. It is the second open-weight LLM, after DeepSeekV3.2-Speciale-671B-A37B, to achieve Gold Medal-level performance in the 2025 International Mathematical Olympiad (IMO), the International Olympiad in Informatics (IOI), and the ICPC World Finals, demonstrating remarkably high intelligence density with 20x fewer parameters. In contrast to Nemotron-Cascade 1, the key technical advancements are as follows. After SFT on a meticulously curated dataset, we substantially expand Cascade RL to cover a much broader spectrum of reasoning and agentic domains. Furthermore, we introduce multi-domain on-policy distillation from the strongest intermediate teacher models for each domain throughout the Cascade RL process, allowing us to efficiently recover benchmark regressions and sustain strong performance gains along the way. We release the collection of model checkpoints and training data.

2026 · arXiv

Updated in the last 7 days · Topic: NLP · Composite score 30.0

Natural Language Processing

Meanings and Measurements: Multi-Agent Probabilistic Grounding for Vision-Language Navigation

Robots collaborating with humans must convert natural language goals into actionable, physically grounded decisions. For example, executing a command such as "go two meters to the right of the fridge" requires grounding semantic references, spatial relations, and metric constraints within a 3D scene. While recent vision-language models (VLMs) demonstrate strong semantic grounding capabilities, they are not explicitly designed to reason about metric constraints in physically defined spaces. In this work, we empirically demonstrate that state-of-the-art VLM-based grounding approaches struggle with complex metric-semantic language queries. To address this limitation, we propose MAPG (Multi-Agent Probabilistic Grounding), an agentic framework that decomposes language queries into structured subcomponents and queries a VLM to ground each component. MAPG then probabilistically composes these grounded outputs to produce metrically consistent, actionable decisions in 3D space. We evaluate MAPG on the HM-EQA benchmark and show consistent performance improvements over strong baselines. Furthermore, we introduce a new benchmark, MAPG-Bench, specifically designed to evaluate metric-semantic goal grounding, addressing a gap in existing language grounding evaluations. We also present a real-world robot demonstration showing that MAPG transfers beyond simulation when a structured scene representation is available.

2026 · arXiv

Updated in the last 7 days · Topic: NLP · Composite score 30.0
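The abstract does not spell out how MAPG "probabilistically composes" the per-component groundings. One common reading, sketched below under an independence assumption (the discretized grid and the product-of-experts fusion are illustrative, not the paper's definition), is to multiply per-component probability maps over the scene and renormalize:

```python
import numpy as np

def compose_grounding(component_maps):
    """Fuse per-component probability maps over the same spatial grid by
    treating each as independent evidence: elementwise product, renormalized."""
    fused = np.ones_like(component_maps[0], dtype=float)
    for p in component_maps:
        fused *= p
    return fused / fused.sum()

# Two hypothetical components on a 2x2 grid:
# "near the fridge" and "to the right of it".
near_fridge = np.array([[0.1, 0.4], [0.1, 0.4]])
to_the_right = np.array([[0.1, 0.9], [0.1, 0.9]])
goal = compose_grounding([near_fridge, to_the_right])
```

The cell that scores well under every component dominates the fused distribution, which is the behavior a metric-semantic query needs.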

Computer Vision

DriveTok: 3D Driving Scene Tokenization for Unified Multi-View Reconstruction and Understanding

With the growing adoption of vision-language-action models and world models in autonomous driving systems, scalable image tokenization becomes crucial as the interface for the visual modality. However, most existing tokenizers are designed for monocular and 2D scenes, leading to inefficiency and inter-view inconsistency when applied to high-resolution multi-view driving scenes. To address this, we propose DriveTok, an efficient 3D driving scene tokenizer for unified multi-view reconstruction and understanding. DriveTok first obtains semantically rich visual features from vision foundation models and then transforms them into scene tokens with 3D deformable cross-attention. For decoding, we employ a multi-view transformer to reconstruct multi-view features from the scene tokens and use multiple heads to obtain RGB, depth, and semantic reconstructions. We also add a 3D head directly on the scene tokens for 3D semantic occupancy prediction, improving spatial awareness. With these training objectives, DriveTok learns unified scene tokens that integrate semantic, geometric, and textural information for efficient multi-view tokenization. Extensive experiments on the widely used nuScenes dataset demonstrate that the scene tokens from DriveTok perform well on image reconstruction, semantic segmentation, depth prediction, and 3D occupancy prediction tasks.

2026 · arXiv

Updated in the last 7 days · Topic: Vision · Composite score 29.7

General

AlignMamba-2: Enhancing Multimodal Fusion and Sentiment Analysis with Modality-Aware Mamba

In the era of large-scale pre-trained models, effectively adapting general knowledge to specific affective computing tasks remains a challenge, particularly regarding computational efficiency and multimodal heterogeneity. While Transformer-based methods have excelled at modeling inter-modal dependencies, their quadratic computational complexity limits their use with long-sequence data. Mamba-based models have emerged as a computationally efficient alternative; however, their inherent sequential scanning mechanism struggles to capture the global, non-sequential relationships that are crucial for effective cross-modal alignment. To address these limitations, we propose AlignMamba-2, an effective and efficient framework for multimodal fusion and sentiment analysis. Our approach introduces a dual alignment strategy that regularizes the model using both Optimal Transport distance and Maximum Mean Discrepancy, promoting geometric and statistical consistency between modalities without incurring any inference-time overhead. More importantly, we design a Modality-Aware Mamba layer, which employs a Mixture-of-Experts architecture with modality-specific and modality-shared experts to explicitly handle data heterogeneity during the fusion process. Extensive experiments on four challenging benchmarks, covering dynamic time-series tasks (CMU-MOSI and CMU-MOSEI) and static image-related tasks (NYU-Depth V2 and MVSA-Single), demonstrate that AlignMamba-2 establishes a new state-of-the-art in both effectiveness and efficiency across diverse pattern recognition tasks, ranging from dynamic time-series analysis to static image-text classification.

2026 · arXiv

Updated in the last 7 days · Topic: General · Composite score 29.5
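For the statistical half of the dual alignment, Maximum Mean Discrepancy is a standard quantity. A minimal RBF-kernel estimator is sketched below; the kernel choice, gamma, and the "modality" embeddings are illustrative, not necessarily what AlignMamba-2 uses:

```python
import numpy as np

def rbf_kernel(x, y, gamma=0.5):
    """RBF kernel matrix between two sample sets of shape (n, d)."""
    sq_dists = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

def mmd2(x, y, gamma=0.5):
    """Biased estimate of the squared Maximum Mean Discrepancy."""
    return (rbf_kernel(x, x, gamma).mean()
            + rbf_kernel(y, y, gamma).mean()
            - 2.0 * rbf_kernel(x, y, gamma).mean())

rng = np.random.default_rng(0)
text_emb = rng.normal(0.0, 1.0, size=(64, 8))
audio_emb = text_emb + 2.0  # a deliberately shifted second "modality"
```

Used as a regularizer, a small `mmd2` pulls the two embedding distributions toward statistical agreement, and since it only appears in the training loss, it adds no inference-time overhead.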

General

Accurate and Efficient Multi-Channel Time Series Forecasting via Sparse Attention Mechanism

The task of multi-channel time series forecasting is ubiquitous in numerous fields such as finance, supply chain management, and energy planning. Accurate prediction requires effectively capturing the complex dynamic dependencies within and between channels, yet traditional methods have paid little attention to learning the interactions among channels. This paper proposes Linear-Network (Li-Net), a novel architecture for multi-channel time series forecasting that captures both linear and non-linear dependencies among channels. Li-Net dynamically compresses representations across the sequence and channel dimensions, processes the information through a configurable non-linear module, and subsequently reconstructs the forecasts. Moreover, Li-Net integrates a sparse Top-K softmax attention mechanism within a multi-scale projection framework. A core innovation is its ability to seamlessly incorporate and fuse multi-modal embeddings, guiding the sparse attention process to focus on the most informative time steps and feature channels. Experimental results on multiple real-world benchmark datasets demonstrate that Li-Net achieves competitive performance compared to state-of-the-art baselines. Furthermore, Li-Net provides a superior balance between prediction accuracy and computational burden, exhibiting significantly lower memory usage and faster inference. Detailed ablation studies and parameter sensitivity analyses validate the effectiveness of each key component of the proposed architecture.

Keywords: Multivariate Time Series Forecasting, Sparse Attention Mechanism, Multimodal Information Fusion, Non-linear Relationships

2026 · arXiv

Updated in the last 7 days · Topic: General · Composite score 29.4
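The Top-K softmax attention mentioned above can be sketched generically; this single-head, threshold-masking version is one standard way to implement it, not necessarily Li-Net's exact formulation:

```python
import numpy as np

def topk_softmax_attention(q, k, v, top_k=2):
    """Single-head attention where each query attends only to its
    top_k highest-scoring keys; all other scores are masked to -inf."""
    scores = q @ k.T / np.sqrt(q.shape[-1])               # (n_q, n_k)
    # Per-row threshold: the top_k-th largest score.
    thresh = np.sort(scores, axis=-1)[:, -top_k][:, None]
    masked = np.where(scores >= thresh, scores, -np.inf)
    weights = np.exp(masked - masked.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
q = rng.normal(size=(3, 4))
k = rng.normal(size=(6, 4))
v = rng.normal(size=(6, 4))
out, w = topk_softmax_attention(q, k, v, top_k=2)
```

Each query row ends up with at most `top_k` nonzero attention weights, which is where the memory and latency savings over dense softmax come from.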

Keyword Trends

Last 7 days vs. the prior 23 days (30-day window overall)

Keywords: 16
Rising: 8
Stable: 3
Declining: 5
Sample window: ranking fallback window · 84/276
Topic clusters: Foundation Models 5 · Applications & Scenarios 4 · Evaluation & Tools 2

Visible: 12 / 12


Asset Center

Shows the file names, intended use cases, and ZIP bundle generated under the current filter, ready to hand off directly to design, operations, or engineering colleagues.

ZIP bundle

radar-template-kit-0000-00-00.zip

Research card

Paper deep dives / weekly-report body / knowledge-base covers

Current template
radar-research-card-0000-00-00.png

Suited to emphasizing conclusion structure, keyword signals, and further reading.

Operations card

Community broadcasts / trending posters / official-account imagery

Current template
radar-research-card-0000-00-00.png

Suited to high-frequency distribution, quick sharing, and cover-style content.

Engineering card

System retrospectives / deployment syncs / engineering stand-ups

Current template
radar-research-card-0000-00-00.png

Suited to emphasizing engineering concerns, metrics, and decision rationale.


Keyword                | Cluster                  | 7d / 30d | Trend     | Δ      | R
Computer Vision        | Applications & Scenarios | 39/90    | Rising    | +0.138 | 1.42
General AI             | Research Frontier        | 29/131   | Declining | -0.129 | 0.73
Agents                 | Applications & Scenarios | 8/44     | Declining | -0.064 | 0.60
Multimodal             | Applications & Scenarios | 12/23    | Rising    | +0.059 | 1.71
Robotics               | Applications & Scenarios | 8/9      | Rising    | +0.063 | 2.92
Machine Learning       | Foundation Models        | 33/103   | Rising    | +0.020 | 1.05
Benchmarks             | Evaluation & Tools       | 16/41    | Rising    | +0.042 | 1.28
NLP                    | Foundation Models        | 18/49    | Rising    | +0.037 | 1.21
Inference Optimization | Systems Engineering      | 12/50    | Declining | -0.038 | 0.79
HCI                    | Evaluation & Tools       | 1/11     | Declining | -0.028 | 0.30
Alignment & Safety     | Safety & Governance      | 12/36    | Stable    | +0.012 | 1.10
Attention Mechanisms   | Foundation Models        | 10/30    | Stable    | +0.010 | 1.10

On-Site Paper List

Filter by keyword, year, and topic, with sortable browsing.