arXiv Paper Digest

2025-08-25 17:09
Snapshot: 20250825_1709
DualMap: Online Open-Vocabulary Semantic Mapping for Natural Language Navigation in Dynamic Changing Scenes
Authors: Jiajun Jiang, Yiming Zhu, Zirui Wu, Jie Song
First: 2025-06-02T17:59:10+00:00 · Latest: 2025-08-13T07:21:25+00:00
Comments: 14 pages, 14 figures. Code: https://github.com/Eku127/DualMap Project page: https://eku127.github.io/DualMap/
Abstract
We introduce DualMap, an online open-vocabulary mapping system that enables robots to understand and navigate dynamically changing environments through natural language queries. Designed for efficient semantic mapping and adaptability to changing environments, DualMap meets the essential requirements for real-world robot navigation applications. Our proposed hybrid segmentation frontend and object-level status check eliminate the costly 3D object merging required by prior methods, enabling efficient online scene mapping. The dual-map representation combines a global abstract map for high-level candidate selection with a local concrete map for precise goal-reaching, effectively managing and updating dynamic changes in the environment. Through extensive experiments in both simulation and real-world scenarios, we demonstrate state-of-the-art performance in 3D open-vocabulary segmentation, efficient scene mapping, and online language-guided navigation. Project page: https://eku127.github.io/DualMap/
Summary
DualMap is an online open-vocabulary mapping system that lets robots navigate dynamically changing environments from natural language queries. A hybrid segmentation frontend and an object-level status check replace the costly 3D object merging required by prior methods, which makes online scene mapping efficient. Its dual-map representation pairs a global abstract map for high-level candidate selection with a local concrete map for precise goal-reaching, and it manages and updates dynamic changes in the environment. In experiments in both simulation and real-world scenarios, the authors report state-of-the-art performance in 3D open-vocabulary segmentation, efficient scene mapping, and online language-guided navigation.
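As a rough illustration of the "high-level candidate selection" step mentioned in the DualMap entry above, the sketch below ranks entries of a toy global map by cosine similarity to a query vector. This is not DualMap's implementation: the names `rank_candidates` and `abstract_map`, the 3-D feature vectors, and the use of cosine similarity are all assumptions made for illustration. A real system would use embeddings from a vision-language model and then refine the chosen candidate in the local concrete map.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_candidates(query_vec, abstract_map, top_k=2):
    """Return the top_k entry names of a hypothetical global map, best match first."""
    scored = [(cosine(query_vec, feat), name) for name, feat in abstract_map.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:top_k]]


# Toy 3-D features standing in for real language/vision embeddings (assumed values).
abstract_map = {
    "desk": [0.9, 0.1, 0.0],
    "sofa": [0.1, 0.9, 0.1],
    "shelf": [0.6, 0.2, 0.7],
}
query = [1.0, 0.0, 0.1]  # pretend embedding of a natural language query

candidates = rank_candidates(query, abstract_map)
print(candidates)
```

With these toy vectors the query is closest to "desk", then "shelf". A navigation stack would visit the candidates in that order and confirm the goal locally.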
CitySeg: A 3D Open Vocabulary Semantic Segmentation Foundation Model in City-scale Scenarios
Authors: Jialei Xu, Zizhuang Wei, Weikang You, Linyun Li, Weijian Sun
First: 2025-08-13T03:55:56+00:00 · Latest: 2025-08-13T03:55:56+00:00
Abstract
Semantic segmentation of city-scale point clouds is a critical technology for Unmanned Aerial Vehicle (UAV) perception systems, enabling the classification of 3D points without relying on any visual information to achieve comprehensive 3D understanding. However, existing models are frequently constrained by the limited scale of 3D data and the domain gap between datasets, which lead to reduced generalization capability. To address these challenges, we propose CitySeg, a foundation model for city-scale point cloud semantic segmentation that incorporates text modality to achieve open vocabulary segmentation and zero-shot inference. Specifically, in order to mitigate the issue of non-uniform data distribution across multiple domains, we customize the data preprocessing rules, and propose a local-global cross-attention network to enhance the perception capabilities of point networks in UAV scenarios. To resolve semantic label discrepancies across datasets, we introduce a hierarchical classification strategy. A hierarchical graph established according to the data annotation rules consolidates the data labels, and the graph encoder is used to model the hierarchical relationships between categories. In addition, we propose a two-stage training strategy and employ hinge loss to increase the feature separability of subcategories. Experimental results demonstrate that the proposed CitySeg achieves state-of-the-art (SOTA) performance on nine closed-set benchmarks, significantly outperforming existing approaches. Moreover, for the first time, CitySeg enables zero-shot generalization in city-scale point cloud scenarios without relying on visual information.
Summary
CitySeg is a foundation model for city-scale point cloud semantic segmentation that incorporates the text modality to achieve open-vocabulary segmentation and zero-shot inference. It addresses the limited scale of 3D data and the domain gap between datasets with customized data preprocessing rules and a local-global cross-attention network. A hierarchical classification strategy resolves label discrepancies across datasets, and a two-stage training strategy with hinge loss increases the separability of subcategory features. The authors report state-of-the-art performance on nine closed-set benchmarks and, for the first time, zero-shot generalization in city-scale point cloud scenarios without relying on visual information.
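The CitySeg entry above mentions a hinge loss used to separate subcategory features but does not give its exact form. The sketch below shows one standard margin-based multiclass hinge loss, purely to illustrate the idea. The function name, the margin value, and the sum-over-wrong-classes variant are assumptions, not the paper's formulation.

```python
def multiclass_hinge_loss(scores, target, margin=1.0):
    """Sum over wrong classes j of max(0, margin + scores[j] - scores[target]).

    The loss is zero only when the target class beats every other class
    by at least `margin`, which pushes class scores apart.
    """
    target_score = scores[target]
    return sum(
        max(0.0, margin + score - target_score)
        for j, score in enumerate(scores)
        if j != target
    )


# Target class 0 wins by more than the margin: no penalty.
separated = multiclass_hinge_loss([3.0, 0.5, 1.0], target=0)

# Target class 0 is close to or below the other classes: both are penalized.
confused = multiclass_hinge_loss([1.0, 0.8, 1.5], target=0)

print(separated, confused)
```

In the first call both wrong classes trail the target by more than the margin, so the loss is zero. In the second, each wrong class contributes a positive term.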
ReferSplat: Referring Segmentation in 3D Gaussian Splatting
Authors: Shuting He, Guangquan Jie, Changshuo Wang, Yun Zhou, Shuming Hu, Guanbin Li, Henghui Ding
Venue: ICML 2025 Oral
First: 2025-08-11T17:59:30+00:00 · Latest: 2025-08-11T17:59:30+00:00
Comments: ICML 2025 Oral, Code: https://github.com/heshuting555/ReferSplat
Abstract
We introduce Referring 3D Gaussian Splatting Segmentation (R3DGS), a new task that aims to segment target objects in a 3D Gaussian scene based on natural language descriptions, which often contain spatial relationships or object attributes. This task requires the model to identify newly described objects that may be occluded or not directly visible in a novel view, posing a significant challenge for 3D multi-modal understanding. Developing this capability is crucial for advancing embodied AI. To support research in this area, we construct the first R3DGS dataset, Ref-LERF. Our analysis reveals that 3D multi-modal understanding and spatial relationship modeling are key challenges for R3DGS. To address these challenges, we propose ReferSplat, a framework that explicitly models 3D Gaussian points with natural language expressions in a spatially aware paradigm. ReferSplat achieves state-of-the-art performance on both the newly proposed R3DGS task and 3D open-vocabulary segmentation benchmarks. Dataset and code are available at https://github.com/heshuting555/ReferSplat.
Summary
The paper introduces Referring 3D Gaussian Splatting Segmentation (R3DGS), a task that segments target objects in a 3D Gaussian scene from natural language descriptions, which often include spatial relationships or object attributes. The authors propose ReferSplat, a framework that models 3D Gaussian points with natural language expressions in a spatially aware paradigm, and report state-of-the-art performance on both the new R3DGS task and 3D open-vocabulary segmentation benchmarks. They also construct Ref-LERF, the first R3DGS dataset, to support research on 3D multi-modal understanding and spatial relationship modeling.