DualMap: Online Open-Vocabulary Semantic Mapping for Natural Language Navigation in Dynamic Changing Scenes
Authors: Jiajun Jiang, Yiming Zhu, Zirui Wu, Jie Song
First: 2025-06-02T17:59:10+00:00 · Latest: 2025-08-13T07:21:25+00:00
Comments: 14 pages, 14 figures. Code: https://github.com/Eku127/DualMap · Project page: https://eku127.github.io/DualMap/
Abstract
We introduce DualMap, an online open-vocabulary mapping system that enables
robots to understand and navigate dynamically changing environments through
natural language queries. Designed for efficient semantic mapping and
adaptability to changing environments, DualMap meets the essential requirements
for real-world robot navigation applications. Our proposed hybrid segmentation
frontend and object-level status check eliminate the costly 3D object merging
required by prior methods, enabling efficient online scene mapping. The
dual-map representation combines a global abstract map for high-level candidate
selection with a local concrete map for precise goal-reaching, effectively
managing and updating dynamic changes in the environment. Through extensive
experiments in both simulation and real-world scenarios, we demonstrate
state-of-the-art performance in 3D open-vocabulary segmentation, efficient
scene mapping, and online language-guided navigation. Project page:
https://eku127.github.io/DualMap/
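The following minimal Python sketch illustrates the division of labor between the two maps under stated assumptions. It assumes each map entry is an object with a 2D position and a language-aligned feature vector, and that matching uses cosine similarity. The class `AbstractObject`, the functions `select_candidates` and `refine_goal`, and the `threshold` value are hypothetical and are not taken from the DualMap code. It is not the authors' implementation, and it omits segmentation, map construction, and path planning.

```python
import math
from dataclasses import dataclass


@dataclass
class AbstractObject:
    # Hypothetical map entry: a coarse position plus a language-aligned
    # feature vector (e.g. produced by a CLIP-style encoder).
    name: str
    position: tuple[float, float]
    feature: list[float]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def select_candidates(global_map, query_feature, k=2):
    """High-level step: rank global (abstract) map entries against the query."""
    ranked = sorted(global_map,
                    key=lambda o: cosine(o.feature, query_feature),
                    reverse=True)
    return ranked[:k]


def refine_goal(local_map, query_feature, threshold=0.8):
    """Low-level step: match the query against the freshly observed local
    (concrete) map. Returns a goal position, or None if no local object is
    similar enough, e.g. because the object was moved or removed."""
    best = max(local_map,
               key=lambda o: cosine(o.feature, query_feature),
               default=None)
    if best is None or cosine(best.feature, query_feature) < threshold:
        return None
    return best.position
```

Returning `None` from the local step stands in for the case where an object has changed since the global map was built. A navigator built on this sketch would then fall back to the next candidate from `select_candidates`.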
Summary
DualMap is an online open-vocabulary mapping system that allows robots to navigate dynamically changing environments using natural language. It uses a hybrid segmentation frontend and object-level status check to avoid costly 3D object merging, enabling efficient online scene mapping. The system employs a dual-map representation combining a global abstract map for high-level candidate selection and a local concrete map for precise goal-reaching, demonstrating state-of-the-art performance in 3D open-vocabulary segmentation, efficient scene mapping, and online language-guided navigation through extensive experiments in both simulation and real-world scenarios.
CitySeg: A 3D Open Vocabulary Semantic Segmentation Foundation Model in City-scale Scenarios
Authors: Jialei Xu, Zizhuang Wei, Weikang You, Linyun Li, Weijian Sun
First: 2025-08-13T03:55:56+00:00 · Latest: 2025-08-13T03:55:56+00:00
Abstract
Semantic segmentation of city-scale point clouds is a critical technology for
Unmanned Aerial Vehicle (UAV) perception systems, enabling the classification
of 3D points without relying on any visual information to achieve comprehensive
3D understanding. However, existing models are frequently constrained by the
limited scale of 3D data and the domain gap between datasets, which lead to
reduced generalization capability. To address these challenges, we propose
CitySeg, a foundation model for city-scale point cloud semantic segmentation
that incorporates text modality to achieve open vocabulary segmentation and
zero-shot inference. Specifically, in order to mitigate the issue of
non-uniform data distribution across multiple domains, we customize the data
preprocessing rules, and propose a local-global cross-attention network to
enhance the perception capabilities of point networks in UAV scenarios. To
resolve semantic label discrepancies across datasets, we introduce a
hierarchical classification strategy. A hierarchical graph established
according to the data annotation rules consolidates the data labels, and a
graph encoder is used to model the hierarchical relationships between
categories. In addition, we propose a two-stage training strategy and employ
hinge loss to increase the feature separability of subcategories. Experimental
results demonstrate that the proposed CitySeg achieves state-of-the-art (SOTA)
performance on nine closed-set benchmarks, significantly outperforming existing
approaches. Moreover, for the first time, CitySeg enables zero-shot
generalization in city-scale point cloud scenarios without relying on visual
information.
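The abstract describes consolidating labels through a hierarchy and using hinge loss to separate subcategories, but it gives no formulas. The sketch below assumes a toy child-to-parent hierarchy, hypothetical per-dataset alias tables, and a standard multi-class margin hinge loss. Every label, dataset name, and function (`consolidate`, `hinge_loss`) is illustrative rather than CitySeg's actual design, and the graph encoder is not modeled.

```python
# Hypothetical shared hierarchy: child label -> parent label (None = root).
HIERARCHY = {
    "vegetation": None,
    "tree": "vegetation",
    "low_vegetation": "vegetation",
    "building": None,
    "roof": "building",
    "facade": "building",
}

# Hypothetical per-dataset label names mapped onto shared hierarchy nodes.
DATASET_ALIASES = {
    "dataset_a": {"high_veg": "tree", "low_veg": "low_vegetation", "roof": "roof"},
    "dataset_b": {"vegetation": "vegetation", "building": "building"},
}


def ancestors(label):
    """Return the path from a node up to its root, e.g. tree -> vegetation."""
    path = []
    while label is not None:
        path.append(label)
        label = HIERARCHY[label]
    return path


def consolidate(dataset, raw_label):
    """Map a dataset-specific label to its full path in the shared hierarchy."""
    return ancestors(DATASET_ALIASES[dataset][raw_label])


def hinge_loss(scores, target, margin=0.2):
    """Multi-class margin hinge loss: penalize every other label whose score
    comes within `margin` of the target label's score."""
    s_t = scores[target]
    return sum(max(0.0, margin - (s_t - s))
               for label, s in scores.items() if label != target)
```

Mapping `dataset_a`'s `high_veg` to `["tree", "vegetation"]` lets a coarse-only dataset such as `dataset_b` still supervise the parent node. When `scores` holds only sibling subcategories, the hinge term pushes the target above each sibling by at least `margin`.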
Summary
CitySeg is a foundation model for city-scale point cloud semantic segmentation that incorporates text modality to achieve open vocabulary segmentation and zero-shot inference. It addresses the challenges of limited data scale and domain gap by customizing data preprocessing rules and proposing a local-global cross-attention network. The model introduces a hierarchical classification strategy and a two-stage training strategy, achieving state-of-the-art performance on nine closed-set benchmarks and enabling zero-shot generalization in city-scale scenarios without visual information.
ReferSplat: Referring Segmentation in 3D Gaussian Splatting
Authors: Shuting He, Guangquan Jie, Changshuo Wang, Yun Zhou, Shuming Hu, Guanbin Li, Henghui Ding
Venue: ICML 2025 Oral
First: 2025-08-11T17:59:30+00:00 · Latest: 2025-08-11T17:59:30+00:00
Comments: ICML 2025 Oral, Code: https://github.com/heshuting555/ReferSplat
Abstract
We introduce Referring 3D Gaussian Splatting Segmentation (R3DGS), a new task
that aims to segment target objects in a 3D Gaussian scene based on natural
language descriptions, which often contain spatial relationships or object
attributes. This task requires the model to identify newly described objects
that may be occluded or not directly visible in a novel view, posing a
significant challenge for 3D multi-modal understanding. Developing this
capability is crucial for advancing embodied AI. To support research in this
area, we construct the first R3DGS dataset, Ref-LERF. Our analysis reveals that
3D multi-modal understanding and spatial relationship modeling are key
challenges for R3DGS. To address these challenges, we propose ReferSplat, a
framework that explicitly models 3D Gaussian points with natural language
expressions in a spatially aware paradigm. ReferSplat achieves state-of-the-art
performance on both the newly proposed R3DGS task and 3D open-vocabulary
segmentation benchmarks. Dataset and code are available at
https://github.com/heshuting555/ReferSplat.
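As a point of reference for the task definition, the sketch below shows the simplest feature-matching baseline. It scores each Gaussian's language feature against a text embedding and thresholds the cosine similarity to obtain a per-Gaussian mask. It assumes per-Gaussian feature vectors and a text embedding already exist; `refer_mask` and the 0.5 threshold are hypothetical. It deliberately ignores the spatial relationships that the abstract identifies as a key challenge, so it is not ReferSplat's method.

```python
import math


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)


def refer_mask(gaussian_features, text_embedding, threshold=0.5):
    """Select Gaussians whose feature is cosine-similar to the text query."""
    t = normalize(text_embedding)
    mask = []
    for f in gaussian_features:
        score = sum(a * b for a, b in zip(normalize(f), t))
        mask.append(score >= threshold)
    return mask
```

A query such as "the mug to the left of the laptop" would defeat this baseline whenever several mugs share similar features. That failure mode is what a spatially aware model has to address.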
Summary
The paper introduces R3DGS, a task for segmenting target objects in 3D Gaussian scenes based on natural language descriptions, which often include spatial relationships or object attributes. The authors propose ReferSplat, a framework that models 3D Gaussian points with natural language expressions, achieving state-of-the-art performance on both the new R3DGS task and 3D open-vocabulary segmentation benchmarks. The work also includes the first R3DGS dataset, Ref-LERF, to support research in 3D multi-modal understanding and spatial relationship modeling.