arXiv 论文速递

VoxHammer: Training-Free Precise and Coherent 3D Editing in Native 3D Space

Authors: Lin Li, Zehuan Huang, Haoran Feng, Gengxiong Zhuang, Rui Chen, Chunchao Guo, Lu Sheng

First: 2025-08-26T17:59:47+00:00 · Latest: 2025-08-26T17:59:47+00:00

Comments: Project page: https://huanngzh.github.io/VoxHammer-Page/

Abstract

3D local editing of specified regions is crucial for game industry and robot interaction. Recent methods typically edit rendered multi-view images and then reconstruct 3D models, but they face challenges in precisely preserving unedited regions and overall coherence. Inspired by structured 3D generative models, we propose VoxHammer, a novel training-free approach that performs precise and coherent editing in 3D latent space. Given a 3D model, VoxHammer first predicts its inversion trajectory and obtains its inverted latents and key-value tokens at each timestep. Subsequently, in the denoising and editing phase, we replace the denoising features of preserved regions with the corresponding inverted latents and cached key-value tokens. By retaining these contextual features, this approach ensures consistent reconstruction of preserved areas and coherent integration of edited parts. To evaluate the consistency of preserved regions, we constructed Edit3D-Bench, a human-annotated dataset comprising hundreds of samples, each with carefully labeled 3D editing regions. Experiments demonstrate that VoxHammer significantly outperforms existing methods in terms of both 3D consistency of preserved regions and overall quality. Our method holds promise for synthesizing high-quality edited paired data, thereby laying the data foundation for in-context 3D generation. See our project page at https://huanngzh.github.io/VoxHammer-Page/.

Summary / 总结

3D local editing of specified regions is crucial for game industry and robot interaction.

Articulate3D: Zero-Shot Text-Driven 3D Object Posing

Authors: Oishi Deb, Anjun Hu, Ashkan Khakzar, Philip Torr, Christian Rupprecht

First: 2025-08-26T17:59:17+00:00 · Latest: 2025-08-26T17:59:17+00:00

Comments: Project page:https://odeb1.github.io/articulate3d_page_deb/

Abs · PDF · Project1

Abstract

We propose a training-free method, Articulate3D, to pose a 3D asset through language control. Despite advances in vision and language models, this task remains surprisingly challenging. To achieve this goal, we decompose the problem into two steps. We modify a powerful image-generator to create target images conditioned on the input image and a text instruction. We then align the mesh to the target images through a multi-view pose optimisation step. In detail, we introduce a self-attention rewiring mechanism (RSActrl) that decouples the source structure from pose within an image generative model, allowing it to maintain a consistent structure across varying poses. We observed that differentiable rendering is an unreliable signal for articulation optimisation; instead, we use keypoints to establish correspondences between input and target images. The effectiveness of Articulate3D is demonstrated across a diverse range of 3D objects and free-form text prompts, successfully manipulating poses while maintaining the original identity of the mesh. Quantitative evaluations and a comparative user study, in which our method was preferred over 85\% of the time, confirm its superiority over existing approaches. Project page:https://odeb1.github.io/articulate3d_page_deb/

Summary / 总结

We propose a training-free method, Articulate3D, to pose a 3D asset through language control.

Route-and-Execute: Auditable Model-Card Matching and Specialty-Level Deployment

Authors: Shayan Vassef, Soorya Ram Shimegekar, Abhay Goyal, Koustuv Saha, Pi Zonooz, Navin Kumar

First: 2025-08-22T23:34:37+00:00 · Latest: 2025-08-26T17:13:21+00:00