DetailRecon: Focusing on Detailed Regions for Online Monocular 3D Reconstruction

Bibliographic Details
Published in: IEEE Transactions on Multimedia, Vol. 27, pp. 3266-3278
Main Authors: Chu, Fupeng; Cong, Yang; Wang, Yanmei; Chen, Ronghan
Format: Journal Article
Language: English
Published: IEEE, 2025
ISSN: 1520-9210, 1941-0077
DOI: 10.1109/TMM.2025.3535311

Summary: Learning-based online monocular 3D reconstruction has emerged with great potential recently. Most state-of-the-art methods focus on two key questions, namely 1) how to exploit accurate voxel features and 2) how to preserve detailed voxels in the sparsification process. However, 1) most methods adopt the same receptive field to extract features for both informative and uninformative regions, which struggles to capture geometric details. Furthermore, 2) they mainly utilize a fixed threshold or a straightforward ray-based algorithm to discard voxels in the sparsification process; as a result, some detailed regions (especially thin structures) may be discarded incorrectly. To tackle these challenges, we present a novel method named DetailRecon that focuses on detailed regions containing more geometric information. Specifically, we first propose an Adaptive Hybrid Fusion (AHF) module and a Connectivity-Aware Sparsification (CAS) module for voxel feature learning and voxel sparsification, respectively. 1) The AHF receives multiple feature maps with different receptive fields as input, and adaptively adopts a smaller receptive field for regions with fine structures to exploit accurate geometric details. 2) The CAS updates the occupancy value of voxels based on the connected voxels within their neighbor space, which expands the radiation range of reliable voxels in detailed regions and eventually reduces their probability of being discarded. Moreover, 3) we introduce a lightweight yet effective pipeline named Focus On Fine (FOF) to accelerate our DetailRecon. In addition, 4) we propose a Hierarchical Consistency Loss (HCL) to align multi-level volume features, which assists in exploring accurate volume features for recovering more details. Extensive experiments conducted on the ScanNet (V2) and 7-Scenes datasets demonstrate the superiority of our DetailRecon.
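The abstract's description of the CAS module (updating a voxel's occupancy from reliable connected voxels in its neighborhood, so that thin structures are less likely to fall below a sparsification threshold) can be illustrated with a minimal sketch. This is not the paper's implementation; the blending rule, the `alpha` weight, and the cubic neighborhood are assumptions chosen for illustration.

```python
import numpy as np

def connectivity_aware_update(occ, threshold=0.5, radius=1, alpha=0.5):
    """Hypothetical sketch of a connectivity-aware occupancy update.

    For each voxel, find reliable voxels (occupancy > threshold) inside a
    cubic neighborhood of the given radius, and pull the voxel's occupancy
    toward the strongest reliable neighbor. The update never lowers a
    voxel's occupancy, so reliable detail regions "radiate" support.
    """
    D, H, W = occ.shape
    updated = occ.copy()
    reliable = occ > threshold
    for z in range(D):
        for y in range(H):
            for x in range(W):
                # Clip the neighborhood window to the volume bounds.
                z0, z1 = max(0, z - radius), min(D, z + radius + 1)
                y0, y1 = max(0, y - radius), min(H, y + radius + 1)
                x0, x1 = max(0, x - radius), min(W, x + radius + 1)
                nb_occ = occ[z0:z1, y0:y1, x0:x1]
                nb_rel = reliable[z0:z1, y0:y1, x0:x1]
                if nb_rel.any():
                    support = nb_occ[nb_rel].max()
                    blended = alpha * occ[z, y, x] + (1 - alpha) * support
                    updated[z, y, x] = max(occ[z, y, x], blended)
    return updated

# A weak voxel next to a reliable one is boosted above a fixed
# sparsification threshold instead of being discarded.
occ = np.zeros((3, 3, 3))
occ[1, 1, 1] = 0.9   # reliable voxel on a thin structure
occ[0, 1, 1] = 0.3   # weak neighbor that a fixed threshold would discard
out = connectivity_aware_update(occ)
# out[0, 1, 1] = max(0.3, 0.5*0.3 + 0.5*0.9) = 0.6
```

The monotone `max` keeps the update one-sided: reliable voxels only raise their neighbors' occupancy, mirroring the abstract's point that CAS reduces the probability of detailed voxels being discarded.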