DetailRecon: Focusing on Detailed Regions for Online Monocular 3D Reconstruction

Bibliographic Details
Published in	IEEE Transactions on Multimedia, Vol. 27, pp. 3266-3278
Main Authors	Chu, Fupeng, Cong, Yang, Wang, Yanmei, Chen, Ronghan
Format	Journal Article
Language	English
Published	IEEE 2025
Subjects	3D reconstruction; 3D scene reconstruction; Accuracy; Feature extraction; Geometry; Image reconstruction; Learning systems; Legged locomotion; online 3D reconstruction; Representation learning; Surface reconstruction; Three-dimensional displays; Transformers
ISSN	1520-9210 (print); 1941-0077 (electronic)
DOI	10.1109/TMM.2025.3535311

Abstract	Learning-based online monocular 3D reconstruction has recently shown great potential. Most state-of-the-art methods focus on two key questions: 1) how to extract accurate voxel features and 2) how to preserve detailed voxels during sparsification. However, 1) most methods use the same receptive field to extract features for both informative and uninformative regions, and therefore struggle to capture geometric details. Furthermore, 2) they mainly rely on a fixed threshold or a straightforward ray-based algorithm to discard voxels during sparsification, so some detailed regions (especially thin ones) may be discarded incorrectly. To tackle these challenges, we present a novel method named DetailRecon that focuses on detailed regions, which contain more geometric information. Specifically, we propose an Adaptive Hybrid Fusion (AHF) module for voxel feature learning and a Connectivity-Aware Sparsification (CAS) module for voxel sparsification. 1) The AHF module takes multiple feature maps with different receptive fields as input and adaptively adopts a smaller receptive field for regions with fine structures, so as to capture accurate geometric details. 2) The CAS module updates the occupancy value of each voxel based on the connected voxels within its neighborhood, which expands the influence range of reliable voxels in detailed regions and ultimately reduces their probability of being discarded. Moreover, 3) we introduce a lightweight yet effective pipeline named Focus On Fine (FOF) to accelerate DetailRecon. In addition, 4) we propose a Hierarchical Consistency Loss (HCL) to align multi-level volume features, which helps learn accurate volume features for recovering more details. Extensive experiments on the ScanNet (V2) and 7-Scenes datasets demonstrate the superiority of DetailRecon.
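The contrast the abstract draws between fixed-threshold sparsification and a connectivity-aware update can be illustrated with a toy sketch. This is not the authors' CAS module: the 26-voxel neighborhood, the three constants, and the max-with-decay update rule below are all assumptions made for illustration, since the abstract specifies none of them. The sketch only shows how weak voxels next to a reliable voxel can survive a threshold that would otherwise discard them.

```python
from itertools import product

# Hypothetical parameters, chosen only for illustration (not from the paper).
KEEP_THRESHOLD = 0.5      # voxels scoring below this are discarded
RELIABLE_THRESHOLD = 0.8  # voxels at or above this count as "reliable"
DECAY = 0.7               # fraction of its score a reliable neighbor passes on

# Offsets of the 26 voxels surrounding a voxel (3x3x3 block minus its center).
NEIGHBOR_OFFSETS = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]


def fixed_threshold_sparsify(occupancy):
    """Baseline: keep a voxel only if its own score clears the threshold."""
    return {v for v, score in occupancy.items() if score >= KEEP_THRESHOLD}


def connectivity_aware_sparsify(occupancy):
    """Raise each voxel's score using reliable neighbors, then threshold."""
    updated = {}
    for (x, y, z), score in occupancy.items():
        support = 0.0
        for dx, dy, dz in NEIGHBOR_OFFSETS:
            neighbor_score = occupancy.get((x + dx, y + dy, z + dz), 0.0)
            if neighbor_score >= RELIABLE_THRESHOLD:
                support = max(support, DECAY * neighbor_score)
        # A voxel never loses score; it can only gain support from neighbors.
        updated[(x, y, z)] = max(score, support)
    return {v for v, score in updated.items() if score >= KEEP_THRESHOLD}


# A thin structure: one confident voxel flanked by two weak ones along x,
# plus an isolated weak voxel far away from everything else.
occupancy = {
    (0, 0, 0): 0.4,
    (1, 0, 0): 0.9,
    (2, 0, 0): 0.3,
    (9, 9, 9): 0.3,
}
kept_fixed = fixed_threshold_sparsify(occupancy)
kept_connected = connectivity_aware_sparsify(occupancy)
```

In this toy input the fixed threshold keeps only the confident voxel, while the connectivity-aware variant also keeps its two weak neighbors and still discards the isolated weak voxel.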
Author details	1. Chu, Fupeng (ORCID 0000-0002-0164-5850; fupengchu@gmail.com), State Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang, China
	2. Cong, Yang (ORCID 0000-0002-5102-0189; congyang81@gmail.com), School of Automation Science and Engineering, South China University of Technology, Guangzhou, China
	3. Wang, Yanmei (ORCID 0000-0002-1869-7665; wangyanmei@sia.cn), State Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang, China
	4. Chen, Ronghan (ORCID 0000-0001-6307-2923; chenronghan@sia.cn), State Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang, China
CODEN	ITMUF8
Discipline	Engineering; Computer Science
Genre	orig-research
Funding	National Science and Technology Major Project of the New Generation of Artificial Intelligence (grant 2018AAA0102900); National Natural Science Foundation of China (grants 62225310, 62127807, 62133005; funder ID 10.13039/501100001809)
IsPeerReviewed	true
IsScholarly	true
License	https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html https://doi.org/10.15223/policy-029 https://doi.org/10.15223/policy-037
PageCount	13
PublicationTitleAbbrev	TMM
URI	https://ieeexplore.ieee.org/document/10855550