DetailRecon: Focusing on Detailed Regions for Online Monocular 3D Reconstruction

Bibliographic Details
Published in IEEE Transactions on Multimedia, Vol. 27, pp. 3266-3278
Main Authors Chu, Fupeng; Cong, Yang; Wang, Yanmei; Chen, Ronghan
Format Journal Article
Language English
Published IEEE 2025
ISSN 1520-9210
1941-0077
DOI 10.1109/TMM.2025.3535311

Abstract Learning-based online monocular 3D reconstruction has recently shown great potential. Most state-of-the-art methods focus on two key questions: 1) how to extract accurate voxel features and 2) how to preserve detailed voxels during sparsification. However, most methods apply the same receptive field to both informative and uninformative regions, and therefore struggle to capture geometric details. Furthermore, they mainly rely on a fixed threshold or a straightforward ray-based algorithm to discard voxels during sparsification, so some detailed regions (especially thin ones) may be discarded incorrectly. To tackle these challenges, we present DetailRecon, a novel method that focuses on detailed regions containing more geometric information. Specifically, we propose an Adaptive Hybrid Fusion (AHF) module for voxel feature learning and a Connectivity-Aware Sparsification (CAS) module for voxel sparsification. 1) The AHF module takes multiple feature maps with different receptive fields as input and adaptively adopts a smaller receptive field for regions with fine structures in order to capture accurate geometric details. 2) The CAS module updates the occupancy value of each voxel based on the connected voxels within its neighborhood, which extends the influence of reliable voxels in detailed regions and reduces their probability of being discarded. Moreover, 3) we introduce a lightweight yet effective pipeline named Focus On Fine (FOF) to accelerate DetailRecon. In addition, 4) we propose a Hierarchical Consistency Loss (HCL) that aligns multi-level volume features and helps recover more details. Extensive experiments on the ScanNet (V2) and 7-Scenes datasets demonstrate the superiority of DetailRecon.
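The adaptive fusion idea in the abstract (several feature maps with different receptive fields, blended per region) can be illustrated with a minimal sketch. This is not the authors' AHF module: it assumes one scalar feature per location and per branch, and it takes the per-location selection scores as given inputs, whereas a learned model would presumably predict them. The names `softmax` and `adaptive_fuse` are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_fuse(
    features_per_field: Sequence[Sequence[float]],
    scores_per_field: Sequence[Sequence[float]],
) -> List[float]:
    """Blend branch features location by location.

    features_per_field[k][i]: feature at location i from branch k
        (branch k stands for one receptive-field size; an assumption).
    scores_per_field[k][i]: selection score for branch k at location i
        (assumed to be provided; higher means "prefer this branch here").
    """
    num_fields = len(features_per_field)
    num_locations = len(features_per_field[0])
    fused = []
    for i in range(num_locations):
        # Per-location weights across receptive fields, summing to 1.
        weights = softmax([scores_per_field[k][i] for k in range(num_fields)])
        fused.append(
            sum(w * features_per_field[k][i] for k, w in enumerate(weights))
        )
    return fused


if __name__ == "__main__":
    # Branch 0 = small receptive field, branch 1 = large receptive field.
    features = [[1.0, 1.0], [3.0, 3.0]]
    # Location 0 has no preference; location 1 strongly prefers the small field.
    scores = [[0.0, 10.0], [0.0, -10.0]]
    print(adaptive_fuse(features, scores))
```

With equal scores the two branches are averaged; where the score for the small-receptive-field branch dominates, the fused value follows that branch almost entirely, which is the behaviour the abstract attributes to regions with fine structures.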
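The connectivity-aware sparsification idea in the abstract (raising the occupancy of voxels that are connected to reliable voxels before discarding) can be sketched as follows. This is an illustration under stated assumptions, not the authors' CAS module: the 6-connected neighborhood, the `reliable` threshold, the `decay` factor, and the max-based update rule are all hypothetical choices, as are the function names.

```python
from typing import Dict, Set, Tuple

Voxel = Tuple[int, int, int]

# 6-connected neighborhood (an assumption; the abstract only says "neighbor space").
NEIGHBOR_OFFSETS = [
    (1, 0, 0), (-1, 0, 0),
    (0, 1, 0), (0, -1, 0),
    (0, 0, 1), (0, 0, -1),
]


def connectivity_aware_update(
    occupancy: Dict[Voxel, float],
    reliable: float = 0.8,   # hypothetical: neighbors at or above this are trusted
    decay: float = 0.9,      # hypothetical: attenuation of propagated occupancy
) -> Dict[Voxel, float]:
    """Raise each voxel's occupancy toward that of its reliable neighbors."""
    updated = dict(occupancy)
    for (x, y, z), value in occupancy.items():
        best = value
        for dx, dy, dz in NEIGHBOR_OFFSETS:
            # Read from the original grid so the update is a single pass.
            neighbor = occupancy.get((x + dx, y + dy, z + dz))
            if neighbor is not None and neighbor >= reliable:
                best = max(best, decay * neighbor)
        updated[(x, y, z)] = best
    return updated


def sparsify(occupancy: Dict[Voxel, float], threshold: float = 0.5) -> Set[Voxel]:
    """Keep voxels whose occupancy reaches a fixed threshold."""
    return {voxel for voxel, value in occupancy.items() if value >= threshold}


if __name__ == "__main__":
    # A thin structure along x with one confident voxel, plus an isolated weak voxel.
    grid = {(0, 0, 0): 0.9, (1, 0, 0): 0.4, (2, 0, 0): 0.3, (5, 5, 5): 0.2}
    print(sorted(sparsify(grid)))                             # fixed threshold only
    print(sorted(sparsify(connectivity_aware_update(grid))))  # with the update first
```

In this toy grid, a fixed threshold alone keeps only the confident voxel, while the update lets the weak voxel adjacent to it survive; the isolated weak voxel is still discarded because it has no reliable neighbor.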
Author Wang, Yanmei
Chen, Ronghan
Chu, Fupeng
Cong, Yang
Author_xml – sequence: 1
  givenname: Fupeng
  orcidid: 0000-0002-0164-5850
  surname: Chu
  fullname: Chu, Fupeng
  email: fupengchu@gmail.com
  organization: State Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang, China
– sequence: 2
  givenname: Yang
  orcidid: 0000-0002-5102-0189
  surname: Cong
  fullname: Cong, Yang
  email: congyang81@gmail.com
  organization: School of Automation Science and Engineering, South China University of Technology, Guangzhou, China
– sequence: 3
  givenname: Yanmei
  orcidid: 0000-0002-1869-7665
  surname: Wang
  fullname: Wang, Yanmei
  email: wangyanmei@sia.cn
  organization: State Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang, China
– sequence: 4
  givenname: Ronghan
  orcidid: 0000-0001-6307-2923
  surname: Chen
  fullname: Chen, Ronghan
  email: chenronghan@sia.cn
  organization: State Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang, China
CODEN ITMUF8
ContentType Journal Article
DBID 97E
RIA
RIE
AAYXX
CITATION
DOI 10.1109/TMM.2025.3535311
DatabaseName IEEE All-Society Periodicals Package (ASPP) 2005–Present
IEEE All-Society Periodicals Package (ASPP) 1998–Present
IEEE Xplore (NTUSG)
CrossRef
DatabaseTitle CrossRef
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Engineering
Computer Science
EISSN 1941-0077
EndPage 3278
ExternalDocumentID 10_1109_TMM_2025_3535311
10855550
Genre orig-research
GrantInformation_xml – fundername: National Science and Technology Major Project of the New Generation of Artificial Intelligence
  grantid: 2018AAA0102900
– fundername: National Natural Science Foundation of China
  grantid: 62225310; 62127807; 62133005
  funderid: 10.13039/501100001809
ID FETCH-LOGICAL-c217t-cc759135d6a3d5da014319fe32730b4a5b46420cf09ab68d22acaf1e64e407b83
IEDL.DBID RIE
ISSN 1520-9210
IngestDate Wed Oct 01 05:46:59 EDT 2025
Wed Jun 18 06:01:23 EDT 2025
IsPeerReviewed true
IsScholarly true
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
https://doi.org/10.15223/policy-029
https://doi.org/10.15223/policy-037
LinkModel DirectLink
ORCID 0000-0002-0164-5850
0000-0002-5102-0189
0000-0002-1869-7665
0000-0001-6307-2923
PageCount 13
ParticipantIDs crossref_primary_10_1109_TMM_2025_3535311
ieee_primary_10855550
ProviderPackageCode CITATION
AAYXX
PublicationCentury 2000
PublicationDate 20250000
2025-00-00
PublicationDateYYYYMMDD 2025-01-01
PublicationDate_xml – year: 2025
  text: 20250000
PublicationDecade 2020
PublicationTitle IEEE transactions on multimedia
PublicationTitleAbbrev TMM
PublicationYear 2025
Publisher IEEE
Publisher_xml – name: IEEE
References_xml – ident: ref20
  doi: 10.1109/CVPR.2014.196
– ident: ref17
  doi: 10.1007/978-3-319-46487-9_31
– ident: ref37
  doi: 10.1609/aaai.v37i2.25358
– ident: ref19
  doi: 10.1109/CVPR.2008.4587671
– ident: ref39
  doi: 10.1109/3DV.2018.00037
– ident: ref43
  doi: 10.1109/ICCV.2019.00274
– ident: ref47
  doi: 10.1145/3503250
– ident: ref3
  doi: 10.1109/TII.2020.3016393
– ident: ref18
  doi: 10.1109/CVPR.2011.5995693
– volume-title: Proc. Int. Conf. Learn. Representations
  year: 2019
  ident: ref44
  article-title: DPSNet: End-to-end deep plane sweep stereo
– volume: 34
  start-page: 1403
  volume-title: Proc. Adv. Neural Inf. Process. Syst.
  year: 2021
  ident: ref8
  article-title: TransformerFusion: Monocular RGB scene reconstruction using transformers
– ident: ref30
  doi: 10.1109/ICCV48922.2021.01578
– ident: ref40
  doi: 10.1117/12.473938
– ident: ref25
  doi: 10.1109/ACCESS.2021.3049548
– ident: ref1
  doi: 10.1109/TMM.2020.3017886
– ident: ref9
  doi: 10.1007/978-3-030-58571-6_25
– ident: ref7
  doi: 10.1109/CVPR52729.2023.01661
– ident: ref12
  doi: 10.1109/CVPR.2017.261
– volume: 27
  start-page: 2366
  volume-title: Proc. Adv. Neural Inf. Process. Syst.
  year: 2014
  ident: ref42
  article-title: Depth map prediction from a single image using a multi-scale deep network
– ident: ref10
  doi: 10.1109/3DV53792.2021.00042
– ident: ref29
  doi: 10.1007/978-3-030-01237-3_47
– ident: ref15
  doi: 10.1109/TMM.2023.3251697
– ident: ref24
  doi: 10.1108/IR-05-2015-0110
– ident: ref34
  doi: 10.1109/ICCV.2017.253
– ident: ref31
  doi: 10.1109/3DV53792.2021.00079
– ident: ref14
  doi: 10.1109/TMM.2021.3073265
– ident: ref35
  doi: 10.1109/ICCV51070.2023.00338
– ident: ref22
  doi: 10.20870/IJVR.2010.9.1.2761
– ident: ref45
  doi: 10.1109/CVPR46437.2021.01507
– ident: ref4
  doi: 10.1109/CVPR42600.2020.00724
– ident: ref32
  doi: 10.5721/EuJRS20144723
– ident: ref26
  doi: 10.1007/978-3-031-19827-4_1
– ident: ref2
  doi: 10.1109/TMM.2018.2859034
– ident: ref33
  doi: 10.1109/ICCV51070.2023.01627
– ident: ref11
  doi: 10.1109/CVPR46437.2021.01534
– ident: ref28
  doi: 10.1109/ICCV51070.2023.01689
– volume-title: Proc. Int. Conf. Learn. Representations
  year: 2021
  ident: ref36
  article-title: An image is worth 16x16 words: Transformers for image recognition at scale
– ident: ref23
  doi: 10.1109/TMM.2024.3388929
– ident: ref5
  doi: 10.1109/TIM.2020.3026719
– ident: ref13
  doi: 10.1109/CVPR.2013.377
– ident: ref27
  doi: 10.1109/ICCV51070.2023.01667
– volume-title: NIPS Workshop Deep Learn.
  year: 2014
  ident: ref38
  article-title: Empirical evaluation of gated recurrent neural networks on sequence modeling
– ident: ref6
  doi: 10.1109/ICRA40945.2020.9197388
– ident: ref41
  doi: 10.1109/ROBOT.2003.1241726
– ident: ref21
  doi: 10.1109/ICCV.2015.107
– ident: ref16
  doi: 10.1007/s10489-022-03724-9
– ident: ref46
  doi: 10.1109/CVPR.2019.00293
SSID ssj0014507
Score 2.4348257
SourceID crossref
ieee
SourceType Index Database
Publisher
StartPage 3266
SubjectTerms 3D reconstruction
3D scene reconstruction
Accuracy
Feature extraction
Geometry
Image reconstruction
Learning systems
Legged locomotion
online 3D reconstruction
Representation learning
Surface reconstruction
Three-dimensional displays
Transformers
Title DetailRecon: Focusing on Detailed Regions for Online Monocular 3D Reconstruction
URI https://ieeexplore.ieee.org/document/10855550
Volume 27
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
journalDatabaseRights – providerCode: PRVIEE
  databaseName: IEEE Electronic Library (IEL)
  customDbUrl:
  eissn: 1941-0077
  dateEnd: 99991231
  omitProxy: false
  ssIdentifier: ssj0014507
  issn: 1520-9210
  databaseCode: RIE
  dateStart: 19990101
  isFulltext: true
  titleUrlDefault: https://ieeexplore.ieee.org/
  providerName: IEEE