DetailRecon: Focusing on Detailed Regions for Online Monocular 3D Reconstruction
| Published in | IEEE Transactions on Multimedia, Vol. 27, pp. 3266-3278 |
|---|---|
| Main Authors | Chu, Fupeng; Cong, Yang; Wang, Yanmei; Chen, Ronghan |
| Format | Journal Article |
| Language | English |
| Published | IEEE, 2025 |
| ISSN | 1520-9210 (print); 1941-0077 (electronic) |
| DOI | 10.1109/TMM.2025.3535311 |
| Abstract | Learning-based online monocular 3D reconstruction has recently shown great potential. Most state-of-the-art methods focus on two key questions, namely 1) how to exploit accurate voxel features and 2) how to preserve detailed voxels in the sparsification process. However, 1) most methods adopt the same receptive field to extract features for both informative and uninformative regions, and therefore struggle to capture geometric details. Furthermore, 2) they mainly utilize a fixed threshold or a straightforward ray-based algorithm to discard voxels in the sparsification process, so some detailed regions (especially thin regions) may be discarded incorrectly. To tackle these challenges, we present a novel method named DetailRecon to focus on detailed regions that contain more geometric information. Specifically, we first propose an Adaptive Hybrid Fusion (AHF) module and a Connectivity-Aware Sparsification (CAS) module for voxel feature learning and voxel sparsification, respectively. 1) The AHF receives multiple feature maps with different receptive fields as input, and adaptively adopts a smaller receptive field for regions with fine structures to exploit accurate geometric details. 2) The CAS updates the occupancy value of voxels based on the connected voxels within their neighborhood, which can expand the radiation range of reliable voxels in detailed regions and eventually reduce their probability of being discarded. Moreover, 3) we introduce a lightweight yet effective pipeline named Focus On Fine (FOF) to accelerate our DetailRecon. In addition, 4) we propose a Hierarchical Consistency Loss (HCL) to align multi-level volume features, which assists in exploring accurate volume features for recovering more details. Extensive experiments conducted on the ScanNet (V2) and 7-Scenes datasets demonstrate the superiority of our DetailRecon. |
| Author | Wang, Yanmei; Chen, Ronghan; Chu, Fupeng; Cong, Yang |
    
| Author_xml | 1. Chu, Fupeng (ORCID 0000-0002-0164-5850; fupengchu@gmail.com), State Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang, China. 2. Cong, Yang (ORCID 0000-0002-5102-0189; congyang81@gmail.com), School of Automation Science and Engineering, South China University of Technology, Guangzhou, China. 3. Wang, Yanmei (ORCID 0000-0002-1869-7665; wangyanmei@sia.cn), State Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang, China. 4. Chen, Ronghan (ORCID 0000-0001-6307-2923; chenronghan@sia.cn), State Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang, China. |
    
| CODEN | ITMUF8 | 
    
| ContentType | Journal Article | 
    
| Discipline | Engineering; Computer Science |
    
| EISSN | 1941-0077 | 
    
| EndPage | 3278 | 
    
| Genre | orig-research | 
    
| GrantInformation_xml | National Science and Technology Major Project of the New Generation of Artificial Intelligence (grant 2018AAA0102900); National Natural Science Foundation of China (grants 62225310, 62127807, 62133005; funder ID 10.13039/501100001809) |
    
| ISSN | 1520-9210 | 
    
| IsPeerReviewed | true | 
    
| IsScholarly | true | 
    
| Language | English | 
    
| License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html; https://doi.org/10.15223/policy-029; https://doi.org/10.15223/policy-037 |
    
| ORCID | 0000-0002-0164-5850 0000-0002-5102-0189 0000-0002-1869-7665 0000-0001-6307-2923  | 
    
| PageCount | 13 | 
    
| PublicationCentury | 2000 | 
    
| PublicationDate | 2025 |
    
| PublicationDecade | 2020 | 
    
| PublicationTitle | IEEE transactions on multimedia | 
    
| PublicationTitleAbbrev | TMM | 
    
| PublicationYear | 2025 | 
    
| Publisher | IEEE | 
    
| StartPage | 3266 | 
    
| SubjectTerms | 3D reconstruction; 3D scene reconstruction; Accuracy; Feature extraction; Geometry; Image reconstruction; Learning systems; Legged locomotion; online 3D reconstruction; Representation learning; Surface reconstruction; Three-dimensional displays; Transformers |
    
| Title | DetailRecon: Focusing on Detailed Regions for Online Monocular 3D Reconstruction | 
    
| URI | https://ieeexplore.ieee.org/document/10855550 | 
    
| Volume | 27 | 
    
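
Illustrative note on the sparsification step described in the abstract: the CAS module is said to update a voxel's occupancy from the connected voxels in its neighborhood, so that thin structures are less likely to be discarded than under a fixed threshold. The Python sketch below only illustrates that general idea and is not the authors' implementation. Everything in it is an assumption made for illustration: the function name `connectivity_aware_sparsify`, the dense NumPy occupancy grid, the 26-neighbor window, both thresholds, and the geometric-mean update rule.

```python
import numpy as np


def connectivity_aware_sparsify(occ, keep_thresh=0.5, reliable_thresh=0.8):
    """Toy connectivity-aware sparsification (hypothetical, not the paper's CAS).

    occ: 3D array of occupancy probabilities in [0, 1].
    Returns (updated occupancy, boolean keep mask).
    """
    occ = np.asarray(occ, dtype=float)
    depth, height, width = occ.shape

    # Only voxels with high occupancy are treated as "reliable" sources.
    reliable = np.where(occ >= reliable_thresh, occ, 0.0)
    padded = np.pad(reliable, 1, mode="constant", constant_values=0.0)

    # Strongest reliable voxel among the 26 neighbors of each voxel.
    neighbor_max = np.zeros_like(occ)
    for dz in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                if dz == 0 and dy == 0 and dx == 0:
                    continue
                shifted = padded[
                    1 + dz : 1 + dz + depth,
                    1 + dy : 1 + dy + height,
                    1 + dx : 1 + dx + width,
                ]
                neighbor_max = np.maximum(neighbor_max, shifted)

    # Assumed update rule: lift a voxel toward the geometric mean of its own
    # occupancy and its best reliable neighbor; never lower an occupancy.
    updated = np.maximum(occ, np.sqrt(occ * neighbor_max))
    keep = updated >= keep_thresh
    return updated, keep


# A thin structure along one axis with a weak voxel in the middle.
grid = np.zeros((5, 5, 5))
grid[2, 2, 0] = 0.9
grid[2, 2, 1] = 0.4  # weak, but flanked by reliable voxels
grid[2, 2, 2] = 0.9
grid[2, 2, 3] = 0.1  # very weak evidence next to the structure
grid[0, 0, 4] = 0.4  # weak and isolated
updated, keep = connectivity_aware_sparsify(grid)
```

In this toy rule the weak voxel flanked by reliable voxels is lifted above the keep threshold, while the isolated weak voxel and the very weak neighbor are still discarded; a plain fixed threshold of 0.5 would drop all three.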