MIRROR: Multiple Indoor Rooms Reconstruction with Optimized Refinement from Monocular Videos

Bibliographic Details
Published in IEICE Transactions on Information and Systems p. 2025PCP0002
Main Authors IKENAGA, Takeshi, CHENG, Xina, LIU, Yanchao, WANG, Ziyue
Format Journal Article
Language English
Published The Institute of Electronics, Information and Communication Engineers 2025
ISSN 0916-8532
1745-1361
DOI 10.1587/transinf.2025PCP0002

Summary: Automatically reconstructing structured 3D models of real-world indoor scenes is an essential and challenging task for applications such as indoor navigation, evacuation planning, and wireless signal simulation. Despite the increasing demand for up-to-date indoor models, indoor reconstruction from monocular videos is still at an early stage compared with the reconstruction of outdoor scenes. The specific challenges stem from complex building layouts, which require long-term video recording, and from the abundance of elements such as furniture, which cause clutter and occlusions. To accurately reconstruct large-scale indoor scenes with multiple rooms, this paper designs a large-scale multi-room indoor 3D reconstruction pipeline that explores the topological relations between rooms from long-term monocular videos. First, video segmentation based on semantic door detection is proposed to split the video into individual rooms, which are reconstructed separately to avoid global mismatching noise, and a 3D temporal trajectory is proposed to connect the rooms in the spatial domain. Second, the 3D Hough transform and principal component analysis are used to refine the room boundaries from the reconstructed point clouds, which improves accuracy. Further, an original long-term video dataset for large-scale indoor multi-room reconstruction is constructed, containing 12 real-world videos and 4 virtual videos with 30 rooms. Extensive experiments demonstrate that the proposed method achieves the highest performance, with a 3D IoU of 0.70, room distance accuracy of 0.87, and connectivity accuracy of 0.67, which is around 39% better on average than various state-of-the-art models.
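
The summary names principal component analysis as one of the tools used to refine room boundaries from point clouds, but the record does not describe how it is applied. As a generic illustration only, and not the authors' implementation, the following sketch shows one common use of PCA on point clouds: fitting a plane to points sampled from a single wall by taking the direction of least variance as the wall normal. The function name `fit_plane_pca` and the synthetic wall data are assumptions made for this example.

```python
import numpy as np


def fit_plane_pca(points):
    """Fit a plane to an (N, 3) point array with PCA.

    Returns (centroid, unit normal). Hypothetical helper for illustration.
    """
    pts = np.asarray(points, dtype=float)
    centroid = pts.mean(axis=0)
    # Covariance of the centred points; its eigenvectors are the principal axes.
    cov = np.cov((pts - centroid).T)
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # The axis with the smallest variance is perpendicular to the wall.
    normal = eigvecs[:, 0]
    return centroid, normal / np.linalg.norm(normal)


# Synthetic wall (assumed data): a noisy plane at x = 2 m, 4 m wide, 2.5 m tall.
rng = np.random.default_rng(0)
wall = np.column_stack([
    2.0 + rng.normal(0.0, 0.01, 200),
    rng.uniform(0.0, 4.0, 200),
    rng.uniform(0.0, 2.5, 200),
])
centroid, normal = fit_plane_pca(wall)
```

The sign of the returned normal is arbitrary, so downstream code would typically compare directions up to sign.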