Depth Inpainting via Vision Transformer

Bibliographic Details
Published in: 2021 IEEE International Symposium on Mixed and Augmented Reality Adjunct (ISMAR-Adjunct), pp. 286-291
Main Authors: Makarov, Ilya; Borisenko, Gleb
Format: Conference Proceeding
Language: English
Published: IEEE, 01.10.2021
DOI: 10.1109/ISMAR-Adjunct54149.2021.00065

More Information
Summary: Depth inpainting is a crucial task for augmented reality. In previous work, missing depth values have been completed by convolutional encoder-decoder networks, which constitutes a bottleneck. Recently, however, vision transformers have shown strong performance on a variety of computer vision tasks, and some have become state of the art. In this study, we present a supervised method for depth inpainting from RGB images and sparse depth maps via vision transformers. The proposed model was trained and evaluated on the NYUv2 dataset. Experiments show that a vision transformer with a restrictive convolutional tokenization model can improve the quality of the inpainted depth map.
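The summary describes a vision transformer whose patch tokens come from a convolutional tokenizer rather than a plain linear patch projection, operating on an RGB image concatenated with a sparse depth map. The following is a minimal PyTorch sketch of that general idea, not the authors' architecture: the layer sizes, the 4x-downsampling stem, the per-token 4x4 patch-reconstruction head, and the omission of positional embeddings are all illustrative assumptions.

```python
import torch
import torch.nn as nn

class ConvTokenizer(nn.Module):
    """Convolutional stem (stand-in for the paper's convolutional
    tokenization): maps the 4-channel RGB + sparse-depth input to tokens."""
    def __init__(self, in_ch=4, dim=64):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(in_ch, dim // 2, 3, stride=2, padding=1),
            nn.GELU(),
            nn.Conv2d(dim // 2, dim, 3, stride=2, padding=1),  # 4x downsample
        )

    def forward(self, x):
        f = self.stem(x)                             # (B, dim, H/4, W/4)
        b, c, h, w = f.shape
        return f.flatten(2).transpose(1, 2), (h, w)  # tokens: (B, N, dim)

class DepthInpaintViT(nn.Module):
    """Toy ViT predicting a dense depth map from RGB + sparse depth.
    Positional embeddings are omitted here for brevity."""
    def __init__(self, dim=64, depth=2, heads=4):
        super().__init__()
        self.tokenizer = ConvTokenizer(in_ch=4, dim=dim)
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=dim * 4, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(dim, 16)  # each token predicts its 4x4 depth patch

    def forward(self, rgb, sparse_depth):
        x = torch.cat([rgb, sparse_depth], dim=1)  # (B, 4, H, W)
        tokens, (h, w) = self.tokenizer(x)
        tokens = self.encoder(tokens)
        patches = self.head(tokens)                # (B, h*w, 16)
        # Fold the per-token 4x4 patches back into a dense depth map.
        out = patches.view(-1, h, w, 4, 4).permute(0, 1, 3, 2, 4)
        return out.reshape(-1, 1, h * 4, w * 4)

model = DepthInpaintViT()
rgb = torch.randn(2, 3, 64, 64)
sparse = torch.randn(2, 1, 64, 64)  # zeros where depth is missing, in practice
pred = model(rgb, sparse)
print(pred.shape)  # torch.Size([2, 1, 64, 64])
```

In a supervised setting such as NYUv2, the prediction would be trained against ground-truth dense depth with a regression loss (e.g. L1) on the pixels where ground truth exists.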