Unsupervised Multi-View CNN for Salient View Selection and 3D Interest Point Detection


Bibliographic Details
Published in: International Journal of Computer Vision, Vol. 130, No. 5, pp. 1210–1227
Main Authors: Song, Ran; Zhang, Wei; Zhao, Yitian; Liu, Yonghuai
Format: Journal Article
Language: English
Published: New York: Springer US, 01.05.2022
ISSN: 0920-5691, 1573-1405
DOI: 10.1007/s11263-022-01592-x


Summary: We present an unsupervised 3D deep learning framework based on a proposition we term view-object consistency, which states that a 3D object and its projected 2D views always belong to the same object class. To validate its effectiveness, we design a multi-view CNN that instantiates this proposition for salient view selection and interest point detection of 3D objects, tasks that cannot readily be handled by supervised learning owing to the difficulty of collecting sufficient and consistent training data. Our unsupervised multi-view CNN, namely UMVCNN, comprises two channels that encode the knowledge within each 2D view and the 3D object, respectively, and exploits both intra-view and inter-view knowledge of the object. It ends with a new loss layer that formulates view-object consistency by encouraging the two channels to produce consistent classification outcomes. The UMVCNN is then integrated with a global distinction adjustment scheme to incorporate global cues into salient view selection. We evaluate our method for salient view selection both qualitatively and quantitatively, demonstrating its superiority over several state-of-the-art methods. In addition, we show that our method can select salient views of 3D scenes containing multiple objects. We also develop a method based on the UMVCNN for 3D interest point detection and conduct comparative evaluations on a publicly available benchmark, which shows that the UMVCNN is amenable to different 3D shape understanding tasks.
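The summary describes a loss layer that encourages the view channel and the object channel to produce consistent classification outcomes. The paper's exact formulation is not given in this record; the sketch below is a hypothetical instance of such a consistency loss, using a symmetric KL divergence between the per-view class distributions and the object-level class distribution (all function names and the choice of divergence are assumptions for illustration).

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def view_object_consistency_loss(view_logits, object_logits, eps=1e-12):
    """Hypothetical view-object consistency loss (not the paper's exact form):
    symmetric KL divergence between each 2D view's class distribution and the
    3D object's class distribution, averaged over views.

    view_logits:   (n_views, n_classes) raw scores from the view channel
    object_logits: (n_classes,) raw scores from the object channel
    """
    p_views = softmax(view_logits)    # per-view class distributions
    p_obj = softmax(object_logits)    # object-level class distribution

    def kl(p, q):
        # KL(p || q) along the class axis; eps avoids log(0).
        return np.sum(p * np.log((p + eps) / (q + eps)), axis=-1)

    # Symmetric divergence per view, averaged over all views.
    return float(np.mean(kl(p_views, p_obj) + kl(p_obj, p_views)))
```

Minimizing such a term pushes the two channels toward agreeing class predictions for every rendered view, which is one plausible way to realize the stated view-object consistency; the loss is zero exactly when every view's distribution matches the object's.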