Variational Distillation for Multi-View Learning

Bibliographic Details
Published in: IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 46, No. 7, pp. 4551-4566
Main Authors: Tian, Xudong; Zhang, Zhizhong; Wang, Cong; Zhang, Wensheng; Qu, Yanyun; Ma, Lizhuang; Wu, Zongze; Xie, Yuan; Tao, Dacheng
Format: Journal Article
Language: English
Published: United States: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.07.2024
ISSN: 0162-8828, 1939-3539, 2160-9292
DOI: 10.1109/TPAMI.2023.3343717

More Information
Summary: Information Bottleneck (IB) provides an information-theoretic principle for multi-view learning by revealing the various components contained in each viewpoint. This highlights the necessity of capturing their distinct roles to achieve view-invariant and predictive representations, a problem that remains under-explored due to the technical intractability of modeling and organizing innumerable mutual information (MI) terms. Recent studies show that sufficiency and consistency play such key roles in multi-view representation learning and can be preserved via a variational distillation framework. However, when generalized to arbitrary viewpoints, this strategy fails because the mutual information terms of consistency become complicated. This paper presents Multi-View Variational Distillation (MV²D), tackling the above limitations for generalized multi-view learning. Uniquely, MV²D can recognize useful consistent information and prioritize diverse components by their generalization ability. This guides an analytical and scalable solution to achieving both sufficiency and consistency. Additionally, by rigorously reformulating the IB objective, MV²D tackles the difficulties in MI optimization and fully realizes the theoretical advantages of the information bottleneck principle. We extensively evaluate our model on diverse tasks to verify its effectiveness, where the considerable gains provide key insights into achieving generalized multi-view representations under a rigorous information-theoretic principle.
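For context on the variational MI optimization the summary refers to, the following is a minimal sketch of the standard single-view variational IB loss (in the style of the deep variational IB), not the paper's MV²D objective. All module names, layer sizes, and the beta value are illustrative assumptions: the cross-entropy term is a variational lower bound related to the sufficiency MI I(z; y), while the analytic KL term to a standard normal prior upper-bounds the compression MI I(x; z).

```python
# Illustrative sketch of a generic variational IB loss (NOT the MV^2D method).
import torch
import torch.nn.functional as F

class VIBEncoder(torch.nn.Module):
    def __init__(self, in_dim=784, z_dim=32, n_classes=10):  # assumed sizes
        super().__init__()
        self.backbone = torch.nn.Sequential(
            torch.nn.Linear(in_dim, 256), torch.nn.ReLU())
        self.mu = torch.nn.Linear(256, z_dim)      # mean of q(z|x)
        self.logvar = torch.nn.Linear(256, z_dim)  # log-variance of q(z|x)
        self.head = torch.nn.Linear(z_dim, n_classes)

    def forward(self, x):
        h = self.backbone(x)
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterization trick: sample z ~ q(z|x) differentiably.
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return self.head(z), mu, logvar

def vib_loss(logits, y, mu, logvar, beta=1e-3):
    # Cross-entropy: variational bound tied to I(z; y) (sufficiency side).
    ce = F.cross_entropy(logits, y)
    # Closed-form KL(q(z|x) || N(0, I)): upper-bounds I(x; z) (compression side).
    kl = 0.5 * (mu.pow(2) + logvar.exp() - logvar - 1.0).sum(dim=1).mean()
    return ce + beta * kl

# Toy usage with random data, illustrating one optimization step.
model = VIBEncoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x, y = torch.randn(64, 784), torch.randint(0, 10, (64,))
logits, mu, logvar = model(x)
loss = vib_loss(logits, y, mu, logvar)
loss.backward()
opt.step()
```

The trade-off coefficient beta controls how aggressively the representation is compressed; the paper's contribution concerns how such MI terms are reformulated and organized across multiple views, which this single-view sketch does not capture.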