On the Asymptotic Sample Complexity of HGR Maximal Correlation Functions in Semi-supervised Learning

Bibliographic Details
Published in: 2019 57th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 879-886
Main Authors: Xu, Xiangxiang; Huang, Shao-Lun
Format: Conference Proceeding
Language: English
Published: IEEE, 01.09.2019
DOI: 10.1109/ALLERTON.2019.8919892

Abstract: The Hirschfeld-Gebelein-Rényi (HGR) maximal correlation has been shown to be useful in many machine learning applications, where the alternating conditional expectation (ACE) algorithm is widely adopted to estimate the HGR maximal correlation functions from data samples. In this paper, we consider the asymptotic sample complexity of estimating the HGR maximal correlation functions in semi-supervised learning, where both labeled and unlabeled data samples are used for the estimation. First, we propose a generalized ACE algorithm to handle the unlabeled data samples. Then, we develop a mathematical framework to characterize the learning errors between the maximal correlation functions computed from the true distribution and the functions estimated by the generalized ACE algorithm. We establish analytical expressions for the error exponents of the learning errors, which indicate the number of training samples required for estimating the HGR maximal correlation functions by the generalized ACE algorithm. Moreover, using our theoretical results, we investigate the sampling strategy for different types of samples in semi-supervised learning under a total sampling budget constraint, and develop an optimal sampling strategy that maximizes the error exponent of the learning error. Finally, numerical simulations are presented to support our theoretical results.
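For readers unfamiliar with ACE, the following is a minimal sketch of the classical alternating conditional expectation iteration on labeled discrete samples only. It is not the generalized ACE algorithm proposed in the paper, whose details are not given in this record. The function name `ace_discrete`, its parameters, and the standardization step are illustrative assumptions.

```python
import math
import random
from collections import Counter


def ace_discrete(pairs, num_iters=200, seed=0):
    """Classical ACE on the empirical joint distribution of labeled (x, y) pairs.

    Hypothetical helper for illustration; NOT the paper's generalized ACE.
    Returns (f, g, rho): dicts mapping symbols to function values, and the
    empirical correlation E[f(X) g(Y)] after the final iteration.
    """
    n = len(pairs)
    joint = Counter(pairs)              # empirical counts of (x, y)
    px = Counter(x for x, _ in pairs)   # empirical counts of x
    py = Counter(y for _, y in pairs)   # empirical counts of y

    def standardize(h, counts):
        # Rescale h to zero mean and unit variance under the empirical marginal.
        mean = sum(counts[s] * h[s] for s in counts) / n
        var = sum(counts[s] * (h[s] - mean) ** 2 for s in counts) / n
        std = math.sqrt(var)
        if std < 1e-12:
            raise ValueError("function collapsed to a constant")
        return {s: (h[s] - mean) / std for s in counts}

    rng = random.Random(seed)
    g = standardize({y: rng.gauss(0.0, 1.0) for y in py}, py)
    for _ in range(num_iters):
        # f(x) <- E[g(Y) | X = x], then standardize
        f = {x: 0.0 for x in px}
        for (x, y), c in joint.items():
            f[x] += c * g[y] / px[x]
        f = standardize(f, px)
        # g(y) <- E[f(X) | Y = y], then standardize
        g = {y: 0.0 for y in py}
        for (x, y), c in joint.items():
            g[y] += c * f[x] / py[y]
        g = standardize(g, py)
    rho = sum(c * f[x] * g[y] for (x, y), c in joint.items()) / n
    return f, g, rho
```

As a sanity check, calling `ace_discrete([(0, 0)] * 40 + [(0, 1)] * 10 + [(1, 0)] * 10 + [(1, 1)] * 40)` should give `rho` close to 0.6. This matches the known maximal correlation 1 - 2p of a binary symmetric channel with uniform input and crossover probability p = 0.2.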
Author Affiliations:
Xu, Xiangxiang (Tsinghua University, Department of Electronic Engineering, Beijing, China, 100084)
Huang, Shao-Lun (Tsinghua-Berkeley Shenzhen Institute, Data Science and Information Technology Research Center, Shenzhen, China, 518055)
EISBN: 1728131510, 9781728131511
Subjects: Complexity theory; Correlation; Estimation; Machine learning; Semisupervised learning; Symmetric matrices; Training
URI: https://ieeexplore.ieee.org/document/8919892