Target-Independent Active Learning via Distribution-Splitting

Bibliographic Details
Main Authors Cao, Xiaofeng; Tsang, Ivor W; Xu, Xiaofeng; Xu, Guandong
Format Journal Article
Language English
Published 28.09.2018
DOI 10.48550/arxiv.1809.10962

Abstract To reduce the label complexity of the Agnostic Active Learning (A^2) algorithm, volume-splitting splits the hypothesis edges to reduce the Vapnik-Chervonenkis (VC) dimension of the version space. However, the effectiveness of volume-splitting depends critically on the initial hypothesis, a problem known as the target-dependent label complexity gap. This paper attempts to minimize this gap by introducing a novel notion of number density, which provides a more natural and direct way to describe the hypothesis distribution than volume. By discovering the connections between the hypothesis and the input distribution, we map the volume of the version space into the number density and propose a target-independent distribution-splitting strategy with the following advantages: 1) it provides theoretical guarantees on reducing label complexity and error rate, as volume-splitting does; 2) it breaks the curse of the initial hypothesis; 3) it provides model guidance for a target-independent AL algorithm in real AL tasks. With these guarantees, for AL applications, we then split the input distribution into more near-optimal spheres and develop an application algorithm called Distribution-based A^2 (DA^2). Experiments further verify the effectiveness of the halving and querying abilities of DA^2.
Copyright http://arxiv.org/licenses/nonexclusive-distrib/1.0
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed false
IsScholarly false
OpenAccessLink https://arxiv.org/abs/1809.10962
PublicationDate 2018-09-28
SecondaryResourceType preprint
SourceType Open Access Repository
SubjectTerms Computer Science - Learning
Statistics - Machine Learning