Conditional Negative Sampling for Contrastive Learning of Visual Representations
| Main Authors | |
|---|---|
| Format | Journal Article |
| Language | English |
| Published | 05.10.2020 |
| Subjects | |
| DOI | 10.48550/arxiv.2010.02037 |
| Summary: | Recent methods for learning unsupervised visual representations, dubbed contrastive learning, optimize the noise-contrastive estimation (NCE) bound on mutual information between two views of an image. NCE uses randomly sampled negative examples to normalize the objective. In this paper, we show that choosing difficult negatives, or those more similar to the current instance, can yield stronger representations. To do this, we introduce a family of mutual information estimators that sample negatives conditionally -- in a "ring" around each positive. We prove that these estimators lower-bound mutual information, with higher bias but lower variance than NCE. Experimentally, we find that our approach, applied on top of existing models (IR, CMC, and MoCo), improves accuracy by 2-5 percentage points in each case, measured by linear evaluation on four standard image datasets. Moreover, we find continued benefits when transferring features to a variety of new image distributions from the Meta-Dataset collection and to a variety of downstream tasks such as object detection, instance segmentation, and keypoint detection. |
|---|---|
| DOI: | 10.48550/arxiv.2010.02037 |
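The sketch below illustrates the general idea described in the summary: an InfoNCE-style contrastive loss in which negatives are not drawn uniformly but restricted to a "ring" of candidates whose similarity to the anchor falls between two quantile thresholds. This is only a minimal illustration of the concept under assumed conventions; the function name `ring_infonce_loss`, the cosine-similarity scoring, the quantile thresholds, and the temperature value are placeholders and are not taken from the paper's implementation.

```python
import torch
import torch.nn.functional as F

def ring_infonce_loss(anchor, positive, candidates, temperature=0.07,
                      lower_pct=0.5, upper_pct=0.9):
    """Illustrative InfoNCE-style loss with ring-restricted negatives.

    anchor:     (D,)   embedding of one view of an image
    positive:   (D,)   embedding of the other view of the same image
    candidates: (N, D) embeddings of potential negative examples

    Negatives are kept only if their similarity to the anchor falls between
    the lower_pct and upper_pct quantiles of all candidate similarities,
    i.e. a "ring" around the anchor (hypothetical thresholds).
    """
    anchor = F.normalize(anchor, dim=0)
    positive = F.normalize(positive, dim=0)
    candidates = F.normalize(candidates, dim=1)

    sims = candidates @ anchor                 # (N,) cosine similarities to the anchor
    lo = torch.quantile(sims, lower_pct)
    hi = torch.quantile(sims, upper_pct)
    ring = (sims >= lo) & (sims <= hi)         # conditional sampling region
    negatives = sims[ring]                     # keep only "ring" negatives

    pos_logit = (anchor @ positive).unsqueeze(0)
    logits = torch.cat([pos_logit, negatives]) / temperature
    # Cross-entropy with the positive in slot 0 recovers the InfoNCE objective
    return F.cross_entropy(logits.unsqueeze(0),
                           torch.zeros(1, dtype=torch.long))
```

In this toy form, widening the ring toward the uniform case recovers ordinary NCE negatives, while narrowing it toward the most similar candidates makes the negatives harder, matching the bias/variance trade-off the summary attributes to the proposed estimators.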