Data-Dependent Hashing Based on p-Stable Distribution

Bibliographic Details
Published in	IEEE Transactions on Image Processing, Vol. 23, No. 12, pp. 5033-5046
Main Authors	Bai, Xiao; Yang, Haichuan; Zhou, Jun; Ren, Peng; Cheng, Jian
Format	Journal Article
Language	English
Published	United States: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.12.2014
Subjects	Binary codes; Educational institutions; Euclidean distance; Hash based algorithms; Image processing; Mathematical analysis; Methods; Preserves; Projection; Quantization (signal); Semantics; Similarity; Training; Vectors; Vectors (mathematics); Hash retrieval; Image retrieval; p-stable distribution
ISSN	1057-7149 (print); 1941-0042 (electronic)
DOI	10.1109/TIP.2014.2352458

Summary:	The p-stable distribution is traditionally used for data-independent hashing. In this paper, we describe how to perform data-dependent hashing based on the p-stable distribution. We begin by formulating the Euclidean distance-preserving property in terms of variance estimation. Based on this property, we develop a projection method that maps the original data to vectors of arbitrary dimension. Each projection vector is a linear combination of multiple random vectors drawn from a p-stable distribution, with the combination weights learned from the training data. An orthogonal matrix is then learned from the data to minimize the thresholding error in quantization. Combining the projection method and the orthogonal matrix, we develop an unsupervised hashing scheme that preserves the Euclidean distance. Compared with data-independent hashing methods, our method takes the data distribution into account and produces more accurate hashing results with compact hash codes. Unlike many data-dependent hashing methods, our method accommodates multiple hash tables and is not restricted in the number of hash functions. To extend our method to a supervised scenario, we incorporate a supervised label propagation scheme into the proposed projection method. This yields a supervised hashing scheme that preserves the semantic similarity of data. Experimental results show that our methods outperform several state-of-the-art hashing approaches in both effectiveness and efficiency.
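The distance-preserving property mentioned in the summary rests on a standard fact about the 2-stable (Gaussian) case: if a has i.i.d. N(0, 1) entries, then a·x - a·y = a·(x - y) is distributed as N(0, ||x - y||^2), so the mean of the squared projected differences estimates the squared Euclidean distance. A linear combination sum_j w_j a_j of such independent vectors has i.i.d. N(0, sum_j w_j^2) coordinates, so unit-norm weights keep the property. The Python sketch below is a minimal illustration under these assumptions and is not the authors' method: the weights are fixed placeholders rather than learned from training data, the learned orthogonal rotation and the label propagation step are omitted, and the helper names (`combined_projection`, `estimate_sq_distance`, `binary_code`) are hypothetical.

```python
import math
import random


def gaussian_vector(dim, rng):
    """Draw a vector with i.i.d. N(0, 1) entries (the 2-stable case)."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def combined_projection(dim, weights, rng):
    """Return sum_j w_j * a_j for independent Gaussian vectors a_j.

    Each coordinate is N(0, sum_j w_j**2). The weights are hypothetical
    fixed placeholders here; the paper learns them from training data.
    """
    vectors = [gaussian_vector(dim, rng) for _ in weights]
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]


def estimate_sq_distance(x, y, num_projections, weights, seed=0):
    """Estimate ||x - y||^2 by averaging squared projected differences.

    Weights are normalized to unit norm so each combined projection is
    standard Gaussian and E[(p.x - p.y)^2] = ||x - y||^2.
    """
    rng = random.Random(seed)
    norm = math.sqrt(sum(w * w for w in weights))
    unit_weights = [w / norm for w in weights]
    total = 0.0
    for _ in range(num_projections):
        p = combined_projection(len(x), unit_weights, rng)
        diff = dot(p, x) - dot(p, y)
        total += diff * diff
    return total / num_projections


def binary_code(x, projections):
    """Threshold each projection at zero to obtain one hash bit per projection."""
    return [1 if dot(p, x) >= 0 else 0 for p in projections]


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    y = [2.0, 0.0, 3.0, 1.0]  # true squared distance is 14
    print(estimate_sq_distance(x, y, 5000, [0.5, 1.0, 2.0]))  # should be close to 14
    rng = random.Random(42)
    projections = [combined_projection(4, [0.6, 0.8], rng) for _ in range(8)]
    print(binary_code(x, projections))
```

The relative standard deviation of the estimate is about sqrt(2 / num_projections), so more projections trade time for accuracy; thresholding at zero is the simplest sign quantizer and stands in for the learned rotation described above.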