k-Mnv-Rep: A k-type clustering algorithm for matrix-object data


Bibliographic Details
Published in: Information Sciences, Vol. 542, pp. 40-57
Main Authors: Yu, Liqin; Cao, Fuyuan; Gao, Xiao-Zhi; Liu, Jing; Liang, Jiye
Format: Journal Article
Language: English
Published: Elsevier Inc, 04.01.2021
ISSN: 0020-0255, 1872-6291
DOI: 10.1016/j.ins.2020.06.071

Summary:
• We define a novel dissimilarity measure between two numeric matrix-objects.
• We provide an update policy of cluster centers for numeric matrix-object data.
• We propose the k-Mnv-Rep algorithm to cluster numeric matrix-object data.
• We propose the k-Mv-Rep algorithm to cluster hybrid matrix-object data.

In matrix-object data, an object (or sample) is described by more than one feature vector (record), and all of those feature vectors are responsible for the observed classification of the object. One task for matrix-object data is to cluster the objects into groups by analyzing and using the information in their feature vectors. Matrix-object data are widespread in real applications. Previous studies typically address data sets in which each object is represented by a single feature vector, an assumption that is violated in many real-world tasks. In this paper, we propose a k-multi-numeric-values-representatives (abbr. k-Mnv-Rep) algorithm to cluster numeric matrix-object data. In this algorithm, a new dissimilarity measure between two numeric matrix-objects is defined and a new heuristic method for updating cluster centers is given. Furthermore, we propose a k-multi-values-representatives (abbr. k-Mv-Rep) algorithm to cluster hybrid matrix-object data. The two proposed algorithms overcome the limitations of previous studies and can be applied to the matrix-object data sets that arise in many real-world tasks. The benefits and effectiveness of the two algorithms are demonstrated by experiments on real and synthetic data sets.
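To make the notion of matrix-object data concrete, the following minimal Python sketch represents each object as a list of numeric feature vectors (records) and runs a generic k-type loop: assign each object to its nearest center, update the centers, and repeat until the assignment stops changing. This record does not give the paper's dissimilarity measure or its center update heuristic, so both are replaced here by stand-ins, and every name in the sketch is hypothetical. The dissimilarity is a symmetric mean nearest-record Euclidean distance, the center update picks a medoid (the member object with the smallest total dissimilarity to its cluster), and the first k objects serve as initial centers. It illustrates the setting only and is not an implementation of k-Mnv-Rep.

```python
import math


def record_distance(u, v):
    """Euclidean distance between two numeric feature vectors (records)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def dissimilarity(x, y):
    """Stand-in measure between two matrix-objects (NOT the paper's measure).

    For each record in one object, take the distance to its nearest record
    in the other object, average those distances, and symmetrize.
    """
    def one_way(a, b):
        return sum(min(record_distance(r, s) for s in b) for r in a) / len(a)

    return 0.5 * (one_way(x, y) + one_way(y, x))


def cluster_matrix_objects(objects, k, max_iter=100):
    """Generic k-type loop with medoid centers (an assumption of this sketch)."""
    centers = list(range(k))  # indices of the first k objects as initial centers
    labels = None
    for _ in range(max_iter):
        # Assignment step: each object joins the cluster of its nearest center.
        new_labels = [
            min(range(k), key=lambda c: dissimilarity(obj, objects[centers[c]]))
            for obj in objects
        ]
        if new_labels == labels:
            break  # assignment is stable, so stop
        labels = new_labels
        # Update step: the new center is the member with the smallest total
        # dissimilarity to the other members of its cluster.
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if members:
                centers[c] = min(
                    members,
                    key=lambda i: sum(
                        dissimilarity(objects[i], objects[j]) for j in members
                    ),
                )
    return labels, centers


# Four toy matrix-objects with different numbers of records per object.
objects = [
    [(0.0, 0.0), (0.5, 0.2), (0.1, 0.4)],
    [(10.0, 10.0), (10.3, 9.8)],
    [(0.2, 0.1), (0.3, 0.3)],
    [(9.9, 10.1), (10.2, 10.2), (9.7, 9.9)],
]
labels, centers = cluster_matrix_objects(objects, k=2)
print(labels)
```

On this toy input the two objects near the origin fall into one cluster and the two objects near (10, 10) into the other. Note that the objects do not need the same number of records, which is the property that separates matrix-object data from the usual one-vector-per-object setting.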