K-CDFs: A Nonparametric Clustering Algorithm via Cumulative Distribution Function

We propose a novel partitioning clustering procedure based on the cumulative distribution function (CDF), called K-CDFs. For univariate data, the K-CDFs represent the cluster centers by empirical CDFs and assign each observation to the closest center measured by the Cramér-von Mises distance. The pr...

Bibliographic Details
Published in Journal of Computational and Graphical Statistics Vol. 32; no. 1; pp. 304-318
Main Authors Liu, Jicai, Li, Jinhong, Zhang, Riquan
Format Journal Article
Language English
Published Alexandria Taylor & Francis 02.01.2023
Taylor & Francis Ltd
ISSN 1061-8600
1537-2715
DOI 10.1080/10618600.2022.2091575

Summary: We propose a novel partitioning clustering procedure based on the cumulative distribution function (CDF), called K-CDFs. For univariate data, K-CDFs represents the cluster centers by empirical CDFs and assigns each observation to the closest center as measured by the Cramér-von Mises distance. The procedure is nonparametric and does not require the assumptions on cluster distributions that mixture models impose. A projection technique generalizes the univariate K-CDFs to an arbitrary dimension. The proposed procedure has several appealing properties: it is robust to heavy-tailed data, is not sensitive to the data dimension, does not require moment conditions on the data, and can effectively detect linearly nonseparable clusters. To implement K-CDFs, we propose two kinds of algorithms: a greedy algorithm in the style of the classical Lloyd's algorithm and a spectral relaxation algorithm. We illustrate the finite-sample performance of the proposed algorithms through simulation experiments and empirical analyses of several real datasets. Supplementary files for this article are available online.
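
The univariate procedure described in the summary can be illustrated with a short sketch. This is not the authors' implementation, and the record does not give the exact form of the distance, so several choices here are assumptions: each observation x_i is represented by the step function t -> 1{x_i <= t}, the squared difference between that step function and a cluster's empirical CDF is averaged over the pooled sample points (a Cramér-von Mises-type integral taken against the pooled empirical distribution), and the iteration is a plain Lloyd-style alternation. The function name k_cdfs_1d and its parameters are hypothetical.

```python
import numpy as np


def k_cdfs_1d(x, k, n_iter=50, seed=0):
    """Lloyd-style sketch of CDF-based clustering for univariate data.

    Assumption (not taken from the paper): the distance between an
    observation and a center is the mean squared difference between
    the observation's step function and the center's empirical CDF,
    evaluated at the pooled sample points.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    grid = np.sort(x)  # pooled sample points where the CDFs are evaluated
    # Row i holds the step function t -> 1{x_i <= t} on the grid.
    steps = (x[:, None] <= grid[None, :]).astype(float)

    rng = np.random.default_rng(seed)
    # Initialise the k centers with the step functions of k distinct observations.
    centers = steps[rng.choice(n, size=k, replace=False)]
    labels = np.full(n, -1)

    for _ in range(n_iter):
        # Shape (n, k): averaged squared difference between each step
        # function and each center over the grid.
        dist = ((steps[:, None, :] - centers[None, :, :]) ** 2).mean(axis=2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable
        labels = new_labels
        for j in range(k):
            members = steps[labels == j]
            if len(members):
                # The mean of the members' step functions is the
                # empirical CDF of cluster j on the grid.
                centers[j] = members.mean(axis=0)
    return labels, centers
```

Calling k_cdfs_1d(x, 2) on a one-dimensional array x returns a label per observation and one empirical CDF per cluster, evaluated at the sorted data. Under the assumptions above the distances depend on the data only through order comparisons, so the sketch needs no moments and extreme values have no outsized influence. It materialises an n x k x n array, so it is suited only to small samples.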