We propose a new model for cluster analysis in a Bayesian nonparametric framework. Our model combines two ingredients, species sampling mixture models of Gaussian distributions on one hand, and a deterministic clustering procedure (DBSCAN) on the other. Here, two observations from the underlying species sampling mixture model share the same cluster if the distance between the densities corresponding to their latent parameters is smaller than a threshold; this yields a random partition which is coarser than the one induced by the species sampling mixture. Since this procedure depends on the value of the threshold, we suggest a strategy to fix it. In addition, we discuss implementation and applications of the model; comparison with more standard clustering algorithms will be given as well. Supplementary materials for the article are available online.
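The coarsening step described in the abstract can be illustrated with a minimal sketch. Several details here are assumptions rather than statements from the record: the component densities are taken to be univariate Gaussians, the distance between densities is taken to be the Hellinger distance (the abstract does not name the distance), and below-threshold links are chained transitively, as in a DBSCAN-style connected-components pass. The names `hellinger_gauss` and `coarsen_partition` and the parameter values are hypothetical, and in the model the latent parameters would come from posterior draws rather than a fixed list.

```python
import math


def hellinger_gauss(m1, s1, m2, s2):
    """Hellinger distance between N(m1, s1^2) and N(m2, s2^2) (closed form)."""
    var_sum = s1 ** 2 + s2 ** 2
    # Bhattacharyya coefficient of the two Gaussian densities.
    bc = math.sqrt(2.0 * s1 * s2 / var_sum) * math.exp(-((m1 - m2) ** 2) / (4.0 * var_sum))
    return math.sqrt(max(0.0, 1.0 - bc))


def coarsen_partition(params, threshold):
    """Merge observations whose latent densities are closer than `threshold`.

    params: list of (mean, sd) pairs, one latent parameter per observation.
    Returns integer cluster labels, numbered in order of first appearance.
    """
    n = len(params)
    parent = list(range(n))  # union-find forest over observations

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if hellinger_gauss(*params[i], *params[j]) < threshold:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri  # link the two groups

    roots = {}
    labels = []
    for i in range(n):
        labels.append(roots.setdefault(find(i), len(roots)))
    return labels


# Hypothetical latent parameters: three distinct values, so the mixture
# itself would induce three groups ({0, 1}, {2}, {3, 4}).
latent = [(0.0, 1.0), (0.0, 1.0), (0.3, 1.0), (5.0, 1.0), (5.0, 1.0)]
labels = coarsen_partition(latent, threshold=0.2)
```

With these illustrative values the two nearby densities centred at 0.0 and 0.3 fall below the threshold and are merged, so the resulting partition has fewer groups than the one defined by equality of the latent parameters. This is the sense in which the abstract calls it coarser.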

Argiento, R., Cremaschi, A., Guglielmi, A., A “Density-Based” Algorithm for Cluster Analysis Using Species Sampling Gaussian Mixture Models, Journal of Computational and Graphical Statistics, 2014; 23 (4): 1126-1142. [doi:10.1080/10618600.2013.856796] [http://hdl.handle.net/10807/148064]

A “Density-Based” Algorithm for Cluster Analysis Using Species Sampling Gaussian Mixture Models

Argiento, Raffaele
2014

Language: English


Use this identifier to cite or link to this document: https://hdl.handle.net/10807/148064