ABSTRACT. We are interested in clustering data whose support is “curved”. Recently we have addressed this problem, introducing a model which combines two ingredients: species sampling mixtures of parametric densities on one hand, and a deterministic clustering procedure (DBSCAN) on the other. In short, under this model two observations share the same cluster if the distance between the densities corresponding to their latent parameters is smaller than a threshold. However, in this case, the prior cluster assignment is based on the geometry of the space of kernel densities rather than a direct random partition prior elicitation. Following the latter alternative, a new hierarchical model for clustering is proposed here, where the data in each cluster are parametrically distributed around a curve (principal curve), and the prior cluster assignment is given on the latent variables at the second level of hierarchy according to a species sampling model. These two mixture models are compared here with respect to cluster estimates obtained for a simulated bivariate dataset from two clusters, one being banana-shaped.
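The thresholding rule described in the abstract for the first model can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the kernels are univariate Gaussians, the distance between densities is the Hellinger distance, and clusters are formed as connected components of the thresholded graph, which is a simplification of DBSCAN that treats every point as a core point. The names `hellinger_gauss`, `threshold_clusters`, `latent` and `eps` are hypothetical.

```python
import math

def hellinger_gauss(p, q):
    """Hellinger distance between two univariate Gaussians given as (mean, sd).
    Assumption: Gaussian kernels and Hellinger distance are illustrative choices."""
    (m1, s1), (m2, s2) = p, q
    var_sum = s1 * s1 + s2 * s2
    # Bhattacharyya coefficient of the two Gaussian densities
    bc = math.sqrt(2.0 * s1 * s2 / var_sum) * math.exp(-((m1 - m2) ** 2) / (4.0 * var_sum))
    return math.sqrt(max(0.0, 1.0 - bc))

def threshold_clusters(params, eps):
    """Link two observations when the distance between the densities indexed by
    their latent parameters is below eps, then return connected-component labels.
    Assumption: connected components stand in for the full DBSCAN procedure."""
    parent = list(range(len(params)))

    def find(i):
        # Union-find root lookup with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(params)):
        for j in range(i + 1, len(params)):
            if hellinger_gauss(params[i], params[j]) < eps:
                parent[find(i)] = find(j)

    # Relabel roots as 0, 1, 2, ... in order of first appearance
    relabel = {}
    return [relabel.setdefault(find(i), len(relabel)) for i in range(len(params))]

# Hypothetical latent parameters (mean, sd), one pair per observation
latent = [(0.0, 1.0), (0.3, 1.0), (0.6, 1.0), (5.0, 1.0), (5.2, 1.0)]
labels = threshold_clusters(latent, eps=0.2)
print(labels)  # [0, 0, 0, 1, 1]
```

In this toy input the first and third observations are farther apart than `eps`, yet they end up in the same cluster because both are within `eps` of the second one. This chaining is what lets a threshold on pairwise distances follow an elongated or curved group.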

Argiento, R., Cremaschi, A., Guglielmi, A., Cluster Analysis of Curved-Shaped Data with Species-Sampling Mixture Models, in Proceedings of SCo2013 - Complex Data Modeling and Computationally Intensive Statistical Methods for Estimation and Prediction, (M, 09-11 September 2013), Politecnico di Milano, Milano 2013: 1-6 [http://hdl.handle.net/10807/145224]

Cluster Analysis of Curved-Shaped Data with Species-Sampling Mixture Models

Argiento, Raffaele
First author
2013

Year: 2013
Language: English
Book title: Proceedings of SCo2013 - Complex Data Modeling and Computationally Intensive Statistical Methods for Estimation and Prediction
Conference: SCo2013 - Complex Data Modeling and Computationally Intensive Statistical Methods for Estimation and Prediction
Conference location: M
Conference start date: 9 September 2013
Conference end date: 11 September 2013
ISBN: 9788864930190
Publisher: Politecnico di Milano


Use this identifier to cite or link to this document: https://hdl.handle.net/10807/145224