Casa, A., Cappozzo, A., Fop, M., Penalized Model-Based Clustering with Group-Dependent Shrinkage Estimation, Communication, in Building Bridges between Soft and Statistical Methodologies for Data Science, (Valladolid, 14-16 September 2022), Springer, Valladolid 2023:1433 73-78. 10.1007/978-3-031-15509-3_10 [https://hdl.handle.net/10807/309186]

Penalized Model-Based Clustering with Group-Dependent Shrinkage Estimation

Cappozzo, Andrea;
2023

Abstract

Gaussian mixture models (GMMs) are the most widely used approach to model-based clustering of continuous features. With the increasing availability of high-dimensional datasets, however, their direct applicability is at risk: GMMs suffer from the curse of dimensionality, as the number of parameters grows quadratically with the number of variables. To address this, a methodological link between Gaussian mixtures and Gaussian graphical models has recently been established, providing a framework for penalized model-based clustering in the presence of large precision matrices. Nevertheless, current methodologies do not account for the fact that groups may be under- or over-connected, and thus implicitly assume similar levels of sparsity across clusters. We overcome this limitation by defining data-driven, component-specific penalty factors that automatically account for different degrees of connection within groups. A real-data experiment on handwritten digit recognition showcases the validity of our proposal.
2023
English
Building Bridges between Soft and Statistical Methodologies for Data Science
International Conference on Soft Methods in Probability and Statistics (SMPS)
Valladolid
Communication
14-Sep-2022
16-Sep-2022
978-3-031-15508-6
Springer

Use this identifier to cite or link to this document: https://hdl.handle.net/10807/309186