Galimberti, C., Peluso, S., Castelletti, F., Bayesian inference of graph-based dependencies from mixed-type data, Journal of Multivariate Analysis, 2024; 203. [doi:10.1016/j.jmva.2024.105323] [https://hdl.handle.net/10807/291997]
Bayesian inference of graph-based dependencies from mixed-type data
Peluso, Stefano; Castelletti, Federico
2024
Abstract
Mixed data comprise measurements of different types, with both categorical and continuous variables, and arise in various areas, such as the life sciences or industrial processes. Inferring conditional independencies from the data is crucial for understanding how these variables relate to each other. To this end, graphical models provide an effective framework, which adopts a graph-based representation of the joint distribution to encode such dependence relations. This framework has been studied extensively in the Gaussian and categorical settings separately; by contrast, the literature addressing the problem in the presence of mixed data is still limited. We propose a Bayesian model for the analysis of mixed data based on the notion of Conditional Gaussian (CG) distribution. Our method relies on a canonical parameterization of the CG distribution, which allows for posterior inference on the parameters indexing the (marginal) distributions of continuous and categorical variables, as well as on those expressing the interactions between the two types of variables. We derive the limiting Gaussian distributions, centered on the correct unknown value and with vanishing variance, for the Bayesian estimators of the canonical parameters expressing continuous, discrete and mixed interactions. In addition, we implement the proposed method for structure learning, namely to infer the underlying graph of conditional independencies. Compared with alternative frequentist methods, our approach shows favorable results both in simulations and in real-data applications, while also allowing for coherent uncertainty quantification around parameter estimates.
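To make the canonical parameterization mentioned in the abstract more concrete, the following is a minimal sketch, not the authors' implementation. It assumes the standard textbook form of the CG density, f(i, y) = exp{ g(i) + h(i)'y - (1/2) y'K(i)y }, where i indexes a cell of the categorical variables and y is the continuous vector, and it converts between these canonical parameters (g, h, K) and the moment parameters (p, mu, Sigma). The function names and the dictionary-per-cell layout are hypothetical choices made for illustration only.

```python
import numpy as np


def moment_to_canonical(p, mu, Sigma):
    """Map cell probabilities p[i], means mu[i] and covariances Sigma[i]
    to canonical CG parameters (g, h, K). Hypothetical helper."""
    g, h, K = {}, {}, {}
    for cell in p:
        m = np.asarray(mu[cell], dtype=float)
        S = np.asarray(Sigma[cell], dtype=float)
        q = m.shape[0]
        K[cell] = np.linalg.inv(S)          # precision matrix
        h[cell] = K[cell] @ m               # linear (mixed) term
        _, logdet_S = np.linalg.slogdet(S)
        # g(i) = log p(i) - (1/2) log det Sigma(i) - (q/2) log(2 pi)
        #        - (1/2) mu(i)' K(i) mu(i)
        g[cell] = (np.log(p[cell]) - 0.5 * logdet_S
                   - 0.5 * q * np.log(2.0 * np.pi)
                   - 0.5 * m @ h[cell])
    return g, h, K


def canonical_to_moment(g, h, K):
    """Inverse map: recover (p, mu, Sigma) from (g, h, K).
    Probabilities are renormalised, so g may be known up to a constant."""
    log_w, mu, Sigma = {}, {}, {}
    for cell in g:
        K_i = np.asarray(K[cell], dtype=float)
        h_i = np.asarray(h[cell], dtype=float)
        q = h_i.shape[0]
        Sigma[cell] = np.linalg.inv(K_i)
        mu[cell] = Sigma[cell] @ h_i
        _, logdet_K = np.linalg.slogdet(K_i)
        # log p(i) = g(i) + (1/2) h' K^{-1} h + (q/2) log(2 pi)
        #            - (1/2) log det K(i)
        log_w[cell] = (g[cell] + 0.5 * h_i @ mu[cell]
                       + 0.5 * q * np.log(2.0 * np.pi)
                       - 0.5 * logdet_K)
    top = max(log_w.values())               # stabilise the exponentials
    total = sum(np.exp(v - top) for v in log_w.values())
    p = {c: float(np.exp(v - top) / total) for c, v in log_w.items()}
    return p, mu, Sigma


# Illustrative two-cell example with two continuous variables (made-up values).
p0 = {0: 0.3, 1: 0.7}
mu0 = {0: [0.0, 1.0], 1: [2.0, -1.0]}
Sigma0 = {0: [[1.0, 0.5], [0.5, 2.0]], 1: [[2.0, 0.0], [0.0, 1.0]]}
g, h, K = moment_to_canonical(p0, mu0, Sigma0)
p1, mu1, Sigma1 = canonical_to_moment(g, h, K)
```

In this form, zeros in K(i) correspond to missing continuous interactions, while the dependence of g, h and K on the cell i carries the discrete and mixed interactions; this is the general reason a canonical parameterization is convenient for reading off conditional independencies.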