Castelletti, F., Consonni, G., Discovering causal structures in Bayesian Gaussian directed acyclic graph models, Journal of the Royal Statistical Society. Series A, Statistics in Society, 2020; 183 (4): 1727-1745. [doi:10.1111/rssa.12550] [http://hdl.handle.net/10807/146678]

Discovering causal structures in Bayesian Gaussian directed acyclic graph models

Castelletti, Federico; Consonni, Guido
2020

Abstract

Causal directed acyclic graphs (DAGs) are naturally suited to representing biological signalling pathways. However, if only observational data are available, a causal DAG is identifiable only up to Markov equivalence. Interventional data, obtained through exogenous perturbations of the system, can greatly improve identifiability. Since the gain from an intervention depends crucially on which variables are intervened upon, a natural problem is how to devise efficient strategies for optimal causal discovery. We present a Bayesian active learning procedure for Gaussian DAGs that requires no subjective specification on the part of the user, explicitly accounts for uncertainty over the space of equivalence classes (through the posterior distribution), and sequentially proposes the optimal variable on which to intervene. In simulation experiments, our method not only outperforms designs based on a random choice of intervention nodes but also shows decisive improvements over currently available algorithms, and it is competitive with the best alternative benchmarks. An important reason for this strong performance is that, unlike non-Bayesian algorithms, our utility function naturally incorporates graph estimation uncertainty through the posterior edge inclusion probabilities. We also reanalyse the Sachs data on protein signalling pathways from an active learning perspective and show that the DAG can be identified using only a subset of the available interventional samples.
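The abstract does not spell out the utility function, so what follows is only a minimal sketch of the general idea, not the authors' method. It assumes a posterior sample of DAGs is already available (for instance from an MCMC sampler), with each DAG given as a set of directed edges. It estimates each directed edge's posterior inclusion probability as the fraction of sampled DAGs that contain it. It then scores every candidate intervention node with a hypothetical utility: the adjacency-weighted binary entropy of the orientation of the edges incident to that node. The rationale is that, under perfect interventions, edges incident to an intervened node become orientable. The function names `edge_inclusion_probs`, `orientation_uncertainty` and `choose_intervention` are illustrative, not taken from the paper.

```python
import math
from typing import List, Set, Tuple

Edge = Tuple[int, int]  # directed edge (u, v) meaning u -> v


def edge_inclusion_probs(samples: List[Set[Edge]], p: int) -> List[List[float]]:
    """Estimate P(u -> v | data) as the fraction of sampled DAGs containing u -> v."""
    counts = [[0.0] * p for _ in range(p)]
    for dag in samples:
        for u, v in dag:
            counts[u][v] += 1.0
    n = len(samples)
    return [[c / n for c in row] for row in counts]


def binary_entropy(q: float) -> float:
    """Entropy (in bits) of a Bernoulli(q) variable; 0 when q is 0 or 1."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -(q * math.log2(q) + (1.0 - q) * math.log2(1.0 - q))


def orientation_uncertainty(probs: List[List[float]], j: int) -> float:
    """Hypothetical utility for intervening on node j: for each other node k,
    weight the entropy of the j-k orientation by the posterior probability
    that j and k are adjacent."""
    score = 0.0
    for k in range(len(probs)):
        if k == j:
            continue
        # A DAG cannot contain both j -> k and k -> j, so this sum is <= 1.
        adj = probs[j][k] + probs[k][j]
        if adj > 0.0:
            score += adj * binary_entropy(probs[j][k] / adj)
    return score


def choose_intervention(samples: List[Set[Edge]], p: int) -> Tuple[int, List[float]]:
    """Return the node with the highest hypothetical utility, plus all scores."""
    probs = edge_inclusion_probs(samples, p)
    scores = [orientation_uncertainty(probs, j) for j in range(p)]
    best = max(range(p), key=lambda j: scores[j])
    return best, scores


# Toy posterior sample over three nodes: Markov-equivalent orientations of 0 - 1 - 2.
EXAMPLE_SAMPLES: List[Set[Edge]] = [
    {(0, 1), (1, 2)},  # 0 -> 1 -> 2
    {(1, 0), (2, 1)},  # 2 -> 1 -> 0
    {(1, 0), (1, 2)},  # 0 <- 1 -> 2
    {(0, 1), (1, 2)},  # 0 -> 1 -> 2
]

if __name__ == "__main__":
    node, scores = choose_intervention(EXAMPLE_SAMPLES, p=3)
    print(node, [round(s, 3) for s in scores])
```

In the toy example, all sampled DAGs share the skeleton {0-1, 1-2} and contain no v-structure, so they are Markov equivalent and observational data alone cannot tell them apart. Running the script prints `1 [1.0, 1.811, 0.811]`: the sketch selects the middle node 1, whose two incident edges together carry the most orientation uncertainty.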
Language: English