Abstract. This paper studies the association between SNP genotype data and a disease. In genetic association studies, statistical analyses with multiple markers have been shown to be more powerful, efficient, and biologically meaningful than single-marker association tests. Because the number of genetic markers considered is typically large, we cluster them and then study the association between groups of markers and disease. We propose a two-step procedure. First, a Bayesian nonparametric cluster estimate under normalized generalized gamma process mixture models is introduced, so that the information from large-scale SNP data can be incorporated through a much smaller number of explanatory variables. Then, by introducing a genetic score, we study the association between the disease response and groups of markers using a logit model. Inference is obtained via an MCMC truncation method recently introduced in the literature. We also review the state of the art of Bayesian nonparametric cluster models and algorithms for the class of mixtures adopted here. Finally, the model is applied to a genome-wide association study of Crohn’s disease in a case-control setting.
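The second step of the procedure described in the abstract (scoring groups of markers, then relating the scores to disease status with a logit model) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the cluster labels are taken as given (the Bayesian nonparametric clustering step is not implemented), the genetic score is taken to be the mean minor-allele count within each cluster (the chapter's own definition may differ), and the logit model is fitted by plain maximum-likelihood gradient ascent rather than by the MCMC method used by the authors. The function names `genetic_scores` and `fit_logit` are hypothetical.

```python
import math


def genetic_scores(genotypes, clusters):
    """Collapse SNP genotypes into one score per marker cluster.

    genotypes: list of subjects, each a list of minor-allele counts (0, 1, 2).
    clusters:  cluster label (0..k-1, every label used) for each SNP; assumed
               to come from a separate clustering step that is not shown here.
    Score (an assumption): mean allele count over the SNPs in each cluster.
    """
    k = max(clusters) + 1
    sizes = [clusters.count(c) for c in range(k)]
    scores = []
    for row in genotypes:
        totals = [0.0] * k
        for count, label in zip(row, clusters):
            totals[label] += count
        scores.append([t / s for t, s in zip(totals, sizes)])
    return scores


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logit(X, y, lr=0.5, steps=5000):
    """Fit a logit model by gradient ascent on the mean log-likelihood.

    Returns [intercept, slope_1, ..., slope_p]. This is a frequentist
    stand-in for illustration, not the Bayesian inference of the chapter.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)
    for _ in range(steps):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            eta = beta[0] + sum(b * x for b, x in zip(beta[1:], xi))
            resid = yi - sigmoid(eta)
            grad[0] += resid
            for j, x in enumerate(xi):
                grad[j + 1] += resid * x
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


# Toy usage: 4 SNPs grouped into 2 clusters, 2 subjects.
toy_scores = genetic_scores([[2, 2, 0, 1], [0, 1, 1, 1]], [0, 0, 1, 1])
```

With real data, the per-cluster scores returned by `genetic_scores` would form the design matrix passed to `fit_logit`, with case-control status (1 or 0) as the response.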

Argiento, R., Guglielmi, A., Hsiao, C. K., Ruggeri, F., Wang, C., Nonparametric Bayesian Methods in Biostatistics and Bioinformatics, in Mitra, R. M., Nonparametric Bayesian Methods in Biostatistics and Bioinformatics, Springer International Publishing, CHE 2015: 115-134. 10.1007/978-3-319-19518-6_6 [http://hdl.handle.net/10807/148071]

Nonparametric Bayesian Methods in Biostatistics and Bioinformatics

Argiento, Raffaele
First author
2015

2015
English
978-3-319-19517-9
Springer International Publishing

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10807/148071
Citations
  • PubMed Central: not available
  • Scopus: 4
  • ISI Web of Science: 4