PowerBacGWAS: a computational pipeline to perform power calculations for bacterial genome-wide association studies

Francesc Coll*, Theodore Gouliouris, Sebastian Bruchmann, Jody Phelan, Kathy E. Raven, Taane G. Clark, Julian Parkhill, Sharon J. Peacock

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

4 Citations (Scopus)


PowerBacGWAS is a computational pipeline that uses existing genomic data to perform power calculations for bacterial genome-wide association studies.

Genome-wide association studies (GWAS) are increasingly being applied to investigate the genetic basis of bacterial traits. However, approaches to perform power calculations for bacterial GWAS are limited. Here we implemented two alternative approaches to conduct power calculations using existing collections of bacterial genomes. First, a sub-sampling approach was used to reduce the allele frequency and effect size of a known and detectable genotype-phenotype relationship by modifying phenotype labels. Second, a phenotype-simulation approach was used to generate phenotypes from existing genetic variants. We implemented both approaches in a computational pipeline (PowerBacGWAS) that supports power calculations for burden testing, pan-genome and variant GWAS, and applied it to collections of Enterococcus faecium, Klebsiella pneumoniae and Mycobacterium tuberculosis. We used this pipeline to determine the sample sizes required to detect causal variants of different minor allele frequencies (MAF), effect sizes and phenotype heritability, and studied the effect of homoplasy and population diversity on the power to detect causal variants. Our pipeline and user documentation are made available and can be applied to other bacterial populations. PowerBacGWAS can be used to determine the sample sizes required to find statistically significant associations, or the associations detectable with a given sample size. We recommend performing power calculations using existing genomes of the bacterial species and population under study.
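
The phenotype-simulation idea in the abstract can be illustrated with a minimal Python sketch. This is not the PowerBacGWAS implementation: it assumes a single binary causal variant, a binary phenotype drawn from a simple odds-ratio model, and a plain 2x2 chi-square test with no correction for population structure or multiple testing. All function names, the baseline prevalence, the significance threshold and the toy genotype collection are hypothetical choices made for illustration only.

```python
import math
import random


def chi2_pvalue_2x2(a, b, c, d):
    """P-value of Pearson's chi-square test (1 df) on the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if min(row1, row2, col1, col2) == 0:
        return 1.0  # degenerate table: no evidence of association
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


def simulate_phenotypes(genotypes, odds_ratio, baseline_prev, rng):
    """Draw a binary phenotype per isolate; carriers have their odds multiplied."""
    base_odds = baseline_prev / (1 - baseline_prev)
    phenotypes = []
    for g in genotypes:
        odds = base_odds * (odds_ratio if g else 1.0)
        phenotypes.append(1 if rng.random() < odds / (1 + odds) else 0)
    return phenotypes


def estimate_power(genotypes, sample_size, odds_ratio,
                   baseline_prev=0.3, alpha=0.05, n_reps=200, seed=0):
    """Fraction of replicates in which the variant is detected at level alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_reps):
        # Sub-sample isolates from the existing collection, then simulate phenotypes.
        sample = rng.sample(genotypes, sample_size)
        phenos = simulate_phenotypes(sample, odds_ratio, baseline_prev, rng)
        a = sum(1 for g, p in zip(sample, phenos) if g and p)
        b = sum(1 for g, p in zip(sample, phenos) if g and not p)
        c = sum(1 for g, p in zip(sample, phenos) if not g and p)
        d = sample_size - a - b - c
        if chi2_pvalue_2x2(a, b, c, d) < alpha:
            hits += 1
    return hits / n_reps


# Toy collection (assumed): 1000 isolates, variant present in 10% of them.
toy_genotypes = [1] * 100 + [0] * 900
for n in (50, 200, 800):
    print(n, estimate_power(toy_genotypes, sample_size=n, odds_ratio=5.0))
```

In a real bacterial GWAS the significance threshold would be adjusted for the number of variants tested and the association model would account for population structure, so the numbers from this sketch only indicate how power changes with sample size, allele frequency and effect size.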

Original language: English
Article number: 266
Number of pages: 12
Journal: Communications Biology
Issue number: 1
Publication status: Published - 25 Mar 2022

Bibliographical note

Funding Information:
This project was funded by a Wellcome Trust grant (201344/Z/16/Z) awarded to Francesc Coll. This publication was supported by the Health Innovation Challenge Fund (WT098600, HICF-T5-342), a parallel funding partnership between the Department of Health and Wellcome Trust. T.G.C. is funded by the Medical Research Council UK (Grant nos. MR/M01360X/1, MR/N010469/1, MR/R025576/1 and MR/R020973/1) and BBSRC UK (Grant no. BB/R013063/1). The views expressed in this publication are those of the author(s) and not necessarily those of the funders.

Publisher Copyright:
© 2022, The Author(s).

