Automated analysis of phylogenetic clusters

Manon Ragonnet-Cronin*, Emma Hodcroft, Stéphane Hué, Esther Fearnhill, Valerie Delpech, Andrew J.L. Brown, Samantha Lycett

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

223 Citations (Scopus)


Background: As sequence data sets used for the investigation of pathogen transmission patterns increase in size, automated tools and standardized methods for cluster analysis have become necessary. We have developed an automated Cluster Picker which identifies monophyletic clades meeting user-input criteria for bootstrap support and maximum genetic distance within large phylogenetic trees. A second tool, the Cluster Matcher, automates the process of linking genetic data to epidemiological or clinical data, and matches clusters between runs of the Cluster Picker.

Results: We explore the effect of different bootstrap and genetic distance thresholds on clusters identified in a data set of publicly available HIV sequences, and compare these results to those of a previously published tool for cluster identification. To demonstrate their utility, we then use the Cluster Picker and Cluster Matcher together to investigate how clusters in the data set changed over time. We find that clusters containing sequences from more than one UK location at the first time point (multiple origin) were significantly more likely to grow than those representing only a single location.

Conclusions: The Cluster Picker and Cluster Matcher can rapidly process phylogenetic trees containing tens of thousands of sequences. Together these tools will facilitate comparisons of pathogen transmission dynamics between studies and countries.
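The abstract describes the two tools only at a high level. As a rough illustration, and not the published implementation, the Python sketch below walks a tree top-down and reports a clade as a cluster when its bootstrap support meets a threshold and the maximum pairwise genetic distance among its sequences is at or below a second threshold; a qualifying clade is not split further. The matching step is approximated by treating clusters from two runs as matched when they share at least one sequence label, and "multiple origin" is read as a cluster whose sequences map to more than one location. The tree structure, the precomputed distance dictionary, the threshold values, and all names (`Node`, `pick_clusters`, `match_clusters`, `cluster_growth`, `is_multiple_origin`) are hypothetical assumptions made for this example.

```python
from dataclasses import dataclass, field
from itertools import combinations
from typing import Dict, FrozenSet, List, Optional, Set

# Hypothetical tree representation: tips carry a sequence label, internal
# nodes carry a bootstrap support value (assumed here to be a proportion, 0-1).
@dataclass
class Node:
    name: Optional[str] = None
    support: float = 0.0
    children: List["Node"] = field(default_factory=list)

    def tips(self) -> List[str]:
        """Return the labels of all tips at or below this node."""
        if not self.children:
            return [self.name]
        labels: List[str] = []
        for child in self.children:
            labels.extend(child.tips())
        return labels

# Assumed input: pairwise genetic distances keyed by an unordered pair of labels.
Distances = Dict[FrozenSet[str], float]

def max_pairwise_distance(labels: List[str], dist: Distances) -> float:
    """Largest pairwise distance among the given sequences (0.0 if fewer than two)."""
    return max((dist[frozenset(pair)] for pair in combinations(labels, 2)), default=0.0)

def pick_clusters(node: Node, dist: Distances,
                  min_support: float = 0.9, max_distance: float = 0.045) -> List[Set[str]]:
    """Top-down search: report a clade meeting both thresholds, else recurse into its children."""
    if not node.children:  # a single tip is never reported as a cluster
        return []
    labels = node.tips()
    if (node.support >= min_support and len(labels) >= 2
            and max_pairwise_distance(labels, dist) <= max_distance):
        return [set(labels)]
    clusters: List[Set[str]] = []
    for child in node.children:
        clusters.extend(pick_clusters(child, dist, min_support, max_distance))
    return clusters

def match_clusters(old: List[Set[str]], new: List[Set[str]]) -> Dict[int, List[int]]:
    """Map each earlier cluster index to the later cluster indices sharing any sequence."""
    return {i: [j for j, n in enumerate(new) if o & n] for i, o in enumerate(old)}

def cluster_growth(old: List[Set[str]], new: List[Set[str]]) -> List[int]:
    """Count, for each earlier cluster, the new sequences found in its matched later clusters."""
    matches = match_clusters(old, new)
    growth = []
    for i, o in enumerate(old):
        merged = set().union(*(new[j] for j in matches[i]))
        growth.append(len(merged - o))
    return growth

def is_multiple_origin(cluster: Set[str], location: Dict[str, str]) -> bool:
    """True if the cluster's sequences come from more than one location."""
    return len({location[s] for s in cluster}) > 1

# Toy example: distances derived from made-up 1-D coordinates.
coords = {"A": 0.00, "B": 0.01, "C": 0.10, "D": 0.12, "E": 0.30}
dist = {frozenset((a, b)): abs(coords[a] - coords[b]) for a, b in combinations(coords, 2)}
tree = Node(children=[
    Node(support=0.80, children=[
        Node(support=0.99, children=[Node("A"), Node("B")]),
        Node(support=0.95, children=[Node("C"), Node("D")]),
    ]),
    Node("E"),
])
run1 = pick_clusters(tree, dist)           # clusters {A, B} and {C, D}
run2 = [{"A", "B", "F"}, {"C", "D"}]       # hypothetical clusters from a later run
location = {"A": "London", "B": "Manchester", "C": "London", "D": "London"}
print(cluster_growth(run1, run2))                         # [1, 0]
print([is_multiple_origin(c, location) for c in run1])    # [True, False]
```

Stopping at the first qualifying clade means the sketch reports the largest clades that satisfy both criteria, so clusters never nest. Using the maximum (rather than, say, the mean) pairwise distance is an assumption of this example; the published tools may define the distance criterion, traversal, and matching rules differently.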

Original language: English
Article number: 317
Journal: BMC Bioinformatics
Publication status: Published - 6 Nov 2013

Bibliographical note

Funding Information:
This work was supported by the Wellcome Trust (SJL, Grant number 092807) and the Biotechnology and Biological Sciences Research Council (EH and MRC, Grant number BB/F017030/1). The UK HIV Drug Resistance Database is supported by the Medical Research Council (grant number G0900274) and is partly funded by the Department of Health; the views expressed in the publication are those of the authors and not necessarily those of the Department of Health. Additional support for the HIVRDB is provided by Boehringer Ingelheim, Bristol-Myers Squibb, Gilead, Tibotec (a division of Janssen-Cilag) and Roche.


  • Cluster
  • Epidemiology
  • HIV
  • Phylogenetics
  • Sequence analysis
  • Virus

