Comparison of cluster-based and source-attribution methods for estimating transmission risk using large HIV sequence databases

Stéphane Le Vu*, Oliver Ratmann, Valerie Delpech, Alison E. Brown, O. Noel Gill, Anna Tostevin, Christophe Fraser, Erik M. Volz

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

15 Citations (Scopus)


Phylogenetic clustering of HIV sequences from a random sample of patients can reveal epidemiological transmission patterns, but interpretation is hampered by limited theoretical support, and the statistical properties of clustering analysis remain poorly understood. Alternatively, source attribution methods allow fitting of HIV transmission models and thereby quantify aspects of disease transmission. A simulation study was conducted to assess error rates of clustering methods for detecting transmission risk factors. We modeled HIV epidemics among men having sex with men and generated phylogenies comparable to those that can be obtained from HIV surveillance data in the UK. Clustering and source attribution approaches were applied to evaluate their ability to identify patient attributes as transmission risk factors. We find that commonly used methods show a misleading association between cluster size or odds of clustering and covariates that are correlated with time since infection, regardless of their influence on transmission. Clustering methods usually have higher error rates and lower sensitivity than the source attribution method for identifying transmission risk factors, but neither method provides robust estimates of transmission risk ratios. The source attribution method can alleviate drawbacks of phylogenetic clustering, but formal population genetic modeling may be required to estimate quantitative transmission risk factors.

Original language: English
Pages (from-to): 1-10
Number of pages: 10
Publication status: Published - Jun 2018

Bibliographical note

Funding Information:
We thank the UK Collaborative Group on HIV Drug Resistance for providing us with the surveillance data used to calibrate the simulated samples. We thank the Imperial College High Performance Computing Service (doi: 10.14469/hpc/2232). This work was supported by the National Institute for Health Research (NIHR) Health Protection Research Units in Modeling Methodology and Sexually Transmitted Infections (HPRU-2012-10080). E.M.V. is supported by the National Institutes of Health (R01AI087520). O.R. and C.F. are supported by Bill & Melinda Gates Foundation: Phylogenetics Networks to Address Transmission of HIV (OPP1084362). A.T. is supported by UK HIV Drug Resistance Database grant from the Medical Research Council (164587).

Publisher Copyright:
© 2017 The Authors


Keywords

  • Cluster analysis
  • Computer simulation
  • HIV epidemiology
  • Phylodynamics
  • Phylogenetic analysis

