Machine learning for classifying tuberculosis drug-resistance from DNA sequencing data

Yang Yang, Katherine E. Niehaus, Timothy M. Walker, Zamin Iqbal, A. Sarah Walker, Daniel J. Wilson, Tim E.A. Peto, Derrick W. Crook, E. Grace Smith, Tingting Zhu, David A. Clifton*

*Corresponding author for this work

Research output: Contribution to journal, Article, peer-review

55 Citations (Scopus)


Motivation: Correct and rapid determination of Mycobacterium tuberculosis (MTB) resistance against available tuberculosis (TB) drugs is essential for the control and management of TB. Conventional molecular diagnostic tests assume that the presence of any well-studied single nucleotide polymorphism is sufficient to cause resistance, which yields low sensitivity for resistance classification.

Summary: Given the availability of DNA sequencing data from MTB, we developed machine learning models for a cohort of 1839 UK bacterial isolates to classify MTB resistance against eight anti-TB drugs (isoniazid, rifampicin, ethambutol, pyrazinamide, ciprofloxacin, moxifloxacin, ofloxacin, streptomycin) and to classify multi-drug resistance.

Results: Compared to the previous rules-based approach, the sensitivities from the best-performing models increased by 2-4% for isoniazid, rifampicin and ethambutol, to 97% (P < 0.01); for ciprofloxacin and multi-drug resistant TB, they increased to 96%. For moxifloxacin and ofloxacin, sensitivities increased by 12 and 15%, from 83 and 81% based on existing known resistance alleles to 95 and 96% (P < 0.01), respectively. In particular, for pyrazinamide and streptomycin, our models improved sensitivities over the previous rules-based approach by 15 and 24%, to 84 and 87% (P < 0.01), respectively. The best-performing models increased the area under the ROC curve by 10% for pyrazinamide and streptomycin (P < 0.01), and by 4-8% for the other drugs (P < 0.01).
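The rules-based baseline described in the abstract, in which carrying any known resistance variant is treated as sufficient to call resistance, and the sensitivity metric used for comparison can be illustrated with a minimal sketch. This is not the authors' code: the variant names (`snp_A`, `snp_B`, `snp_C`), the toy isolates and the function names are all hypothetical, chosen only to show why an incomplete variant catalogue lowers sensitivity.

```python
from typing import Sequence, Set


def rule_based_predict(isolate_variants: Set[str], known_variants: Set[str]) -> bool:
    """Call an isolate resistant if it carries ANY variant in the known catalogue."""
    return bool(isolate_variants & known_variants)


def sensitivity(y_true: Sequence[bool], y_pred: Sequence[bool]) -> float:
    """Fraction of truly resistant isolates that are predicted resistant."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    return tp / (tp + fn)


def specificity(y_true: Sequence[bool], y_pred: Sequence[bool]) -> float:
    """Fraction of truly susceptible isolates that are predicted susceptible."""
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    return tn / (tn + fp)


# Hypothetical catalogue of "well-studied" resistance variants.
known_variants = {"snp_A", "snp_B"}

# Hypothetical isolates: (variants observed, phenotypically resistant?).
isolates = [
    ({"snp_A"}, True),
    ({"snp_C"}, True),            # resistant, but snp_C is not in the catalogue
    ({"snp_B", "snp_C"}, True),
    (set(), False),
    ({"snp_C"}, False),
]

y_true = [label for _, label in isolates]
y_pred = [rule_based_predict(variants, known_variants) for variants, _ in isolates]

sens = sensitivity(y_true, y_pred)
spec = specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

In this toy data the second isolate is resistant but carries no catalogued variant, so the rule misses it. A learned classifier that weighs many variants jointly is one way such cases might be recovered, which is the kind of sensitivity gain the abstract reports.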

Original language: English
Pages (from-to): 1666-1671
Number of pages: 6
Issue number: 10
Publication status: Published - 15 May 2018

Bibliographical note

Funding Information:
This project was funded by a K.C. Wong Postdoctoral Fellowship, the Rhodes Trust, the RCUK Digital Economy Programme (Grant EP/G036861/1), an EPSRC Grand Challenge award (EP/N020774/1), the Bill & Melinda Gates Foundation, the Wellcome Trust, NIHR Senior Investigatorships, and the Royal Society (Grant 101237/Z/13/Z).

Funding Information:
Y.Y. gratefully acknowledges the support of the K.C. Wong Education Foundation. K.E.N. acknowledges funding from the Rhodes Trust and the RCUK Digital Economy Programme, grant number EP/G036861/1 (Center for Doctoral Training in Healthcare Innovation). D.A.C. was supported by the Royal Academy of Engineering and by the EPSRC via a ‘Grand Challenge’ award. This project was supported by the Bill & Melinda Gates Foundation, the Wellcome Trust and the NIHR Senior Investigators. D.J.W. is a Sir Henry Dale Fellow, jointly funded by the Wellcome Trust and the Royal Society (Grant 101237/Z/13/Z).

Publisher Copyright:
© The Author(s) 2017.


