Identification of patient prescribing predicting cancer diagnosis using boosted decision trees

Josephine French*, Cong Chen, Katherine Henson, Brian Shand, Patrick Ferris, Josh Pencheon, Sally Vernon, Meena Rafiq, David Howe, Georgios Lyratzopoulos, Jem Rashbass

*Corresponding author for this work

    Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

    1 Citation (Scopus)

    Abstract

    Machine learning has potential to identify patterns in pre-diagnostic prescribing that act as an early signal of cancer diagnosis. Danish studies using classical regression models have shown that prescribing of particular drugs increases in the months prior to lung and colorectal cancer diagnosis. The aim of this case-control study is to assess the potential for machine learning to extend these findings to identify combinations of prescriptions that might act as pre-cancer signals. We use a boosted trees approach to analyse prescriptions data from NHS Business Services Authority linked to English cancer registry data to classify individuals into two classes: cancer patients and controls. We then identify the drugs that contributed the most to the classification decisions in the models. To the best of our knowledge, this is the first study utilising machine learning to find pre-diagnostic primary-care-prescription-related indicators of cancer diagnosis in England. We assess two feature selection approaches using text categorisation methods alone and in combination with clinical domain knowledge. Age-matched samples of controls (ten controls for each patient) are used throughout. We train models for matched cohorts of 6,770 lung cancer patients and 5,869 colorectal cancer patients starting the cancer pathway for the first time between January and March 2016. The models outperform classical methods by AUC, AUC-PR, and F0.5 score, showing strong potential for using machine learning to extract signals from this dataset to aid earlier diagnosis. Our findings confirm the Danish studies.
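    The abstract reports an F0.5 score alongside AUC and AUC-PR. As a minimal sketch of what that metric measures, the standard F-beta definition, F_beta = (1 + beta^2) * P * R / (beta^2 * P + R), can be computed from confusion-matrix counts as below. With beta = 0.5, precision (P) is weighted more heavily than recall (R). The function name `f_beta` and all counts are hypothetical illustrations for a cohort with ten controls per case; they are not results or code from the study.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 0.5) -> float:
    """F-beta score from confusion-matrix counts; beta < 1 favours precision."""
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical cohort: 100 cases and 1,000 controls (1:10 ratio).
# A cautious classifier flags 50 people, 40 of whom are true cases.
cautious = f_beta(tp=40, fp=10, fn=60)
# A liberal classifier flags 400 people, 80 of whom are true cases.
liberal = f_beta(tp=80, fp=320, fn=20)

print(f"cautious F0.5 = {cautious:.3f}")  # 0.667
print(f"liberal  F0.5 = {liberal:.3f}")   # 0.235
```

    In this illustration the cautious classifier scores higher despite lower recall, because with many more controls than cases each false positive is costly to precision, and F0.5 rewards precision.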

    Original language: English
    Title of host publication: Artificial Intelligence in Medicine - 17th Conference on Artificial Intelligence in Medicine, AIME 2019, Proceedings
    Editors: David Riaño, Szymon Wilk, Annette ten Teije
    Publisher: Springer Verlag
    Pages: 328-333
    Number of pages: 6
    ISBN (Print): 9783030216412
    Publication status: Published - 2019
    Event: 17th Conference on Artificial Intelligence in Medicine, AIME 2019 - Poznan, Poland
    Duration: 26 Jun 2019 to 29 Jun 2019

    Publication series

    Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    Volume: 11526 LNAI
    ISSN (Print): 0302-9743
    ISSN (Electronic): 1611-3349

    Conference

    Conference: 17th Conference on Artificial Intelligence in Medicine, AIME 2019
    Country/Territory: Poland
    City: Poznan
    Period: 26/06/19 to 29/06/19

    Bibliographical note

    Funding Information:
    Supported by a Cancer Research UK Pioneer Award. Data for this study is based on patient-level information collected by the NHS, as part of the care and support of cancer patients. The data is collated, maintained and quality assured by the National Cancer Registration and Analysis Service, which is part of Public Health England (PHE). Dr. Meena Rafiq is funded by a National Institute for Health Research (NIHR) in-practice clinical fellowship (IPF-2017-11-011). This article presents independent research funded by the NIHR. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health.

    Publisher Copyright:
    © Springer Nature Switzerland AG 2019.

    Keywords

    • Boosted trees
    • Cancer
    • Clinical input
    • Feature selection
