Abstract
There is a strong and continuously growing interest in using large electronic healthcare databases to study health outcomes and the effects of pharmaceutical products. However, concerns regarding disease misclassification (i.e., classification errors of the disease status) and its impact on the study results are legitimate. Validation is therefore increasingly recognized as an essential component of database research. In this work, we elucidate the interrelations between the true prevalence of a disease in a database population (i.e., the prevalence assuming no disease misclassification), the observed prevalence subject to disease misclassification, and the most common validity indices: sensitivity, specificity, positive predictive value and negative predictive value. Based on these interrelations, we obtained analytical expressions to derive all the validity indices and the true prevalence from the observed prevalence and any combination of two other parameters. The analytical expressions can be used for various purposes. Most notably, they can be used to obtain an estimate of the observed prevalence adjusted for outcome misclassification from any combination of two validity indices, and to derive validity indices from each other that would otherwise be difficult to obtain. To allow researchers to easily use the analytical expressions, we additionally developed a user-friendly and freely available web application.
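The interrelations described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' web application or their published expressions. It assumes only the standard identities that link the observed prevalence P, the true prevalence π, sensitivity (SE), specificity (SP), positive predictive value (PPV) and negative predictive value (NPV): P = SE·π + (1 − SP)(1 − π), PPV = SE·π / P and NPV = SP(1 − π) / (1 − P). The function names `indices_from_se_sp` and `indices_from_ppv_npv` are hypothetical, and only two of the possible parameter combinations are shown.

```python
def _check_open_unit(name, value):
    """Require a proportion strictly between 0 and 1 (avoids division by zero)."""
    if not 0.0 < value < 1.0:
        raise ValueError(f"{name} must lie strictly between 0 and 1, got {value}")


def indices_from_se_sp(p_obs, se, sp):
    """Observed prevalence + SE + SP -> true prevalence, PPV, NPV.

    Illustrative helper (hypothetical name), based on
    P = SE*pi + (1 - SP)*(1 - pi) solved for pi.
    """
    _check_open_unit("p_obs", p_obs)
    denom = se + sp - 1.0
    if denom == 0:
        raise ValueError("SE + SP must differ from 1 for pi to be identifiable")
    pi = (p_obs + sp - 1.0) / denom          # misclassification-adjusted prevalence
    ppv = se * pi / p_obs                    # true positives / observed positives
    npv = sp * (1.0 - pi) / (1.0 - p_obs)    # true negatives / observed negatives
    return {"prevalence": pi, "ppv": ppv, "npv": npv}


def indices_from_ppv_npv(p_obs, ppv, npv):
    """Observed prevalence + PPV + NPV -> true prevalence, SE, SP.

    Illustrative helper (hypothetical name), based on
    pi = P*PPV + (1 - P)*(1 - NPV).
    """
    _check_open_unit("p_obs", p_obs)
    pi = p_obs * ppv + (1.0 - p_obs) * (1.0 - npv)
    _check_open_unit("derived true prevalence", pi)
    se = p_obs * ppv / pi                    # true positives / all diseased
    sp = (1.0 - p_obs) * npv / (1.0 - pi)    # true negatives / all non-diseased
    return {"prevalence": pi, "se": se, "sp": sp}
```

For example, with an observed prevalence of 0.125, SE of 0.80 and SP of 0.95, `indices_from_se_sp` gives a true prevalence of 0.10 and a PPV of 0.64 (up to floating-point rounding). Passing that observed prevalence with the resulting PPV and NPV to `indices_from_ppv_npv` recovers the original SE and SP.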
Original language | English |
---|---|
Article number | e0231333 |
Journal | PLoS ONE |
Volume | 15 |
Issue number | 4 |
DOIs | https://doi.org/10.1371/journal.pone.0231333 |
Publication status | Published - 22 Apr 2020 |
Bibliographical note
Funding Information: This research was funded by the Innovative Medicines Initiative (IMI) Joint Undertaking through the ADVANCE project [115557]. The IMI is a joint initiative (public-private partnership) of the European Commission and the European Federation of Pharmaceutical Industries and Associations (EFPIA) to improve the competitive situation of the European Union in the field of pharmaceutical research. The IMI provided support in the form of salaries for KB, TDS, CD and RG but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. AR and NA did not receive any financial compensation for their contribution to this research. The specific roles of the authors are articulated in the 'author contributions' section.
P95 Epidemiology and Pharmacovigilance was one of the beneficiaries among the many public partners of this IMI project, including both commercial and non-commercial organisations. P95 did not fund this study and the web-application is made freely available. The IMI provided support in the form of salaries for KB, TDS, CD and RG. This does not alter our adherence to PLOS ONE policies on sharing data and materials. There are no patents, products in development or marketed products associated with this research to declare.
Open Access: This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Publisher Copyright: © 2020 Bollaerts et al.
Citation: Bollaerts K, Rekkas A, De Smedt T, Dodd C, Andrews N, Gini R (2020) Disease misclassification in electronic healthcare database studies: Deriving validity indices—A contribution from the ADVANCE project. PLoS ONE 15(4): e0231333.
DOI: https://doi.org/10.1371/journal.pone.0231333