Infection Control & Hospital Epidemiology
confounding, interactions, reverse causality, and effects by chance.24 Some machine-learning algorithms can detect linear and nonlinear interactions between variables, but confounding may be challenging to address adequately without human guidance through conceptual causal frameworks and expert knowledge. Implementations of machine-learning algorithms that can account for observed confounding have been proposed and may become more advanced in the future.25–27 Thus, causal hypotheses could be generated from routine healthcare data via machine learning with little human guidance (eg, to identify a hidden outbreak source). However, application of machine learning in observational studies cannot replace adequately sized and well-executed randomized controlled trials in making causal inferences because randomized controlled trials account for both known and unknown or unmeasured confounding.
Most recent studies applying machine learning to real-world tasks in healthcare epidemiology relied, at least partly, on routine data originating from electronic medical records (Table 1). Such rich data sources have been shown to be especially useful for developing hypotheses about previously unknown risk factors and for building accurate prediction models for various outcomes of interest (eg, specific healthcare-acquired infections, hospital complications).20,28–30
Linkage of electronic medical records data with high-quality cohort or registry data has become a valuable option to add exposures, potential confounders, effect modifiers, and outcomes of interest with strict definition criteria that may not be present in routine medical records.31 This option is particularly valuable because data from electronic medical records (and other routine data sources) have been reported to sometimes be of lower quality than data acquired during prospective investigations, owing to changing and varying definition criteria and coding practices and to missing data.31,32 Therefore, studies relying on routinely collected health data require careful consideration of the potential for information bias, selection bias, and residual confounding at both the design stage and the analytical stage of the study. Furthermore, the results of such studies should be reported as transparently as possible.32
In addition to structured, routine data elements, unstructured data (eg, clinical notes) can now provide usable information when analyzed by the machine-learning method of natural language processing; this approach can further increase the volume of accessible, routine healthcare data.33 However, the incremental value of healthcare data obtained from daily routine clinical notes is not proven, and both unstructured and structured routine data may not always be reliable and suitable.11
To make use of the increasing volumes of routine healthcare data from health records and other routine data sources, any machine-learning task must take into account concerns about data quality, data heterogeneity, missing data, and selective data collection; the main challenges and opportunities of studies relying on routine healthcare data and big data have been reviewed previously.11–15
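As a purely illustrative aid, the linkage of routine records to registry data and the accompanying check for missing data discussed above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a method taken from the studies cited: the field names (patient_id, catheter_days, icd_code, confirmed_clabsi), the helper functions, and all values are hypothetical and invented for the example.

```python
# Hypothetical routine EMR extract: one dict per admission (all values invented).
emr_rows = [
    {"patient_id": "P1", "catheter_days": 5, "icd_code": "T80.2"},
    {"patient_id": "P2", "catheter_days": None, "icd_code": None},
    {"patient_id": "P3", "catheter_days": 2, "icd_code": None},
]

# Hypothetical registry with a strictly defined outcome, keyed by patient_id.
registry = {
    "P1": {"confirmed_clabsi": True},
    "P3": {"confirmed_clabsi": False},
}


def link_records(emr, reg):
    """Left-join EMR rows to registry rows on patient_id.

    Rows without a registry match are kept, with the outcome set to None,
    so that incomplete linkage stays visible instead of being dropped.
    """
    linked = []
    for row in emr:
        match = reg.get(row["patient_id"])
        merged = dict(row)  # copy so the input rows are not modified
        merged["confirmed_clabsi"] = match["confirmed_clabsi"] if match else None
        merged["in_registry"] = match is not None
        linked.append(merged)
    return linked


def missing_fraction(rows, field):
    """Share of rows in which `field` is None."""
    return sum(r[field] is None for r in rows) / len(rows)


linked = link_records(emr_rows, registry)
for field in ("catheter_days", "icd_code", "confirmed_clabsi"):
    print(field, round(missing_fraction(linked, field), 2))
```

Keeping unmatched rows rather than discarding them is a deliberate choice in this sketch: silently dropping patients who are absent from the registry is one way selection bias could enter an analysis of linked data.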
Gaps in Knowledge
To date, little has been reported about applications of state-of-the-art machine learning to healthcare epidemiology. Specifically, the efficacy and effectiveness of machine-learning–derived prediction models in improving healthcare delivery have yet to be proven.3 Notably, it remains largely unknown how machine learning could be adequately translated into clinical practice.
Therefore, more research is required to elucidate the good, the bad, and the unintended consequences of machine learning in healthcare epidemiology and to understand how to best apply machine-learning findings to healthcare practice.34 Despite many sensational media reports, machine learning is not a magic technology that can convert data of poor quality into gold,35 and, as a data scientist has stated recently, “Machine learning in healthcare is still the wild west.”
The increasing volume, variety, and velocity of routine healthcare data clearly provide massive potential for supervised and potentially unsupervised machine-learning tasks in healthcare epidemiology. However, to make optimal use of (big) routine data for quality improvement and healthcare research, these developments should be met by appropriate methodological, ethical, and data security standards.
In conclusion, digital healthcare epidemiology is a growing field in medicine that is driven by the increasing availability of big data originating from daily routine documentation in healthcare. Machine learning may become an important tool in the armamentarium of healthcare epidemiologists to better exploit the potential of big data for infection prevention and control, quality improvement, and optimal allocation of hospital resources. Because of their complexity, machine-learning projects should usually be performed in close collaboration between domain experts and machine-learning specialists and should follow best practices.
Acknowledgments.
Financial support. This work was funded by the Division of Infectious Diseases and Hospital Epidemiology, University Hospital Basel, Basel, Switzerland.
Conflicts of interest. All authors report no conflicts of interest relevant to this article.
References
1. Sydnor ER, Perl TM. Hospital epidemiology and infection control in acute-care settings. Clin Microbiol Rev 2011;24:141–173.
2. Simmons BP, Parry MF, Williams M, Weinstein RA. The new era of hospital epidemiology: what you need to succeed. Clin Infect Dis 1996;22:550–553.
3. Wiens J, Shenoy ES. Machine learning for healthcare: on the verge of a major shift in healthcare epidemiology. Clin Infect Dis 2018;66:149–153.
4. Ross MK, Wei W, Ohno-Machado L. “Big data” and the electronic health record. Yearb Med Inform 2014;9:97–104.
5. Bates DW, Saria S, Ohno-Machado L, Shah A, Escobar G. Big data in health care: using analytics to identify and manage high-risk and high-cost patients. Health Aff 2014;33:1123–1131.
6. Moore GE. Cramming more components onto integrated circuits. Electronics 1965;38:114–117.
7. Jordan MI, Mitchell TM. Machine learning: trends, perspectives, and prospects. Science 2015;349:255–260.
8. Salathé M. Digital epidemiology: what is it, and where is it going? Life Sci Soc Policy 2018;14:1.
9. Salathé M. Digital pharmacovigilance and disease surveillance: combining traditional and big-data systems for better public health. J Infect Dis 2016;214:S399–S403.
10. Salathé M, Freifeld CC, Mekaru SR, Tomasulo AF, Brownstein JS. Influenza A (H7N9) and the importance of digital epidemiology. N Engl J Med 2013;369:401–404.
11. Sips ME, Bonten MJM, van Mourik MSM. Automated surveillance of healthcare-associated infections: state of the art. Curr Opin Infect Dis 2017;30:425–431.