Probabilistic Modeling of Electronic Health Records Data
Abstract: As a growing number of hospitals adopt electronic records systems to manage data collected during routine patient care, leveraging this data to improve the quality of care has emerged as a key problem. A patient's electronic health record can be thought of as a multivariate time series that begins when the patient is admitted and ends when the patient is discharged. Each component series contains measurements of a different physiological variable, such as heart rate or blood pressure. While the observational nature of the data makes it easy to collect, it also gives rise to several challenging properties that push the boundaries of machine learning and computational statistics.
In this talk, I will describe a number of these properties including temporal sparsity, variable sampling frequency, lack of several forms of temporal alignment, the possible presence of sample selection bias and/or non-random missing data, and the confounding effect of interventions. I will present initial results using time series mixture models to extract patient clusters from pediatric intensive care unit data. This data exhibits many of the problematic characteristics described above. I will discuss applications of the resulting clusterings to physiology-based prediction and patient similarity search. I will conclude by highlighting a range of current and future directions.
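As a rough illustration of the kind of clustering described above, and not the models used in this work, the sketch below bins each irregularly sampled series onto a fixed grid measured from admission and fits a diagonal Gaussian mixture by EM, skipping missing entries in the likelihood. The function names, the fixed-grid binning, the alignment at admission, and the assumption that values are missing at random are all simplifications introduced here for illustration. As noted above, real records may violate the last two.

```python
import numpy as np

def bin_series(times, values, n_bins, horizon):
    """Average irregular observations into n_bins equal-width bins.

    Assumes times lie in [0, horizon], measured from admission.
    Bins with no observations are returned as NaN.
    """
    out = np.full(n_bins, np.nan)
    vals = np.asarray(values, dtype=float)
    idx = (np.asarray(times, dtype=float) / horizon * n_bins).astype(int)
    idx = np.minimum(idx, n_bins - 1)  # put t == horizon in the last bin
    for b in np.unique(idx):
        out[b] = vals[idx == b].mean()
    return out

def fit_mixture(X, k, n_iter=50, seed=0):
    """EM for a diagonal Gaussian mixture on X of shape (n, d); NaN = missing.

    Missing entries are dropped from the likelihood, which is only
    appropriate if values are missing at random (an assumption).
    Returns mixing weights, means, variances, and responsibilities.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    obs = ~np.isnan(X)
    Xz = np.where(obs, X, 0.0)

    # Initialize from per-feature statistics of the observed entries.
    cnt = np.maximum(obs.sum(0), 1)
    gmean = Xz.sum(0) / cnt
    gvar = (obs * (Xz - gmean) ** 2).sum(0) / cnt + 1e-3
    mu = gmean + rng.normal(scale=np.sqrt(gvar), size=(k, d))
    var = np.tile(gvar, (k, 1))
    pi = np.full(k, 1.0 / k)

    for _ in range(n_iter):
        # E-step: log-likelihood summed over observed entries only.
        diff2 = (Xz[:, None, :] - mu[None]) ** 2
        ll = -0.5 * (np.log(2 * np.pi * var)[None] + diff2 / var[None])
        ll = (ll * obs[:, None, :]).sum(-1) + np.log(pi)[None]
        ll -= ll.max(1, keepdims=True)  # stabilize before exponentiating
        r = np.exp(ll)
        r /= r.sum(1, keepdims=True)

        # M-step: responsibility-weighted statistics over observed entries.
        w = r[:, :, None] * obs[:, None, :]
        wsum = w.sum(0) + 1e-9
        mu = (w * Xz[:, None, :]).sum(0) / wsum
        var = (w * (Xz[:, None, :] - mu[None]) ** 2).sum(0) / wsum + 1e-3
        pi = np.clip(r.mean(0), 1e-12, None)
        pi /= pi.sum()
    return pi, mu, var, r
```

Stacking the output of bin_series for one variable across patients gives the matrix X, and the rows of the returned responsibilities give soft cluster memberships that could serve as a starting point for similarity search.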
This work will appear at the ACM SIGHIT International Health Informatics Symposium in January 2012. It is part of an ongoing collaboration with Dr. Randall C. Wetzel and his research group in the Pediatric Intensive Care Unit at Childrens Hospital Los Angeles.