Using narratives as a source to automatically learn phenotype models
Published in DMMI, 2014
Citation: Agarwal, V., LePendu, P., Podchiyska, T., Barber, R., Boland, M., Hripcsak, G., & Shah, N. (2014). Using narratives as a source to automatically learn phenotype models. In Workshop on Data Mining for Medical Informatics. http://www.dmmh.org/dmmi2014_submission_4.pdf
A key rate-limiting step in using electronic health records (EHRs) for research is the creation of electronic phenotyping algorithms. It is widely agreed that methods for electronic phenotyping should use the totality of EHR data, including clinical notes, laboratory test results, and medication orders, in addition to the readily available coded administrative data. Alongside efforts to create consensus definitions for health outcomes, there are efforts to use machine learning to construct descriptions of phenotypes in lieu of traditional “algorithms” that identify patients with a health outcome of interest. For manually created clinical phenotyping algorithms, the bottleneck in scaling is the time required to create them; for machine learning approaches, it is the creation of a manually labeled gold standard for training. Simply creating larger training sets by hand is not cost-effective. We demonstrate the feasibility of using large, automatically created ‘silver standards’ from comprehensive EHR data, in conjunction with expert knowledge codified in existing ontologies, to create phenotype models via machine learning.