Noise-tolerant similarity search in temporal medical data.

Author information

Bonomi Luca, Fan Liyue, Jiang Xiaoqian

Affiliations

UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, United States of America.

Department of Computer Science, University of North Carolina at Charlotte, Charlotte, United States of America.

Publication information

J Biomed Inform. 2021 Jan;113:103667. doi: 10.1016/j.jbi.2020.103667. Epub 2020 Dec 25.

Abstract

Temporal medical data are increasingly integrated into the development of data-driven methods to deliver better healthcare. Searching such data for patterns can improve the detection of disease cases and facilitate the design of preemptive interventions. For example, specific temporal patterns could be used to recognize low-prevalence diseases, which are often under-diagnosed. However, searching these patterns in temporal medical data is challenging, as the data are often noisy, complex, and large in scale. In this work, we propose an effective and efficient solution to search for patients who exhibit conditions that resemble the input query. In our solution, we propose a similarity notion based on the Longest Common Subsequence (LCSS), which is used to measure the similarity between the query and the patient's temporal medical data and to ensure robustness against noise in the data. Our solution adopts locality sensitive hashing techniques to address the high dimensionality of medical data, by embedding the recorded clinical events (e.g., medications and diagnosis codes) into compact signatures. To perform pattern search in large EHR datasets, we propose a filtering approach based on tandem patterns, which effectively identifies candidate matches while discarding irrelevant data. The evaluations conducted using a real-world dataset demonstrate that our solution is highly accurate while significantly accelerating the similarity search.
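The LCSS-based similarity mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' definition: it assumes each patient record is a sequence of visits, each visit is a set of clinical event codes, and two visits "match" when their Jaccard overlap reaches a threshold. The names `jaccard`, `lcss_length`, `lcss_similarity`, and `min_overlap`, as well as the event labels, are hypothetical and chosen only for illustration.

```python
from typing import FrozenSet, Sequence

# Assumption: one visit is modeled as a set of clinical event codes.
Visit = FrozenSet[str]


def jaccard(a: Visit, b: Visit) -> float:
    """Jaccard overlap of two event sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def lcss_length(query: Sequence[Visit], record: Sequence[Visit],
                min_overlap: float = 0.5) -> int:
    """Length of the longest common subsequence of visits.

    Two visits count as equal when their Jaccard overlap is at least
    min_overlap (an illustrative matching rule, not taken from the paper).
    """
    m, n = len(query), len(record)
    # dp[i][j] = LCSS length of query[:i] and record[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if jaccard(query[i - 1], record[j - 1]) >= min_overlap:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]


def lcss_similarity(query: Sequence[Visit], record: Sequence[Visit],
                    min_overlap: float = 0.5) -> float:
    """LCSS length normalized by the query length (0.0 for an empty query)."""
    if not query:
        return 0.0
    return lcss_length(query, record, min_overlap) / len(query)


# Hypothetical query pattern: diagnosis A, then medication B, then diagnosis C.
query = [frozenset({"dx:A"}), frozenset({"rx:B"}), frozenset({"dx:C"})]

# A record containing the pattern plus unrelated ("noisy") visits and codes.
noisy_record = [
    frozenset({"dx:A"}),
    frozenset({"dx:noise"}),
    frozenset({"rx:B", "rx:X"}),
    frozenset({"dx:Z"}),
    frozenset({"dx:C"}),
]

# A record that shares only the last event with the query.
other_record = [frozenset({"dx:Q"}), frozenset({"dx:C"})]

print(lcss_similarity(query, noisy_record))
print(lcss_similarity(query, other_record))
```

Because a subsequence may skip elements, the extra visits in `noisy_record` do not lower its score, which is the sense in which an LCSS-style measure tolerates noise. The quadratic dynamic program is also why a filtering step, such as the hashing and tandem-pattern filtering described in the abstract, is useful before exact comparison on large datasets.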

Similar articles

1
Noise-tolerant similarity search in temporal medical data.
J Biomed Inform. 2021 Jan;113:103667. doi: 10.1016/j.jbi.2020.103667. Epub 2020 Dec 25.
2
Robust hashing with local models for approximate similarity search.
IEEE Trans Cybern. 2014 Jul;44(7):1225-36. doi: 10.1109/TCYB.2013.2289351.
3
A Segment-Based Trajectory Similarity Measure in the Urban Transportation Systems.
Sensors (Basel). 2017 Mar 6;17(3):524. doi: 10.3390/s17030524.
4
Nonlinear Asymmetric Multi-Valued Hashing.
IEEE Trans Pattern Anal Mach Intell. 2019 Nov;41(11):2660-2676. doi: 10.1109/TPAMI.2018.2867866. Epub 2018 Aug 30.
5
Semi-supervised hashing for large-scale search.
IEEE Trans Pattern Anal Mach Intell. 2012 Dec;34(12):2393-406. doi: 10.1109/TPAMI.2012.48.
6
Accelerated similarity searching and clustering of large compound sets by geometric embedding and locality sensitive hashing.
Bioinformatics. 2010 Apr 1;26(7):953-9. doi: 10.1093/bioinformatics/btq067. Epub 2010 Feb 23.
7
In Defense of Locality-Sensitive Hashing.
IEEE Trans Neural Netw Learn Syst. 2018 Jan;29(1):87-103. doi: 10.1109/TNNLS.2016.2615085. Epub 2016 Oct 24.
8
Asymmetric distances for binary embeddings.
IEEE Trans Pattern Anal Mach Intell. 2014 Jan;36(1):33-47. doi: 10.1109/TPAMI.2013.101.
9
Scalable Supervised Asymmetric Hashing With Semantic and Latent Factor Embedding.
IEEE Trans Image Process. 2019 Oct;28(10):4803-4818. doi: 10.1109/TIP.2019.2912290. Epub 2019 May 8.
10
Fast and accurate hashing via iterative nearest neighbors expansion.
IEEE Trans Cybern. 2014 Nov;44(11):2167-77. doi: 10.1109/TCYB.2014.2302018.
