Learning From Biological and Computational Machines: Importance of SARS-CoV-2 Genomic Surveillance, Mutations and Risk Stratification.

Affiliations

INtegrative GENomics of HOst-PathogEn (INGEN-HOPE) Laboratory, CSIR-Institute of Genomics and Integrative Biology (CSIR-IGIB), New Delhi, India.

Birla Institute of Technology and Science, Pilani, India.

Publication information

Front Cell Infect Microbiol. 2021 Dec 24;11:783961. doi: 10.3389/fcimb.2021.783961. eCollection 2021.

Abstract

The coronavirus disease 2019 (COVID-19) pandemic has demonstrated the range of disease severity and pathogen genomic diversity that can arise from a single virus, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). This diversity in disease manifestations and genomic mutations has challenged healthcare management and resource allocation during the pandemic, especially in countries with large populations such as India. Here, we take a combined approach to examining diagnostic and genomic diversity in order to extract meaningful information from the chaos of COVID-19 in the Indian context. Using statistical correlation, machine learning (ML), and genomic sequencing on a clinically comprehensive patient dataset with corresponding samples from patients with and without respiratory support, we highlight specific significant diagnostic parameters and ML models for assessing the risk of developing severe COVID-19. This information is further contextualized against the SARS-CoV-2 genomic features in the cohort, for monitoring pathogen genomic evolution. Analysis of patient demographic features and symptoms revealed that age, breathlessness, and cough were significantly associated with severe disease; at the same time, no patient with severe disease reported an absence of physical symptoms. Among the biochemical and biophysical diagnostic parameters, respiratory rate, total leukocyte count (TLC), blood urea levels, and C-reactive protein (CRP) levels were directly correlated with the probability of developing severe disease. Of the five ML algorithms tested to predict patient severity, the multi-layer perceptron-based model performed best, with a receiver operating characteristic (ROC) score of 0.96 and an F1 score of 0.791. The SARS-CoV-2 genomic analysis highlighted a set of mutations that showed global frequency flips and were later incorporated into variants of concern (VOCs) and variants of interest (VOIs); these can be further monitored and annotated for functional significance. In summary, our findings highlight the importance of SARS-CoV-2 genomic surveillance and statistical analysis of clinical data for developing a risk assessment ML model.
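
For readers unfamiliar with the two metrics quoted for the severity classifier, the following is a minimal sketch of how an F1 score and an area under the ROC curve can be computed. It assumes that the "ROC score" in the abstract refers to the area under the ROC curve, which the abstract does not state explicitly. The labels, scores, and the 0.5 threshold are hypothetical toy values, not data or settings from the study, and the functions are illustrative rather than the authors' code.

```python
def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive (severe) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, y_score):
    """Area under the ROC curve via the pairwise (Mann-Whitney) formulation:
    the fraction of (severe, non-severe) pairs in which the severe case
    gets the higher score, with ties counted as half."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = severe, 0 = non-severe.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
# Hypothetical model outputs (predicted probability of severe disease).
y_score = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]
# Assumed decision threshold of 0.5 to turn scores into class labels.
y_pred = [1 if s >= 0.5 else 0 for s in y_score]

f1 = f1_score(y_true, y_pred)
auc = roc_auc(y_true, y_score)
print(f"F1 = {f1:.3f}, ROC AUC = {auc:.4f}")
```

The F1 score depends on the chosen threshold, whereas the area under the ROC curve is computed from the ranking of the scores alone, which is one reason the two numbers reported for a single model can differ substantially.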

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1ecc/8762993/c166a07a365c/fcimb-11-783961-g001.jpg
