Wang Haishuai, Avillach Paul
Department of Biomedical Informatics, Harvard Medical School, Boston, MA, United States.
Department of Computer Science and Engineering, Fairfield University, Fairfield, CT, United States.
JMIR Med Inform. 2021 Apr 7;9(4):e24754. doi: 10.2196/24754.
In the United States, about 3 million people have autism spectrum disorder (ASD), and around 1 in 59 children is diagnosed with ASD. People with ASD have characteristic social communication deficits and repetitive behaviors. The causes of this disorder remain unknown; however, in up to 25% of cases, a genetic cause can be identified. Detecting ASD as early as possible is desirable because it enables timely interventions in affected children. Identification of ASD based on objective pathogenic mutation screening is a major first step toward early intervention and effective treatment.
Recent investigations have interrogated genomic data for detecting and treating autism disorders, in addition to the conventional clinical interview as a diagnostic test. Because deep neural networks perform better than shallow machine learning models on complex, high-dimensional data, in this study we sought to apply deep learning to genetic data obtained from thousands of simplex families at risk for ASD, both to identify contributory mutations and to create an advanced diagnostic classifier for autism screening.
After preprocessing the genomic data from the Simons Simplex Collection, we used a chi-square test to extract the top-ranking common variants that may be protective or pathogenic for autism. A convolutional neural network-based diagnostic classifier was then designed using the identified significant common variants to predict autism. Its performance was then compared with that of shallow machine learning-based classifiers and with that obtained using randomly selected common variants.
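The variant-ranking step described above can be illustrated with a minimal sketch. It assumes genotypes coded as 0/1/2 (copies of the minor allele) and binary case/control labels; the function names, the toy data, and the `top_k` parameter are hypothetical and are not taken from the study, which does not specify its preprocessing or thresholds here. The sketch computes a Pearson chi-square statistic for each variant's genotype-by-phenotype contingency table and keeps the highest-scoring variants.

```python
from collections import Counter


def chi_square_statistic(genotypes, labels):
    """Pearson chi-square statistic for a genotype-by-phenotype table.

    genotypes: one genotype code per individual (assumed 0/1/2 coding).
    labels:    one phenotype label per individual (1 = case, 0 = control).
    """
    n = len(labels)
    observed = Counter(zip(genotypes, labels))  # missing cells count as 0
    row_totals = Counter(genotypes)
    col_totals = Counter(labels)
    stat = 0.0
    for genotype, row_n in row_totals.items():
        for label, col_n in col_totals.items():
            expected = row_n * col_n / n
            stat += (observed[(genotype, label)] - expected) ** 2 / expected
    return stat


def rank_variants(genotype_table, labels, top_k):
    """Return the top_k variant IDs ordered by decreasing chi-square statistic."""
    scores = {
        variant: chi_square_statistic(genotypes, labels)
        for variant, genotypes in genotype_table.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Toy data (hypothetical): 4 cases followed by 4 controls.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_genotypes = {
    "var_a": [2, 2, 1, 2, 0, 0, 1, 0],  # strongly associated with case status
    "var_b": [0, 1, 2, 0, 0, 1, 2, 1],  # weakly associated
    "var_c": [1, 1, 1, 1, 1, 1, 1, 1],  # uninformative (constant genotype)
}
top_variants = rank_variants(toy_genotypes, labels, top_k=2)
print(top_variants)
```

In a real analysis the statistic would usually be converted to a P value and corrected for multiple testing before variants are passed to a classifier; the sketch ranks by the raw statistic only.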
The selected contributory common variants were significantly enriched in chromosome X, while chromosome Y was also discriminatory in distinguishing autistic individuals from nonautistic individuals. The ARSD, MAGEB16, and MXRA5 genes had the largest effect among the contributory variants. Thus, screening algorithms were adapted to include these common variants. The deep learning model yielded an area under the receiver operating characteristic curve of 0.955 and an accuracy of 88% for identifying autistic individuals from nonautistic individuals. Our classifier demonstrated a considerable improvement of ~13% in classification accuracy compared to standard autism screening tools.
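For readers unfamiliar with the two reported metrics, the following minimal sketch shows how an area under the receiver operating characteristic curve and an accuracy can be computed from a classifier's predicted scores. The toy labels and scores, the function names, and the 0.5 decision threshold are assumptions for illustration; they are not the study's data or evaluation code.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random case outscores a random control.

    Ties between a case score and a control score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of individuals classified correctly at the given threshold."""
    predictions = [int(s >= threshold) for s in scores]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Toy example (hypothetical): 3 cases and 3 controls with predicted scores.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
toy_acc = accuracy(toy_labels, toy_scores)
print(round(toy_auc, 3), round(toy_acc, 3))
```

The AUC depends only on how the scores rank cases relative to controls, whereas accuracy depends on the chosen threshold, which is why the two figures can differ for the same model.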
Common variants are informative for autism identification. Our findings also suggest that the deep learning process is a reliable method for distinguishing the diseased group from the control group based on the common variants of autism.