Sorbonne Université, CNRS, IBPS, Laboratoire de Biologie Computationnelle et Quantitative (LCQB), 75005, Paris, France.
Université Grenoble Alpes, CNRS, Grenoble INP, LJK, 38000, Grenoble, France.
Sci Data. 2024 Jul 10;11(1):752. doi: 10.1038/s41597-024-03524-5.
Proteins play a central role in biological processes, and understanding their conformational variability is crucial for unraveling their functional mechanisms. Recent advances in high-throughput technologies have enhanced our knowledge of protein structures, yet predicting their multiple conformational states and motions remains challenging. This study introduces Dimensionality Analysis for protein Conformational Exploration (DANCE) for a systematic and comprehensive description of the conformational variability of protein families. DANCE accommodates both experimental and predicted structures and is suitable for analysing anything from single proteins to superfamilies. Using it, we clustered all experimentally resolved protein structures available in the Protein Data Bank into conformational collections and characterized each collection as a set of linear motions. The resulting resource facilitates access to, and exploitation of, the multiple states adopted by a protein and its homologs. Beyond descriptive analysis, we assessed classical dimensionality reduction techniques for sampling unseen states on a representative benchmark. This work improves our understanding of how proteins deform to perform their functions and paves the way for a standardised evaluation of methods designed to sample and generate protein conformations.