Chemistry Modeling and Informatics Department, Merck Research Laboratories, Rahway, New Jersey 07065, USA.
J Chem Inf Model. 2010 Nov 22;50(11):2029-40. doi: 10.1021/ci100312t. Epub 2010 Oct 26.
One approach to estimating the "chemical tractability" of a candidate protein target whose atomic-resolution structure is known is to examine the physical properties of its potential binding sites. A number of other workers have addressed this issue. We characterize ~290,000 "pockets" from ~42,000 protein crystal structures in terms of a three-parameter "pocket space": volume, buriedness, and hydrophobicity. A metric, DLID (drug-like density), measures how likely a pocket is to bind a drug-like molecule. It is calculated from two counts. The first is the number of other pockets in the pocket's local neighborhood in pocket space that contain drug-like cocrystallized ligands. The second is the total number of pockets in that neighborhood. Surprisingly, although DLID is defined locally, its global trend can be predicted by a simple linear regression on log(volume), buriedness, and hydrophobicity. Two levels of simplification are necessary to relate the DLID of individual pockets to "targets". The first is taking the best DLID per Protein Data Bank (PDB) entry, because any given crystal structure can have many pockets. The second is taking the median DLID over all PDB entries for the same target, because different crystal structures of the same protein can differ owing to both artifacts and real conformational changes. We show that median DLIDs for targets that are detectably homologous in sequence are reasonably similar, and that median DLIDs correlate with the "druggability" estimate of Cheng et al. (Nature Biotechnology 2007, 25, 71-75).
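The neighborhood-count idea behind DLID can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper. It takes DLID to be the plain fraction of neighboring pockets that hold a drug-like ligand. It defines the neighborhood as a fixed Euclidean radius in (log(volume), buriedness, hydrophobicity) space. It also assumes the three axes are already scaled to comparable ranges. The published formula may differ, for example through smoothing or normalization. The `Pocket` class, `pocket_coords`, `dlid`, and the `radius` parameter are hypothetical names used only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Pocket:
    volume: float              # pocket volume (must be > 0 for the log)
    buriedness: float          # assumed pre-scaled to a range comparable to log(volume)
    hydrophobicity: float      # assumed pre-scaled likewise
    has_druglike_ligand: bool  # True if a drug-like ligand is cocrystallized in this pocket


def pocket_coords(p: Pocket) -> tuple:
    # Hypothetical choice: place each pocket in pocket space using log(volume),
    # mirroring the log(volume) term of the regression mentioned in the abstract.
    return (math.log(p.volume), p.buriedness, p.hydrophobicity)


def dlid(index: int, pockets: list, radius: float) -> float:
    """Sketch of a drug-like density: the fraction of the other pockets within
    `radius` of pockets[index] that contain a drug-like ligand.
    Returns NaN when the neighborhood is empty."""
    center = pocket_coords(pockets[index])
    total = 0
    druglike = 0
    for j, other in enumerate(pockets):
        if j == index:
            continue  # exclude the query pocket itself ("other pockets")
        if math.dist(center, pocket_coords(other)) <= radius:
            total += 1
            if other.has_druglike_ligand:
                druglike += 1
    return druglike / total if total else float("nan")


if __name__ == "__main__":
    pockets = [
        Pocket(100.0, 0.5, 0.5, False),
        Pocket(100.0, 0.5, 0.6, True),
        Pocket(100.0, 0.5, 0.4, False),
        Pocket(10000.0, 0.9, 0.9, True),  # far away in log(volume)
    ]
    print(dlid(0, pockets, radius=0.2))  # two neighbors, one drug-like
```

A brute-force scan is used here for clarity. For hundreds of thousands of pockets, a spatial index such as a k-d tree would be the natural replacement for the inner loop.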
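The two levels of simplification described in the abstract can be sketched as a small aggregation step. The sketch assumes that the "best" DLID of a PDB entry is its highest pocket DLID. It also assumes that pockets with an undefined (NaN) DLID are simply skipped. The function name `target_dlid` and the `(target_id, pdb_id, dlid)` record layout are hypothetical.

```python
import math
from collections import defaultdict
from statistics import median


def target_dlid(pocket_records):
    """pocket_records: iterable of (target_id, pdb_id, dlid) tuples, one per pocket.
    Returns {target_id: median over PDB entries of the best pocket DLID}."""
    # Level 1: keep the best (here: maximum) DLID among the pockets of each PDB entry.
    best_per_entry = {}
    for target, pdb, value in pocket_records:
        if math.isnan(value):
            continue  # assumption: pockets with an empty neighborhood are ignored
        key = (target, pdb)
        if key not in best_per_entry or value > best_per_entry[key]:
            best_per_entry[key] = value

    # Level 2: take the median of those per-entry values over all entries of a target.
    per_target = defaultdict(list)
    for (target, _pdb), value in best_per_entry.items():
        per_target[target].append(value)
    return {target: median(values) for target, values in per_target.items()}


if __name__ == "__main__":
    records = [  # hypothetical target and entry labels
        ("T1", "e1", 0.2), ("T1", "e1", 0.6),
        ("T1", "e2", 0.4), ("T1", "e3", 0.9),
        ("T2", "e4", 0.3),
    ]
    print(target_dlid(records))
```

Using the median rather than the mean at the second level keeps a single unusual crystal structure from dominating a target's score. That matches the stated motivation about artifacts and conformational variation.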