Global DMPK, Takeda California Inc., San Diego, California 92121, United States.
Mol Pharm. 2020 Jul 6;17(7):2299-2309. doi: 10.1021/acs.molpharmaceut.9b01294. Epub 2020 Jun 12.
The in vitro-in vivo extrapolation (IVIVE) approach for predicting total plasma clearance (CLtot) has been widely used to rank-order compounds early in discovery. More recently, computational machine learning approaches that use physicochemical descriptors and fingerprints calculated from chemical structure have emerged, enabling virtual predictions even earlier in discovery. Previously, these approaches focused mainly on predicting in vitro intrinsic clearance (CLint). Herein, we directly compare the two approaches for predicting CLtot in rats. A structurally diverse set of 1114 compounds with known in vivo CLtot, in vitro CLint, and plasma protein binding was used as the basis for this evaluation. The machine learning models were assessed using time-split and cluster-split training and test sets, and by five-fold cross-validation. In five-fold cross-validation, the random forest regression (RF) and radial basis function (RBF) models showed the best prediction performance among the eight machine learning models tested. The CLtot values predicted by the RF and RBF models were within two-fold of the observed values for 67.7% and 71.9% of the cluster-split test set compounds, respectively, whereas predictivity was worse for the time-split dataset. The predictivity of both models tended to improve when the in vitro parameters, unbound fraction in plasma (fu,p) and CLint, were included as model inputs. CLtot prediction using in vitro CLint and the well-stirred model, corrected for the fraction unbound in blood, was substantially worse than the machine learning approaches for the same cluster-split test set. The underestimation of CLtot by IVIVE was not fully explained by considering the calculated microsomal unbound fraction (fu,mic) or the extended clearance classification system (ECCS), or by omitting high-clearance compounds whose CLtot exceeded hepatic blood flow.
The analysis suggests that in silico machine learning models may be able to reduce reliance on, or even replace, in vitro and in vivo studies for chemical structure optimization in early drug discovery.
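The IVIVE comparator described in the abstract (in vitro CLint scaled through the well-stirred model with correction for the fraction unbound in blood) and the "within two-fold" accuracy metric can be sketched as below. This is a minimal illustration, not the authors' implementation. It assumes hepatic metabolism is the only elimination route, uses the standard relation fu,b = fu,p / Rb with the blood-to-plasma ratio Rb as an assumed input, and scales microsomal CLint with hypothetical rat defaults (hepatic blood flow 55.2 mL/min/kg, 45 mg microsomal protein per g liver, 40 g liver per kg body weight) that are not taken from the paper. All function and constant names are hypothetical.

```python
"""Minimal IVIVE sketch: well-stirred model plus a two-fold accuracy metric.

All physiological constants below are assumed rat defaults for illustration,
not values reported in the paper.
"""

# Assumed rat scaling factors (hypothetical defaults; replace with your own).
RAT_QH_ML_MIN_KG = 55.2      # hepatic blood flow, mL/min/kg
RAT_MPPGL_MG_G = 45.0        # microsomal protein per gram of liver, mg/g
RAT_LIVER_G_PER_KG = 40.0    # liver weight per kg body weight, g/kg


def scale_clint(clint_ul_min_mg, mppgl=RAT_MPPGL_MG_G,
                liver_g_per_kg=RAT_LIVER_G_PER_KG):
    """Scale microsomal CLint (uL/min/mg protein) to mL/min/kg body weight."""
    return clint_ul_min_mg * mppgl * liver_g_per_kg / 1000.0


def well_stirred_cl_blood(clint_ml_min_kg, fu_p, rb=1.0, fu_mic=1.0,
                          qh=RAT_QH_ML_MIN_KG):
    """Hepatic blood clearance (mL/min/kg) from the well-stirred model.

    fu_b = fu_p / rb corrects for binding in blood; dividing by fu_mic
    converts apparent microsomal CLint to unbound CLint.
    """
    fu_b = fu_p / rb
    clint_u = clint_ml_min_kg / fu_mic
    return qh * fu_b * clint_u / (qh + fu_b * clint_u)


def cl_plasma_from_blood(cl_blood, rb=1.0):
    """Convert blood clearance to plasma clearance (CLp = CLb * Rb)."""
    return cl_blood * rb


def fraction_within_fold(predicted, observed, fold=2.0):
    """Fraction of compounds whose predicted/observed ratio lies in [1/fold, fold]."""
    pairs = list(zip(predicted, observed))
    if not pairs:
        raise ValueError("need at least one prediction/observation pair")
    hits = sum(1 for p, o in pairs if 1.0 / fold <= p / o <= fold)
    return hits / len(pairs)


if __name__ == "__main__":
    clint = scale_clint(100.0)                      # 180 mL/min/kg
    cl_b = well_stirred_cl_blood(clint, fu_p=0.1)   # about 13.6 mL/min/kg
    print(round(cl_plasma_from_blood(cl_b), 1))
```

With these assumed defaults, a compound with a microsomal CLint of 100 µL/min/mg protein and fu,p = 0.1 scales to 180 mL/min/kg and yields a predicted hepatic blood clearance of about 13.6 mL/min/kg. Because the well-stirred model bounds hepatic blood clearance by hepatic blood flow, compounds whose observed clearance exceeds blood flow cannot be captured by this route, which is why such compounds are worth examining separately.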