Liu Yidi, Yang Qi, Cheng Junjie, Zhang Long, Luo Sanzhong, Cheng Jin-Pei
Center of Basic Molecular Science, Department of Chemistry, Tsinghua University, Beijing, 100084, China.
Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China.
Chemphyschem. 2023 Jul 17;24(14):e202300162. doi: 10.1002/cphc.202300162. Epub 2023 Jun 2.
Nucleophilicity and electrophilicity dictate the reactivity of polar organic reactions. Over the past decades, Mayr et al. established quantitative scales of nucleophilicity (N) and electrophilicity (E), which have proved to be useful tools for rationalizing chemical reactivity. In this study, a holistic prediction model was developed through a machine-learning approach. For this purpose, rSPOC, an ensemble molecular representation combining structural, physicochemical and solvent features, was developed. With 1115 nucleophiles, 285 electrophiles and 22 solvents, the dataset is currently the largest available for reactivity prediction. The rSPOC model trained with the Extra Trees algorithm predicted Mayr's N and E parameters with high accuracy, with R of 0.92 and 0.93 and MAE of 1.45 and 1.45, respectively. Furthermore, practical applications of the model, such as predicting the nucleophilicity of NADH, NADPH and a series of enamines, demonstrate its potential for predicting the reactivity of uncharacterized molecules within seconds. An online prediction platform (http://isyn.luoszgroup.com/) based on the current model has been constructed and is freely available to the scientific community.
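Mayr's N and E parameters are defined through the Mayr-Patz equation, lg k₂(20 °C) = s_N(N + E), where k₂ is the second-order rate constant (M⁻¹ s⁻¹), N and s_N are nucleophile-specific parameters and E is electrophile-specific. The sketch below is not part of the rSPOC model; it only illustrates, under that equation, how predicted N and E values would translate into an estimated rate constant. The function names and the numerical inputs are hypothetical and chosen for illustration.

```python
def mayr_log_k(n: float, s_n: float, e: float) -> float:
    """lg k2 at 20 °C from the Mayr-Patz equation: lg k2 = s_N * (N + E).

    n   -- nucleophilicity parameter N of the nucleophile
    s_n -- nucleophile-specific sensitivity parameter s_N
    e   -- electrophilicity parameter E of the electrophile
    """
    return s_n * (n + e)


def mayr_rate_constant(n: float, s_n: float, e: float) -> float:
    """Second-order rate constant k2 (M^-1 s^-1) at 20 °C."""
    return 10.0 ** mayr_log_k(n, s_n, e)


if __name__ == "__main__":
    # Hypothetical parameters for illustration only (not taken from the dataset).
    n, s_n, e = 15.0, 0.80, -10.0
    print(f"lg k2 = {mayr_log_k(n, s_n, e):.2f}")  # lg k2 = 4.00
    print(f"k2 = {mayr_rate_constant(n, s_n, e):.3g} M^-1 s^-1")
```

Because lg k₂ depends on the sum N + E scaled by s_N, an error of one unit in a predicted N value shifts the estimated lg k₂ by s_N, so the reported MAE values can be read roughly as uncertainties in log-rate units.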