Applications of Monte Carlo Simulation in Modelling of Biochemical Processes

Author information

Tenekedjiev Kiril Ivanov, Nikolova Natalia Danailova, Kolev Krasimir

Affiliations

N. Y. Vaptsarov Naval Academy, Varna, Bulgaria

Semmelweis University – Budapest, Hungary

Abstract

The biochemical models describing complex, dynamic metabolic systems are typically multi-parametric and non-linear, so identifying their parameters requires non-linear regression analysis of the experimental data. Because experimental samples are stochastic, it is necessary to estimate not only the parameter values that best fit the model but also the distribution of those parameters, and to test statistical hypotheses about their values. In such situations, analytical models of the parameter distributions are inappropriate, because their assumptions do not hold for intrinsically non-linear regressions. This is why Monte Carlo simulation is a powerful tool for modelling biochemical processes. Since there is no unified classification of Monte Carlo approaches, we follow the interpretation of Press et al. (1992): the general Monte Carlo approach constructs parallel virtual worlds in which the experimental estimates play the role of the true parameters, provided the mechanism by which the true parameters generate a sample is known. Bootstrap is a modification of Monte Carlo that imposes very few premises on the data and does not require knowledge of that mechanism; instead, synthetic samples are constructed by resampling with replacement from the experimental sample. The literature offers many types of confidence intervals (CI), but each belongs to one of two main groups: root intervals (Politis, 1998) and percentile intervals (Efron & Tibshirani, 1993). The philosophical difference between these two types matters substantially for the biochemical interpretation of results.
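To make the root/percentile distinction concrete, here is a minimal sketch of a nonparametric bootstrap for a scalar estimate, using only the Python standard library. The helper names (`bootstrap_estimates`, `percentile_ci`, `root_ci`) and the use of the sample mean as the estimator are illustrative assumptions, not the authors' implementation. The percentile interval takes quantiles of the synthetic estimates θ* directly; the root interval assumes that θ − θ̂ is distributed like θ̂ − θ*, so each synthetic estimate is flipped to 2θ̂ − θ* before the quantiles are taken.

```python
import random
import statistics


def bootstrap_estimates(sample, estimator, n_boot=2000, seed=0):
    """Nonparametric bootstrap: resample with replacement, re-apply the estimator."""
    rng = random.Random(seed)
    n = len(sample)
    return [estimator(rng.choices(sample, k=n)) for _ in range(n_boot)]


def quantile(sorted_values, q):
    """Empirical quantile of already-sorted data, with linear interpolation."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def percentile_ci(boot, alpha=0.05):
    """Percentile interval: quantiles of the synthetic estimates themselves."""
    s = sorted(boot)
    return quantile(s, alpha / 2), quantile(s, 1 - alpha / 2)


def root_ci(theta_hat, boot, alpha=0.05):
    """Root interval: assumes theta - theta_hat ~ theta_hat - theta_star,
    so every synthetic estimate is flipped to 2*theta_hat - theta_star."""
    flipped = sorted(2 * theta_hat - t for t in boot)
    return quantile(flipped, alpha / 2), quantile(flipped, 1 - alpha / 2)


# Usage with made-up measurements (placeholder data, not from the study).
data = [4.1, 5.3, 3.8, 6.0, 5.1, 4.7, 5.9, 4.4, 5.5, 4.9]
theta_hat = statistics.mean(data)
boot = bootstrap_estimates(data, statistics.mean, seed=1)
print("percentile:", percentile_ci(boot))
print("root:      ", root_ci(theta_hat, boot))
```

The root interval is the percentile interval reflected around θ̂, so the two coincide only when the distribution of θ* is symmetric about θ̂; they diverge when it is skewed, as can happen with parameters of non-linear regressions.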
The difference between root and percentile intervals is explained by the distinction between classical statistics (where parameters are fixed unknown quantities) and Bayesian statistics (where parameters are random variables with unknown distributions), and by the philosophical differences between objectivity and subjectivity in scientific research. The main conclusion is that root confidence intervals are confidence intervals of the investigated parameters, whereas percentile confidence intervals refer to the estimates of those parameters. Our first application of Monte Carlo and Bootstrap simulation procedures is a simulation platform for training students in medical biochemistry (Tenekedjiev & Kolev, 2002). In this system, students search for estimates and confidence intervals of the parameters of a given biochemical system for different enzyme-substrate pairs. The platform applies Monte Carlo simulation in two stages. First, a Monte Carlo procedure emulates a biochemical experimental measurement setting, together with given enzyme kinetic reactions, as realistically as possible. The system can simulate continuous enzyme assays (used to adjust the “experimental” conditions) and end-point enzyme assay “measurements” (suitable for parameter identification). An ordinary differential equation (ODE) serves as the basis for generating the pseudo-experimental data, and their pseudo-real nature is ensured by randomly incorporating three types of errors into each repetition of the experiments. The Briggs-Haldane steady-state model is fitted to the pseudo-measured and end-point assay data obtained from the system, and the kinetic parameters are calculated by χ²-minimization. The task is simplified by the availability of a good initial guess from the linearized Lineweaver-Burk model. The two-dimensional root confidence regions of the parameters can be calculated by either Monte Carlo or Bootstrap, following similar procedures.
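The fitting step can be sketched as follows, assuming the steady-state rate law v = Vmax·S/(Km + S) (the Briggs-Haldane result, which has the same algebraic form as Michaelis-Menten), known measurement standard deviations σ, and Gauss-Newton as the χ² minimizer. The optimizer and all function names are assumptions for illustration; the abstract does not say which solver the platform uses. The Lineweaver-Burk line 1/v = (Km/Vmax)(1/S) + 1/Vmax supplies the starting point.

```python
def lineweaver_burk_guess(s, v):
    """Ordinary least squares on 1/v = (Km/Vmax)(1/s) + 1/Vmax; returns (Vmax, Km)."""
    x = [1.0 / si for si in s]
    y = [1.0 / vi for vi in v]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    vmax = 1.0 / (my - slope * mx)  # reciprocal of the intercept
    return vmax, slope * vmax


def chi2(vmax, km, s, v, sigma):
    """Weighted sum of squared residuals for the rate law v = Vmax*s/(Km + s)."""
    return sum(((vi - vmax * si / (km + si)) / sg) ** 2
               for si, vi, sg in zip(s, v, sigma))


def fit_chi2(s, v, sigma, n_iter=50):
    """Minimise chi^2 by Gauss-Newton, starting from the Lineweaver-Burk guess."""
    vmax, km = lineweaver_burk_guess(s, v)
    for _ in range(n_iter):
        # Build the 2x2 weighted normal equations (J^T W J) step = J^T W r.
        a11 = a12 = a22 = b1 = b2 = 0.0
        for si, vi, sg in zip(s, v, sigma):
            w = 1.0 / sg ** 2
            denom = km + si
            r = vi - vmax * si / denom
            d_vmax = si / denom              # d(model)/d(Vmax)
            d_km = -vmax * si / denom ** 2   # d(model)/d(Km)
            a11 += w * d_vmax * d_vmax
            a12 += w * d_vmax * d_km
            a22 += w * d_km * d_km
            b1 += w * d_vmax * r
            b2 += w * d_km * r
        # Solve the 2x2 system by Cramer's rule.
        det = a11 * a22 - a12 * a12
        step_vmax = (a22 * b1 - a12 * b2) / det
        step_km = (a11 * b2 - a12 * b1) / det
        vmax, km = vmax + step_vmax, km + step_km
        if abs(step_vmax) <= 1e-12 * abs(vmax) and abs(step_km) <= 1e-12 * abs(km):
            break
    return vmax, km


# Usage with placeholder end-point "measurements" (not data from the platform).
s = [0.5, 1.0, 2.0, 4.0, 8.0, 16.0]
v = [2.04, 3.27, 5.05, 6.60, 8.26, 8.62]
sigma = [0.05 * vi for vi in v]
print("Lineweaver-Burk guess:", lineweaver_burk_guess(s, v))
print("chi^2 fit:            ", fit_chi2(s, v, sigma))
```

In a root-region procedure, each synthetic sample (Monte Carlo or Bootstrap) would be refitted the same way, and the refitted (Vmax, Km) pairs flipped around the original fit.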
The best point estimate is identified as the trimmed mean of the flipped parameters, taking only the values that lie within the identified root confidence region. In most biochemical reactions, the parameters are uncertain over very wide intervals and may differ by orders of magnitude. Finding root confidence regions (intervals) involves parameter flipping, which often produces results with physically incorrect signs. For this reason, in a second example (Tanka-Salamon et al., 2008), we propose a multiplicative modification for estimating the root confidence regions and the best estimates of the parameters, which ensures that all estimates have physical meaning. The main assumption is that the ratio of the true parameter value to the optimal parameter value derived from the true data sample has the same distribution as the ratio of the optimal parameter value derived from the true data sample to the optimal synthetic parameter value derived from the synthetic data sample. This assumption is equivalent to performing classical Bootstrap on the logarithms of the estimated parameters. The method is applied to a real experimental set-up to estimate root confidence regions of kinetic constants and root best estimates for the amidolytic activity of plasmin under the influence of three fatty acids, which allows the inhibitory effect of the three fatty acids to be proven and quantified. The measured data are continuous reaction progress curves with several replicates. The product concentrations are predicted by three models of increasing complexity. We model the instability of the inhibited enzyme and represent the resulting continuous assay model with concomitant enzyme inactivation as a system of two stiff ODEs, from which we derive the closed form of the progress curve.
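Written out, the multiplicative assumption is θ/θ̂ ~ θ̂/θ*, so each synthetic estimate θ* is flipped to θ̂²/θ*, which stays positive whenever θ̂ and θ* are. Taking logarithms gives the additive root rule log θ − log θ̂ ~ log θ̂ − log θ*, which is the stated equivalence with a Bootstrap on log-parameters. A minimal one-parameter sketch follows; the function name, the interpolated quantile, and restricting the trimmed mean to a one-dimensional interval (rather than the joint multi-dimensional region described in the abstract) are assumptions.

```python
import statistics


def quantile(sorted_values, q):
    """Empirical quantile of already-sorted data, with linear interpolation."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def multiplicative_root(theta_hat, synthetic, alpha=0.05):
    """Multiplicative root interval and trimmed best estimate for a positive parameter.

    Assumes theta / theta_hat ~ theta_hat / theta_star (an additive root bootstrap
    on log(theta)); each synthetic estimate is flipped to theta_hat**2 / theta_star."""
    if theta_hat <= 0 or any(t <= 0 for t in synthetic):
        raise ValueError("estimates must be positive")
    flipped = sorted(theta_hat ** 2 / t for t in synthetic)
    lo, hi = quantile(flipped, alpha / 2), quantile(flipped, 1 - alpha / 2)
    # Trimmed mean: average only the flipped values inside the root interval.
    inside = [f for f in flipped if lo <= f <= hi]
    if not inside:
        raise ValueError("too few synthetic estimates for this alpha")
    return (lo, hi), statistics.mean(inside)


# Usage with placeholder synthetic estimates of a rate constant (not study data).
theta_hat = 2.0
synthetic = [1.5, 1.8, 2.0, 2.2, 2.6, 3.0, 1.9, 2.1, 2.4, 1.7, 2.3]
(lo, hi), best = multiplicative_root(theta_hat, synthetic, alpha=0.2)
print(lo, hi, best)
```

Unlike the additive flip 2θ̂ − θ*, which turns negative whenever θ* > 2θ̂, the ratio form cannot produce a negative rate constant.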
As in the first example, the four-dimensional root confidence regions are obtained by Monte Carlo simulation, using an analytical model of the measured standard deviation at every data point of each progress curve.
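The data-generation half of such a parametric Monte Carlo can be sketched as below. The abstract does not give the form of the standard-deviation model, so the `sd_model` used here (a constant floor plus a term proportional to the signal, with placeholder constants `s0` and `c`) is purely hypothetical, as is the assumption of independent Gaussian noise at each point.

```python
import math
import random
import statistics


def sd_model(y, s0=0.002, c=0.01):
    """HYPOTHETICAL SD model: constant floor s0 plus a part proportional to y."""
    return math.sqrt(s0 ** 2 + (c * y) ** 2)


def synthetic_curves(fitted, n_mc=1000, seed=0, s0=0.002, c=0.01):
    """Perturb every point of a fitted progress curve with Gaussian noise whose
    SD is given by sd_model at that point; returns n_mc synthetic curves."""
    rng = random.Random(seed)
    return [[rng.gauss(y, sd_model(y, s0, c)) for y in fitted]
            for _ in range(n_mc)]


# Usage with a placeholder fitted curve (product concentration at each time point).
fitted = [0.0, 0.12, 0.22, 0.30, 0.36, 0.40]
curves = synthetic_curves(fitted, n_mc=2000, seed=3)
print(statistics.mean(curve[3] for curve in curves))  # should lie close to fitted[3]
```

Each synthetic curve would then be refitted with the progress-curve model, and the refitted parameter vectors flipped to build the root confidence region.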
