Harun Rashed, Yang Eric, Kassir Nastya, Zhang Wenhui, Lu James
Genentech Inc., South San Francisco, CA 94080, USA.
Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA.
Pharmaceutics. 2023 Apr 30;15(5):1381. doi: 10.3390/pharmaceutics15051381.
Exposure-response (E-R) analysis is a key component of pharmacometrics that supports drug dose selection. However, the technical considerations required to draw unbiased estimates from data are not yet well understood. Recent advances in machine learning (ML) explainability methods have generated significant interest in using ML for causal inference. To this end, we used simulated datasets with known E-R "ground truth" to derive a set of good practices for developing ML models that avoid introducing bias when performing causal inference. These practices include: using causal diagrams to guide the careful selection of model variables so that the desired E-R insights can be obtained; keeping a strict separation between the data used for model training and the data used for inference generation; tuning hyperparameters to improve model reliability; and estimating proper confidence intervals around inferences using bootstrap sampling with replacement. We computationally confirm the benefits of the proposed ML workflow using a simulated dataset with nonlinear and non-monotonic E-R relationships.
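As an illustration of the last practice, the following is a minimal sketch of a percentile confidence interval obtained by bootstrap sampling with replacement. It is not the authors' implementation: the function names `bootstrap_ci` and `slope`, the linear stand-in estimator, the number of resamples, and the simulated data are all assumptions made for this example. In the workflow described above, the estimator would instead be an ML model refit on each resample.

```python
import numpy as np


def bootstrap_ci(exposure, response, estimator, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap CI for estimator(exposure, response).

    Hypothetical helper for illustration only; not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    n = len(exposure)
    estimates = np.empty(n_boot)
    for b in range(n_boot):
        # Draw n subject indices WITH replacement (some repeat, some are left out).
        idx = rng.integers(0, n, size=n)
        # Re-estimate the quantity of interest on the resampled data.
        estimates[b] = estimator(exposure[idx], response[idx])
    # The alpha/2 and 1 - alpha/2 quantiles of the bootstrap estimates form the CI.
    lo, hi = np.quantile(estimates, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)


def slope(exposure, response):
    """Stand-in estimator (assumption): slope of a straight-line fit."""
    return np.polyfit(exposure, response, 1)[0]


# Simulated data with a known "ground truth" slope of 2.0 (assumed values).
data_rng = np.random.default_rng(42)
exposure = data_rng.uniform(0.0, 10.0, size=200)
response = 2.0 * exposure + data_rng.normal(0.0, 1.0, size=200)

lo, hi = bootstrap_ci(exposure, response, slope)
print(f"95% bootstrap CI for the slope: [{lo:.3f}, {hi:.3f}]")
```

Because the ground truth is known in a simulation, the interval can be checked against the true value, which mirrors the role the simulated datasets play in the abstract.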