Wang Lin, Li Xiaozhong, Zhang Louxin, Gao Qiang
School of Computer Science and Information Engineering, Tianjin University of Science and Technology, Tianjin, 300457, China.
Department of Mathematics, National University of Singapore, Singapore, 119076, Singapore.
BMC Cancer. 2017 Aug 2;17(1):513. doi: 10.1186/s12885-017-3500-5.
Human cancer cell lines are used in research to study the biology of cancer and to test cancer treatments. Large panels of several hundred human cancer cell lines, characterized with genomic and pharmacological data, have recently become available. The ability to predict drug responses from these pharmacogenomic data can facilitate the development of precision cancer medicines. Although several methods have been developed for drug response prediction, obtaining accurate predictions remains challenging.
Based on the observation that similar cell lines and similar drugs exhibit similar drug responses, we adopted a similarity-regularized matrix factorization (SRMF) method to predict anticancer drug responses of cell lines from the chemical structures of drugs and the baseline gene expression levels of cell lines. Specifically, the chemical structural similarity of drugs and the gene expression profile similarity of cell lines were used as regularization terms and incorporated into the drug response matrix factorization model.
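The idea of regularizing a factorization with similarity matrices can be illustrated with a minimal sketch. The abstract does not state the objective or the optimizer, so everything below is an assumption made for illustration: the response matrix Y (drugs by cell lines) is approximated by U V^T on observed entries only, the drug similarity Sd by U U^T, and the cell line similarity Sc by V V^T, and the factors are fitted by plain gradient descent. The function names (srmf_loss, srmf_sketch), the parameters (k, lam_l, lam_d, lam_c, lr, n_iter), and their default values are hypothetical.

```python
import numpy as np


def srmf_loss(Y, W, Sd, Sc, U, V, lam_l=0.01, lam_d=0.1, lam_c=0.1):
    """Assumed objective: masked fit error + L2 penalty + similarity penalties."""
    fit = np.sum((W * (Y - U @ V.T)) ** 2)
    l2 = lam_l * (np.sum(U ** 2) + np.sum(V ** 2))
    sim_d = lam_d * np.sum((Sd - U @ U.T) ** 2)
    sim_c = lam_c * np.sum((Sc - V @ V.T) ** 2)
    return fit + l2 + sim_d + sim_c


def srmf_sketch(Y, W, Sd, Sc, k=5, lam_l=0.01, lam_d=0.1, lam_c=0.1,
                lr=0.01, n_iter=2000, seed=0):
    """Gradient-descent sketch of a similarity-regularized factorization.

    Y  : (n_drugs, n_cells) responses; values where W == 0 are ignored
    W  : (n_drugs, n_cells) binary mask, 1 = observed, 0 = missing
    Sd : (n_drugs, n_drugs) symmetric drug similarity matrix
    Sc : (n_cells, n_cells) symmetric cell line similarity matrix
    """
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((Y.shape[0], k))  # drug factors
    V = 0.1 * rng.standard_normal((Y.shape[1], k))  # cell line factors
    for _ in range(n_iter):
        R = W * (U @ V.T - Y)  # residual on observed entries only
        # Gradients of the assumed objective (Sd and Sc taken as symmetric).
        grad_U = 2 * R @ V + 2 * lam_l * U + 4 * lam_d * (U @ U.T - Sd) @ U
        grad_V = 2 * R.T @ U + 2 * lam_l * V + 4 * lam_c * (V @ V.T - Sc) @ V
        U -= lr * grad_U
        V -= lr * grad_V
    return U, V


# Usage on small synthetic data: missing entries are read off U @ V.T.
rng = np.random.default_rng(1)
U_true = 0.5 * rng.standard_normal((6, 2))
V_true = 0.5 * rng.standard_normal((8, 2))
Y_demo = U_true @ V_true.T
W_demo = (rng.random(Y_demo.shape) > 0.2).astype(float)
U_hat, V_hat = srmf_sketch(Y_demo, W_demo, U_true @ U_true.T,
                           V_true @ V_true.T, k=2)
Y_pred = U_hat @ V_hat.T  # predictions for observed and missing entries
```

In this sketch the similarity penalties pull drugs with similar structure, and cell lines with similar expression profiles, toward nearby latent vectors, which is what lets the model fill in responses for entries that were never measured. The step size and iteration count would need tuning on real data.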
We first demonstrated the effectiveness of SRMF on simulated data and compared it with two typical similarity-based methods. We then applied it to the Genomics of Drug Sensitivity in Cancer (GDSC) and Cancer Cell Line Encyclopedia (CCLE) datasets, where it outperformed three state-of-the-art methods. We also used SRMF to estimate the missing drug response values in the GDSC dataset. Although SRMF does not explicitly model mutation information, it correctly predicted drug-cancer gene associations that are consistent with existing data and identified novel associations that are not found in existing data. SRMF can also aid in drug repositioning: the newly predicted drug responses for the GDSC dataset suggest that non-small cell lung cancer (NSCLC) cell lines are sensitive to the mTOR inhibitor rapamycin, and that expression of AK1RC3 and HINT1 may serve as adjunct markers of cell line sensitivity to rapamycin.
Our analysis showed that the proposed data integration method improves the accuracy of anticancer drug response prediction in cell lines, identifies drug-cancer gene associations that are either consistent with or novel relative to existing data, and can aid in drug repositioning.