Center for Computational Toxicology and Exposure, Office of Research and Development, US Environmental Protection Agency, Durham, NC, USA.
Oak Ridge Institute for Science and Education (ORISE), Oak Ridge, TN, USA.
SLAS Discov. 2021 Feb;26(2):292-308. doi: 10.1177/2472555220950245. Epub 2020 Aug 29.
Phenotypic profiling assays are untargeted screening assays that measure a large number (hundreds to thousands) of cellular features in response to a stimulus. They often yield diverse and unanticipated profiles of phenotypic effects, which makes it challenging to distinguish active from inactive treatments. Here, we compare several strategies for hit identification in imaging-based phenotypic profiling assays using a previously published Cell Painting data set. Hit identification strategies based on multiconcentration analysis involved curve fitting at several levels of data aggregation (e.g., individual feature level, aggregation of similarly derived features into categories, and global modeling of all features) and curve fitting of computed metrics (e.g., Euclidean and Mahalanobis distance metrics and eigenfeatures). Hit identification strategies based on single-concentration analysis included measurement of signal strength (e.g., total effect magnitude) and correlation of profiles among biological replicates. Modeling parameters for each approach were optimized to retain the ability to detect a reference chemical with subtle phenotypic effects while limiting the false-positive rate to 10%. At this fixed false-positive rate, the percentage of test chemicals identified as hits was highest for the feature-level and category-based approaches, followed by global fitting, whereas the signal strength and profile correlation approaches detected the fewest active hits. Approaches involving fitting of distance metrics had the lowest likelihood of identifying high-potency false-positive hits that may be associated with assay noise. Most of the methods achieved a 100% hit rate for the reference chemical and showed high concordance for 82% of test chemicals, indicating that hit calls are robust across different analysis approaches.
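Several of the metrics named in the abstract can be illustrated with a minimal NumPy sketch. The sketch makes a few assumptions. Feature values are taken to be already normalized to vehicle controls. The function names are hypothetical, and so is the definition of "total effect magnitude" as a sum of absolute normalized values. The hit threshold is an empirical quantile of control scores at a 10% false-positive rate. None of this is the authors' pipeline. In particular, the study fits concentration-response curves to such metrics and tunes parameters against a reference chemical, which this sketch does not attempt.

```python
# Illustrative sketch only: names and score definitions are assumptions,
# not the code used in the study.
import numpy as np


def euclidean_distance(profile: np.ndarray, center: np.ndarray) -> float:
    """Straight-line distance of a treatment profile from the control centroid."""
    return float(np.linalg.norm(profile - center))


def mahalanobis_distance(profile: np.ndarray, center: np.ndarray,
                         cov: np.ndarray) -> float:
    """Distance scaled by the control covariance (cov must be invertible)."""
    diff = profile - center
    # Solve cov @ x = diff rather than forming an explicit inverse.
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))


def total_effect_magnitude(profile: np.ndarray) -> float:
    """Hypothetical signal-strength score: sum of absolute normalized values."""
    return float(np.abs(profile).sum())


def replicate_correlation(replicates: np.ndarray) -> float:
    """Mean pairwise Pearson correlation between replicate profiles (rows)."""
    r = np.corrcoef(replicates)
    upper = np.triu_indices_from(r, k=1)  # count each replicate pair once
    return float(r[upper].mean())


def null_threshold(null_scores, fpr: float = 0.10) -> float:
    """Score exceeded by roughly `fpr` of the null (control) scores."""
    return float(np.quantile(null_scores, 1.0 - fpr))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Simulated data: 200 control wells x 5 normalized features.
    controls = rng.normal(size=(200, 5))
    center = controls.mean(axis=0)
    cov = np.cov(controls, rowvar=False)

    # Null distribution of distances from control wells sets the cutoff.
    null = [mahalanobis_distance(c, center, cov) for c in controls]
    threshold = null_threshold(null, fpr=0.10)

    treated = center + 3.0  # simulated strong shift in every feature
    score = mahalanobis_distance(treated, center, cov)
    print(f"threshold={threshold:.2f} score={score:.2f} hit={score > threshold}")
```

The sketch uses a deliberately small feature count so that the sample covariance is invertible. With hundreds to thousands of features and relatively few control wells, the sample covariance is typically singular. A real analysis would then need a regularized (shrinkage) covariance estimate or a reduced feature space, such as principal components, before computing Mahalanobis distances.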