Abdulla Shibla, Aevermann Brian, Assis Pedro, Badajoz Seve, Bell Sidney M, Bezzi Emanuele, Cakir Batuhan, Chaffer Jim, Chambers Signe, Cherry J Michael, Chi Tiffany, Chien Jennifer, Dorman Leah, Garcia-Nieto Pablo, Gloria Nayib, Hastie Mim, Hegeman Daniel, Hilton Jason, Huang Timmy, Infeld Amanda, Istrate Ana-Maria, Jelic Ivana, Katsuya Kuni, Kim Yang Joon, Liang Karen, Lin Mike, Lombardo Maximilian, Marshall Bailey, Martin Bruce, McDade Fran, Megill Colin, Patel Nikhil, Predeus Alexander, Raymor Brian, Robatmili Behnam, Rogers Dave, Rutherford Erica, Sadgat Dana, Shin Andrew, Small Corinn, Smith Trent, Sridharan Prathap, Tarashansky Alexander, Tavares Norbert, Thomas Harley, Tolopko Andrew, Urisko Meghan, Yan Joyce, Yeretssian Garabet, Zamanian Jennifer, Mani Arathi, Cool Jonah, Carr Ambrose
Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, UK.
Chan Zuckerberg Initiative, 1180 Main Street, Redwood City, CA 94063, USA.
Nucleic Acids Res. 2025 Jan 6;53(D1):D886-D900. doi: 10.1093/nar/gkae1142.
Hundreds of millions of single cells have been analyzed using high-throughput transcriptomic methods. The cumulative knowledge within these datasets provides an exciting opportunity for unlocking insights into health and disease at the level of single cells. Meta-analyses that span diverse datasets, building on recent advances in large language models and other machine-learning approaches, open exciting new directions for modeling and extracting insight from single-cell data. Despite the promise of these and emerging tools for analyzing large amounts of data, the sheer number of datasets and data models, and their accessibility, remain a challenge. Here, we present CZ CELLxGENE Discover (cellxgene.cziscience.com), a data platform that provides curated and interoperable single-cell data. Available via a free-to-use online data portal, CZ CELLxGENE hosts a growing corpus of community-contributed data comprising over 93 million unique cells. Curated, standardized and associated with consistent cell-level metadata, this collection of single-cell transcriptomic data is the largest of its kind and is growing rapidly via community contributions. A suite of tools and features makes the data accessible and reusable through both computational and visual interfaces, allowing researchers to explore individual datasets, perform cross-corpus analysis, and run meta-analyses of tens of millions of cells across studies and tissues at single-cell resolution.