Identification of cell-type-specific, transcriptionally active transposable elements using long-read RNA-sequencing data-based comprehensive annotation.

Author information

Lim Chaemin, An Hyunsu, Park Jihwan

Affiliation

School of Life Sciences, Gwangju Institute of Science and Technology (GIST), Gwangju, 61005, Republic of Korea.

Publication information

Genomics Inform. 2025 Aug 6;23(1):17. doi: 10.1186/s44342-025-00048-1.

Abstract

BACKGROUND

The biological functions of transposable element (TE)-derived transcripts during physiological development, disease development, and progression have been previously reported. However, research on locus-specific TE-derived transcript expression in various human cell types remains limited.

METHODS

We processed 2596 publicly available human long-read RNA-sequencing (LR RNA-seq) datasets covering 21 organs and 71 cell lines in both healthy individuals and diseased patients with various conditions to compile this TE-derived transcript annotation. We established a pipeline for assembling transcripts containing TE sequences to measure transcriptionally active TE-derived transcripts in diverse tissues and cell types. Next, we applied our TE annotation to the Genotype-Tissue Expression (GTEx) single-cell RNA-sequencing (scRNA-seq) data from eight tissues.
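As a rough illustration of what "transcripts containing TE sequences" can mean in practice, the sketch below flags an assembled transcript as TE-derived when its exons overlap TE intervals. It is not the authors' pipeline: the `Transcript` class, the `te_overlap_bp` and `is_te_derived` helpers, the 0-based half-open coordinates, and the `min_bp` threshold are all hypothetical, and the TE intervals are assumed to be non-overlapping (for example, parsed from a repeat annotation).

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical TE interval: (chrom, start, end), 0-based half-open.
TEInterval = Tuple[str, int, int]


@dataclass
class Transcript:
    """Hypothetical container for one assembled transcript."""
    transcript_id: str
    chrom: str
    exons: List[Tuple[int, int]]  # (start, end), 0-based half-open


def overlap_len(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Number of bases shared by two half-open intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def te_overlap_bp(tx: Transcript, te_intervals: List[TEInterval]) -> int:
    """Total exonic bases covered by TE intervals on the same chromosome.

    Assumes the TE intervals do not overlap each other; otherwise
    shared bases would be counted more than once.
    """
    total = 0
    for chrom, te_start, te_end in te_intervals:
        if chrom != tx.chrom:
            continue
        for ex_start, ex_end in tx.exons:
            total += overlap_len(ex_start, ex_end, te_start, te_end)
    return total


def is_te_derived(tx: Transcript, te_intervals: List[TEInterval],
                  min_bp: int = 1) -> bool:
    """Flag a transcript whose exons overlap TEs by at least min_bp bases."""
    return te_overlap_bp(tx, te_intervals) >= min_bp


# Toy data, invented for illustration only.
tes = [("chr1", 150, 320), ("chr2", 100, 200)]
tx_a = Transcript("tx_a", "chr1", [(100, 200), (300, 400)])
tx_b = Transcript("tx_b", "chr1", [(500, 600)])
```

With these toy inputs, `tx_a` overlaps the chr1 TE interval in both exons and `tx_b` does not overlap any TE. A real workflow would use an interval index rather than the nested loop shown here, which is kept only for readability.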

RESULTS

We constructed the first transcriptome-based TE annotation using massive amounts of human LR RNA-seq data for use as a comprehensive reference to detect locus-specific TE-derived transcripts. Our annotation showed better detection accuracy for TE-derived transcripts than the RepeatMasker and GENCODE nonTE gene annotations. This annotation enabled the identification of novel TE-derived transcripts and their isoforms. We also identified alternative transcription end sites for long noncoding genes and confirmed previously annotated TE-nonTE gene fusion transcripts. Next, we applied our TE-derived transcript annotation to public scRNA-seq data from various human tissues and identified several cell-type-specific TE-derived transcripts in a locus-specific manner.
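The abstract does not state how cell-type specificity was scored. One commonly used measure is the tau index, which is 0 for uniform expression across groups and 1 for expression restricted to a single group. The sketch below applies it to per-cell-type mean expression of one TE-derived transcript; the function name `tau_specificity`, the input layout, and the example values are assumptions for illustration, not the authors' method.

```python
from typing import Dict


def tau_specificity(mean_expr: Dict[str, float]) -> float:
    """Tau index over per-cell-type mean expression values.

    tau = sum_i (1 - x_i / x_max) / (n - 1)
    Assumes non-negative expression values and at least two cell types.
    """
    values = list(mean_expr.values())
    n = len(values)
    if n < 2:
        raise ValueError("tau needs at least two cell types")
    x_max = max(values)
    if x_max <= 0:
        # Not expressed anywhere: treat as non-specific.
        return 0.0
    return sum(1 - v / x_max for v in values) / (n - 1)


# Invented values for a single hypothetical TE-derived transcript.
restricted = {"hepatocyte": 10.0, "T cell": 0.0, "fibroblast": 0.0}
broad = {"hepatocyte": 5.0, "T cell": 5.0, "fibroblast": 5.0}
intermediate = {"hepatocyte": 10.0, "T cell": 5.0, "fibroblast": 5.0}
```

Any cutoff applied to tau to call a transcript "cell-type-specific" would be an additional analysis choice.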

CONCLUSIONS

We generated a comprehensive, TE-derived transcript annotation using large-scale, LR RNA-seq data. Researchers can use our TE reference annotation to analyze active TE transcripts and their splicing isoforms in specific transcriptome datasets and to detect de novo TE transcripts. The discovery of cell-type-specific TE-derived transcripts may help explain mechanisms underlying the maintenance of cellular identity and provide new insights into the pathological mechanisms of various diseases.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/25ff/12326599/4965b372f86e/44342_2025_48_Fig1_HTML.jpg
