LRT: Integrative analysis of scRNA-seq and scTCR-seq data to investigate clonal differentiation heterogeneity
Juan Xie, Hyeongseon Jeon, Gang Xin, Qin Ma, Dongjun Chung
Abstract
Single-cell RNA sequencing (scRNA-seq) data has been widely used for cell trajectory inference, under the assumption that cells with similar expression profiles share the same differentiation state. However, the inferred trajectory may not reveal clonal differentiation heterogeneity among T cell clones. Single-cell T cell receptor sequencing (scTCR-seq) data provides invaluable insights into the clonal relationship among cells, yet it lacks functional characteristics. The two data types therefore complement each other for trajectory inference, but a reliable computational tool for integrating them is still missing. We developed LRT, a computational framework for the integrative analysis of scTCR-seq and scRNA-seq data to explore clonal differentiation trajectory heterogeneity. Specifically, LRT uses the transcriptomic information from scRNA-seq data to construct overall cell trajectories and then uses both the TCR sequence information and phenotype information to identify clonotype clusters with distinct differentiation bias.
Introduction
Cell trajectory inference aims to understand how cells differentiate, which is a long-standing problem in developmental biology. Understanding how progenitor cells transform into specialized functional cells can provide valuable insights into the molecular mechanisms underlying normal tissue formation, as well as developmental disorders and pathologies [1]. Single-cell RNA-seq (scRNA-seq) data has been widely used to investigate cell trajectories. Assuming that cells in different states express different sets of marker genes, we may order cells along a differentiation trajectory by capturing their differing transcriptional activities [2]. Based on this rationale, various computational and statistical approaches, collectively known as trajectory or pseudotime analysis, have been proposed; well-known examples include Monocle [2] and Slingshot [3] (see Saelens et al. [4] for a comprehensive review). While these approaches have been shown to be useful, they still suffer from intrinsic limitations.
Materials and methods
The workflow of the ‘LRT’ framework is shown in Fig 1. It starts with the integration of scTCR-seq and scRNA-seq data (Fig 1A): the two datasets are paired via barcode matching, and the TCR sequence information is added as metadata to the scRNA-seq data. Using this integrated dataset, LRT applies the Slingshot algorithm [3] to infer the overall cell trajectory (Fig 1B). Next, LRT identifies clusters of clonotypes using a Dirichlet multinomial mixture model (DMM) (Fig 1C). The bias along the overall trajectory for clones in each clonotype cluster is evaluated using permutation tests. In addition, the identified clonotype clusters are characterized via repertoire analysis, identification of top-ranked clonotypes, and V-J gene usage pattern analysis (Fig 1D).
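The barcode pairing and the permutation step described above can be sketched roughly as follows. This is a minimal illustration in Python, not the LRT implementation (which is an R package). The function names, the toy barcodes, the clonotype-to-cluster mapping, and the choice of test statistic (difference in mean pseudotime between one clonotype cluster and all other cells) are assumptions made for this example; the text does not specify the statistic LRT uses.

```python
import random
from statistics import mean


def pair_by_barcode(rna_meta, tcr_calls):
    """Keep scRNA-seq cells whose barcode also appears in the scTCR-seq data,
    and attach the clonotype label as an extra metadata field."""
    paired = {}
    for barcode, meta in rna_meta.items():
        if barcode in tcr_calls:
            paired[barcode] = {**meta, "clonotype": tcr_calls[barcode]}
    return paired


def permutation_pvalue(values, in_group, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the difference in mean `values`
    between cells flagged in `in_group` and all remaining cells.
    (Hypothetical statistic, chosen only for illustration.)"""
    n_group = sum(in_group)
    if n_group == 0 or n_group == len(values):
        raise ValueError("both groups must contain at least one cell")

    def stat(flags):
        inside = [v for v, f in zip(values, flags) if f]
        outside = [v for v, f in zip(values, flags) if not f]
        return abs(mean(inside) - mean(outside))

    rng = random.Random(seed)
    observed = stat(in_group)
    labels = list(in_group)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break the link between cluster label and pseudotime
        if stat(labels) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Toy data (all values hypothetical): 12 cells with pseudotime, 10 with a TCR call.
rna_meta = {f"cell{i}": {"pseudotime": float(i)} for i in range(12)}
tcr_calls = {f"cell{i}": ("c1" if i < 5 else "c2") for i in range(10)}
cluster_of = {"c1": "early", "c2": "late"}  # stand-in for a clustering result

paired = pair_by_barcode(rna_meta, tcr_calls)
pseudotime = [cell["pseudotime"] for cell in paired.values()]
is_late = [cluster_of[cell["clonotype"]] == "late" for cell in paired.values()]
p_value = permutation_pvalue(pseudotime, is_late)
print(f"paired cells: {len(paired)}, permutation p-value: {p_value:.4f}")
```

Cells without a matching barcode are dropped here for simplicity. Shuffling the cluster labels while keeping the group sizes fixed gives the null distribution against which the observed difference is compared.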
Results
LRT identified groups of clonotypes with distinct differentiation trajectory bias in CD8+ T cells
To demonstrate the utility of LRT, we analyzed the scTCR-seq and scRNA-seq data generated from antigen-specific CD8+ T cells during chronic lymphocytic choriomeningitis virus infection [12] (GEO accession number: GSE188670). We used Seurat (v4.2.0) to load the data and create a Seurat object for the scRNA-seq data, and took the scTCR-seq data from the provided raw data (GSE188666_RAW.tar). The authors had already preprocessed the scRNA-seq data and provided cell annotations and UMAP coordinates in the metadata (GSE188666_SCrna_LCMV_metadata.tsv.gz), so we did not perform additional preprocessing. The original data contains cells from both acute and chronic infections collected on Days 8 and 21, totaling more than 70,000 cells.
Discussion
In this paper, we proposed LRT, a novel computational framework for investigating clonal differentiation trajectory heterogeneity by integrating scTCR-seq and scRNA-seq data. LRT addresses the limitation of previous trajectory inference methods, which rely solely on scRNA-seq data and neglect the clonal relationship between cells. Specifically, LRT utilizes both the functional information from scRNA-seq data and the clonal information from scTCR-seq data. Such integrative analysis allows researchers to identify clonotype clusters with distinct phenotypic patterns along the differentiation path, which cannot be revealed from scRNA-seq data alone. Given these strengths, we believe that LRT can be a powerful tool for investigating T cell clonal differentiation heterogeneity and for the integrative analysis of scRNA-seq and scTCR-seq data. We also believe it can provide a more comprehensive view of the interplay between transcriptional regulation and T cell receptor signaling in shaping the immune response.
Citation: Xie J, Jeon H, Xin G, Ma Q, Chung D (2023) LRT: Integrative analysis of scRNA-seq and scTCR-seq data to investigate clonal differentiation heterogeneity. PLoS Comput Biol 19(7): e1011300. https://doi.org/10.1371/journal.pcbi.1011300
Editor: Mengjie Chen, University of Chicago Pritzker School of Medicine, UNITED STATES
Received: October 12, 2022; Accepted: June 23, 2023; Published: July 10, 2023
Copyright: © 2023 Xie et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The LRT framework was implemented as an R package ‘LRT’, and it is publicly available at https://github.com/JuanXie19/LRT. The Shiny apps ‘shinyClone’ and ‘shinyClust’ are also provided as part of this R package. The data used to produce the results presented in this manuscript are available at GEO with accession numbers GSE188670 and GSE158896.
Funding: This work was supported by grants from the National Human Genome Research Institute (R21 HG012482) to D.C., National Institute on Aging (U54 AG075931) to Q.M., National Institute of General Medical Sciences (R01 GM122078) to D.C. and (R01 GM131399) to Q.M., National Institute on Drug Abuse (U01 DA045300) to D.C., the National Science Foundation (NSF1945971) to Q.M., and the Pelotonia Institute of Immuno-Oncology (PIIO) to D.C. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.