TCR2HLA: Calibrated inference of HLA genotypes from TCR repertoires enables identification of immunologically relevant metaclonotypes
Koshlan Mayer-Blackwell, Anastasia Minervina, Mikhail Pogorelyy, Puneet Rawat, Melanie R. Shapiro, Leeana D. Peters, Emily S. Ford, Amanda L. Posgai, Kasi Vegesana, Samuel Minot, David M. Koelle, Victor Greiff, Philip Bradley, Todd M. Brusko, Paul G. Thomas, Andrew Fiore-Gartland
Abstract
T cell receptors (TCRs) recognize peptides presented by polymorphic human leukocyte antigen (HLA) molecules, but HLA genotype data are often missing from TCR repertoire sequencing studies. To address this, we developed TCR2HLA, an open-source tool that infers HLA genotypes from TCRβ repertoires. Expanding on work linking public TRBV-CDR3 sequences to HLA genotypes, we incorporated “quasi-public” metaclonotypes – composed of rarer TCRβ sequences with shared amino acid features – enriched by HLA genotypes.
Introduction
Highly diverse αβ T cell receptors (TCRs) survey peptides presented by human leukocyte antigen (HLA) molecules. Although TCR repertoires are routinely sequenced from blood and tissues, identifying immunologically relevant features within diverse adaptive immune receptor repertoire (AIRR) data remains a challenging problem [1,2].
Methods
To develop predictive models of HLA genotypes from peripheral blood TCRβ repertoires, we assembled a training dataset of 3,125 repertoires from four studies (S1 Table) and constructed a bioinformatic pipeline to identify exact and near-exact (edit-distance 1) TCR features strongly associated with HLA alleles (Fig 1A-1C and S2 Table).
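One way to picture the near-exact (edit-distance 1) matching step is the minimal sketch below. It is an illustration under stated assumptions, not the TCR2HLA implementation: the function name `edit1_neighbors` is hypothetical, a feature is assumed to be a (TRBV gene, CDR3 amino acid sequence) pair, the V gene is assumed to match exactly, and "edit distance 1" is assumed to mean one substitution, insertion, or deletion in the CDR3. Substitution neighbors are found by masking each position with a wildcard; indel neighbors are found by deleting each position and looking up the shorter string, which avoids all-against-all comparison.

```python
from collections import defaultdict


def edit1_neighbors(features):
    """Map each (v_gene, cdr3) feature to the features with the same V gene
    whose CDR3 differs by exactly one substitution, insertion, or deletion.

    Hypothetical helper for illustration; not part of the TCR2HLA codebase.
    """
    features = set(features)
    neighbors = {f: set() for f in features}

    # Substitutions: two equal-length CDR3s that differ at one position
    # share the same key once that position is replaced by a wildcard.
    masked = defaultdict(set)
    for v, cdr3 in features:
        for i in range(len(cdr3)):
            masked[(v, cdr3[:i] + "*" + cdr3[i + 1:])].add((v, cdr3))
    for group in masked.values():
        for f in group:
            neighbors[f] |= group - {f}

    # Indels: deleting one residue from the longer CDR3 yields the shorter one.
    for v, cdr3 in features:
        for i in range(len(cdr3)):
            shorter = (v, cdr3[:i] + cdr3[i + 1:])
            if shorter in features:
                neighbors[(v, cdr3)].add(shorter)
                neighbors[shorter].add((v, cdr3))
    return neighbors


# Toy input (illustrative sequences, not data from the study).
toy = [
    ("TRBV19", "CASSIRSSYEQYF"),
    ("TRBV19", "CASSIRSAYEQYF"),    # one substitution from the first
    ("TRBV19", "CASSIRSYEQYF"),     # one deletion from the first
    ("TRBV19", "CASSLAPGATNEKLFF"),
    ("TRBV7-9", "CASSIRSSYEQYF"),   # same CDR3, different V gene
]
toy_neighbors = edit1_neighbors(toy)
```

Because lookups are dictionary-based, the work grows with the number of sequences times CDR3 length rather than with the number of sequence pairs.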
Results
TCR2HLA identifies exact and near-exact HLA-associated TCR features across cohorts
TCR2HLA is a statistical tool for inferring HLA genotypes from the occurrence patterns of receptors in peripheral blood TCRβ repertoires. Leveraging a training set of 3,125 repertoires from four independent cohorts with HLA-labeled data [18–21], we identified exact and near-exact TRBV-CDR3β features associated with common HLA alleles (Fig 1A-1C).
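To make "associated with common HLA alleles" concrete, the sketch below scores one feature against one allele with a one-sided Fisher's exact test on a 2x2 table of repertoires. This is an assumption for illustration only: the text above does not state which test statistic TCR2HLA uses, and the names `contingency` and `fisher_enrichment_p` are hypothetical. Only the Python standard library is used.

```python
from math import comb


def contingency(feature_carriers, allele_carriers, all_ids):
    """Build a 2x2 table from sets of repertoire IDs.

    a: feature present, allele present   b: feature present, allele absent
    c: feature absent,  allele present   d: feature absent,  allele absent
    """
    a = len(feature_carriers & allele_carriers)
    b = len(feature_carriers - allele_carriers)
    c = len(allele_carriers - feature_carriers)
    d = len(all_ids) - a - b - c
    return a, b, c, d


def fisher_enrichment_p(a, b, c, d):
    """One-sided Fisher's exact p-value: probability of observing at least
    `a` allele-positive repertoires among the feature carriers, with the
    table margins held fixed (hypergeometric upper tail)."""
    n = a + b + c + d
    n_pos = a + c      # repertoires carrying the allele
    n_feat = a + b     # repertoires carrying the feature
    upper = min(n_pos, n_feat)
    tail = sum(
        comb(n_pos, k) * comb(n - n_pos, n_feat - k)
        for k in range(a, upper + 1)
    )
    return tail / comb(n, n_feat)


# Toy example: six repertoires, three carry the allele, and the feature
# occurs in exactly those three.
all_ids = {"r1", "r2", "r3", "r4", "r5", "r6"}
allele_carriers = {"r1", "r2", "r3"}
feature_carriers = {"r1", "r2", "r3"}
table = contingency(feature_carriers, allele_carriers, all_ids)
p_value = fisher_enrichment_p(*table)  # 1 / C(6, 3) = 0.05
```

In practice such a test would be repeated across many feature and allele pairs, so some form of multiple-testing control would be needed before selecting features.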
Discussion
In this study, we developed open-source software for the large-scale discovery and selection of HLA-associated TCR features. Our work builds on prior efforts to infer HLA genotypes from unlabeled TCR repertoire data. For example, Mayer-Blackwell et al. (2021) employed public HLA-associated TCRβ sequences previously identified by DeWitt et al. (2018) to predict expression of common HLA alleles in unlabeled repertoires.
Acknowledgments
We thank Dr. Daniel Geraghty for providing high-resolution HLA genotyping for participants in the Elyanow et al. study, Ying Lei for helpful discussion about a memory-efficient algorithm for finding mutational variants, and Dr. Liel Cohen-Lavi for helpful early discussion on HLA feature selection.
Citation: Mayer-Blackwell K, Minervina A, Pogorelyy M, Rawat P, Shapiro MR, Peters LD, et al. (2026) TCR2HLA: Calibrated inference of HLA genotypes from TCR repertoires enables identification of immunologically relevant metaclonotypes. PLoS Comput Biol 22(1): e1013767.
https://doi.org/10.1371/journal.pcbi.1013767
Editor: Brittany Rife Magalis, University of Louisville, UNITED STATES OF AMERICA
Received: August 8, 2025; Accepted: November 19, 2025; Published: January 16, 2026
Copyright: © 2026 Mayer-Blackwell et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: TCR repertoire data used to identify HLA-associated features were analyzed from previous studies and are available (https://clients.adaptivebiotech.com/pub/emerson-2017-natgen, https://clients.adaptivebiotech.com/pub/elyanow-2022-jci, https://clients.adaptivebiotech.com/pub/musvosvi-2022-nm). TCR repertoire data from the Rawat et al. study (doi: 10.1101/2024.12.10.24318751) are deposited at the AIRR Data Commons and accessible through the iReceptor Gateway under Study ID IR-T1D-000004. HLA genotype labels are available from supporting information in primary manuscripts [doi: 10.7554/eLife.38358, doi: 10.1101/2024.12.10.24318751 and doi: 10.21417/mm2022nm]. HLA genotype data for participants in the Rawat et al. study are deposited in dbGaP under accession phs003979.v1.p1. HLA genotype data for participants in the Elyanow et al. study (doi: 10.1172/jci.insight.150070) are deposited in dbGaP under accession phs004406.v1.p1.
Code Availability: Code used for feature discovery, model fitting, and calibration, together with a command-line Python tool for running TCR2HLA on multiple CPUs, is available at github.com/kmayerb/TCR2HLA. An interactive serverless web-browser application is also provided: kmayerb.github.io/TCR2HLAi/. Tools for CPU- or GPU-accelerated approximate TCRdist computation and network-based metaclonotype motif discovery are available at github.com/kmayerb/tcrdistgpu.
Funding: KMB and AFG were supported by NIH grant number R01 AI136514 and by a grant from the Gates Foundation (INV-027499). The conclusions and opinions expressed in this work are those of the authors alone and shall not be attributed to the Foundation. Scientific Computing Infrastructure at Fred Hutchinson Cancer Research Center was funded by ORIP grant S10OD028685. Specimen collection included samples from persons enrolled in NCT04338360 or NCT04344977 funded by the US NIH, with data collection from these samples funded by the US NIH NIAID Contract 75N93019C00063 (DMK). Type 1 Diabetes datasets were supported by grants from The Leona M. and Harry B. Helmsley Charitable Trust (#2019PG-T1D011, to TMB and VG), the National Institutes of Health (P01 AI042288, to TMB; K99 DK140511, to MRS), and the American Diabetes Association (11-23-PDF-78, to LDP). VG was supported by a Norwegian Cancer Society Grant (#215817), Research Council of Norway grants (300740, 311341, 331890), and an ERC-CoG (101125630). PR was supported by the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 801133. AM, MP, and PGT were supported by NIH grants AI136514, AI144616, and AI165077, and American Lebanese Syrian Associated Charities (ALSAC) at St. Jude. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors declare the following competing interests: V.G. declares advisory board positions in aiNET GmbH, Enpicom B.V., Absci, Fairjourney Biologics and Diagonal Therapeutics. V.G. is a consultant for Adaptyv Biosystems, Proteinea, and LabGenius. V.G. is an employee of Imprint LLC. PGT is on the Scientific Advisory Board of Immunoscape and Shennon Bio, has received research support and personal fees from Elevate Bio, and consulted for 10X Genomics, Illumina, Pfizer, Cytoagents, Sanofi, Merck, and JNJ. PGT, AAM, and MVP have patents related to TCR amplification, cloning, and/or applications thereof.
