Comparison of sequence- and structure-based antibody clustering approaches on simulated repertoire sequencing data
Katharina Waury, Stefan Lelieveld, Sanne Abeln, Henk-Jan van den Ham
Abstract
Repertoire sequencing allows us to investigate the antibody-mediated immune response. The clustering of sequences is a crucial step in the data analysis pipeline, aiding in the identification of functionally related antibodies. The conventional clustering approach of clonotyping relies on sequence information, particularly CDRH3 sequence identity and V/J gene usage, to group sequences into clonotypes. It has been suggested that the limitations of sequence-based approaches to identify sequence-dissimilar but functionally converged antibodies can be overcome by using structure information to group antibodies.
Introduction
High-throughput repertoire sequencing has emerged as a fundamental tool to investigate and understand the human immune system [1]. B-cell receptor (BCR) sequencing data, in particular, provides insights into the antibody-mediated adaptive immune response. Investigating these processes is crucial to advance our understanding of modern health challenges including autoimmunity and vaccine development [2, 3].
Materials and methods
Antibody pair selection
The Immune Epitope Database (IEDB) [29] (http://www.iedb.org/) collects and describes epitopes in a standardized manner. All entries of discontinuous epitopes with at least one reported positive B-cell assay and an associated 3D structure were downloaded from the IEDB on 15 February 2024. Filtering for epitopes with a known structure in the Protein Data Bank (PDB) [31] (http://www.rcsb.org/) was required because only in these cases is relevant information on the binding antibody, such as its amino acid sequence, also available.
Results
Functional convergence is confirmed in antibodies binding to well-studied antigens
To create a set of functionally similar antibody pairs, the Immune Epitope Database (IEDB) [29] was searched for epitopes located on the same protein antigen. If the overlap of the residues of two epitopes was at least 75%, as defined by their Jaccard index, the respective antibodies were defined as functionally similar and retained. This overlap cutoff allowed the inclusion of a sufficient number of antibody pairs.
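The overlap criterion can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the representation of an epitope as a set of residue labels, the example residues, and the reading of the 75% cutoff as "greater than or equal to" are all assumptions made for illustration.

```python
def jaccard_index(epitope_a, epitope_b):
    """Jaccard index of two epitopes, each given as a collection of residue labels.

    Defined as |A intersect B| / |A union B|. Returns 0.0 when both epitopes
    are empty (an assumed convention to avoid division by zero).
    """
    a, b = set(epitope_a), set(epitope_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def functionally_similar(epitope_a, epitope_b, cutoff=0.75):
    """Return True if the epitope overlap reaches the cutoff.

    The inclusive comparison (>=) is an assumption about how the cutoff is applied.
    """
    return jaccard_index(epitope_a, epitope_b) >= cutoff


# Hypothetical residue labels on the same antigen, for illustration only.
epitope_1 = {"K417", "Y453", "L455", "F456"}
epitope_2 = {"K417", "Y453", "L455", "F456", "Q493"}

overlap = jaccard_index(epitope_1, epitope_2)  # 4 shared / 5 total residues
print(overlap, functionally_similar(epitope_1, epitope_2))
```

In this example, four of the five distinct residues are shared, so the pair would pass a 75% cutoff. Two epitopes sharing only one of three distinct residues would not.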
Discussion
Incorporating structural information for antibody data analysis has attracted increased attention in recent years. Multiple approaches have been proposed for structure-based clustering of repertoire data to substitute for or augment clonotyping. As novel methods become available, a comparison of their performance with the previous standard technique is essential. Crucially, data used for evaluation should closely resemble the real-world datasets which these methods are intended to be applied to, a goal we aimed to achieve in this study.
Acknowledgments
We would like to acknowledge BAZIS, the supercomputing cluster of the Vrije Universiteit Amsterdam.
Citation: Waury K, Lelieveld S, Abeln S, van den Ham H-J (2025) Comparison of sequence- and structure-based antibody clustering approaches on simulated repertoire sequencing data. PLoS Comput Biol 21(5): e1013057. https://doi.org/10.1371/journal.pcbi.1013057
Editor: Claude Loverdo, Sorbonne University, FRANCE
Received: June 25, 2024; Accepted: April 17, 2025; Published: May 30, 2025
Copyright: © 2025 Waury et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All data and code relevant to this study are available at https://github.com/kathiwaury/clustering-comparison.
Funding: K.W. and S.A. received funding from the European Union’s Horizon 2020 research and innovation program under the Marie Sklodowska-Curie grant agreement no. 860197, the MIRIADE project (https://miriade.eu/). The funding agency did not play any role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: I have read the journal’s policy and the authors of this manuscript have the following competing interests: S.L. declares that he is an employee of ENPICOM B.V. H.J.v.d.H. declares that he is an employee of The Hyve B.V. and a former employee of ENPICOM B.V. S.A. and K.W. state that they have no competing interests.