Comparison of sequence- and structure-based antibody clustering approaches on simulated repertoire sequencing data

Katharina Waury, Stefan Lelieveld, Sanne Abeln, Henk-Jan van den Ham

Abstract

Repertoire sequencing allows us to investigate the antibody-mediated immune response. The clustering of sequences is a crucial step in the data analysis pipeline, aiding in the identification of functionally related antibodies. The conventional clustering approach of clonotyping relies on sequence information, particularly CDRH3 sequence identity and V/J gene usage, to group sequences into clonotypes. It has been suggested that the limitations of sequence-based approaches in identifying sequence-dissimilar but functionally convergent antibodies can be overcome by using structural information to group antibodies.
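As an illustration of the clonotyping criteria described above, the following Python sketch groups heavy-chain sequences that share V and J genes and a CDRH3 of equal length, then clusters the CDRH3s within each group by sequence identity. The `Sequence` record, the `clonotype` function, the greedy representative-based assignment, and the 80% identity threshold are assumptions chosen for illustration. Clonotyping tools differ in their exact rules (some use single-linkage hierarchical clustering, and identity thresholds vary), and this is not the implementation evaluated in the study.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Sequence:
    """Hypothetical minimal record for one heavy-chain sequence."""
    seq_id: str
    v_gene: str
    j_gene: str
    cdrh3: str


def cdrh3_identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length CDRH3s."""
    if len(a) != len(b):
        raise ValueError("CDRH3 sequences must have equal length")
    if not a:
        return 1.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def clonotype(sequences, min_identity=0.8):
    """Greedy clonotyping sketch (threshold of 0.8 is an assumption).

    Sequences are first partitioned by V gene, J gene and CDRH3 length.
    Within each partition, a sequence joins the first existing cluster whose
    representative CDRH3 reaches `min_identity`; otherwise it starts a new one.
    Returns a list of clusters, each a list of sequence ids.
    """
    partitions = defaultdict(list)
    for s in sequences:
        partitions[(s.v_gene, s.j_gene, len(s.cdrh3))].append(s)

    clusters = []
    for members in partitions.values():
        local = []  # (representative sequence, list of member ids)
        for s in members:
            for rep, group in local:
                if cdrh3_identity(rep.cdrh3, s.cdrh3) >= min_identity:
                    group.append(s.seq_id)
                    break
            else:
                # No sufficiently similar cluster: s becomes a new representative.
                local.append((s, [s.seq_id]))
        clusters.extend(group for _, group in local)
    return clusters
```

For instance, two sequences with identical V/J genes whose 13-residue CDRH3s differ at one position (about 92% identity) fall into the same clonotype, while an identical CDRH3 paired with a different V gene is assigned elsewhere. Antibodies with dissimilar CDRH3s but convergent function likewise end up in separate clonotypes, which is the limitation that structure-based grouping is meant to address. Note that the greedy assignment makes the result depend on input order.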

Introduction

High-throughput repertoire sequencing has emerged as a fundamental tool to investigate and understand the human immune system [1]. B-cell receptor (BCR) sequencing data, in particular, provides insights into the antibody-mediated adaptive immune response. Investigating these processes is crucial for advancing our understanding of modern health challenges, including autoimmunity and vaccine development [2, 3].

Materials and methods

Antibody pair selection

The Immune Epitope Database (IEDB) [29] (http://www.iedb.org/) collects and describes epitopes in a standardized manner. All entries of discontinuous epitopes with at least one reported positive B-cell assay and an associated 3D structure were downloaded from the IEDB on 15 February 2024. Filtering for epitopes with a known structure in the Protein Data Bank (PDB) [31] (http://www.rcsb.org/) was required because only in these cases is relevant information on the binding antibody, e.g., its amino acid sequence, also available.
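The selection criteria above amount to a simple filter over epitope records. The sketch below assumes the IEDB export has already been parsed into records; the `EpitopeEntry` fields and the `select_entries` function are hypothetical and do not reflect the actual IEDB download format or column names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EpitopeEntry:
    """Hypothetical, simplified view of one parsed IEDB epitope record."""
    epitope_id: str
    discontinuous: bool
    positive_bcell_assays: int
    pdb_id: Optional[str]  # identifier of the associated PDB structure, if any


def select_entries(entries: List[EpitopeEntry]) -> List[EpitopeEntry]:
    """Keep discontinuous epitopes with >= 1 positive B-cell assay and a PDB structure."""
    return [
        e for e in entries
        if e.discontinuous
        and e.positive_bcell_assays >= 1
        and e.pdb_id is not None
    ]
```

Requiring `pdb_id` mirrors the rationale in the text: without an associated structure, the binding antibody's sequence would not be available for downstream clustering.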

Results

Functional convergence is confirmed in antibodies binding to well-studied antigens

To create a set of functionally similar antibody pairs, the Immune Epitope Database (IEDB) [29] was searched for epitopes located on the same protein antigen. If the overlap of the residues of two epitopes was ≥ 75% as defined by their Jaccard index, the respective antibodies were defined as functionally similar and retained. This overlap cutoff allowed the inclusion of a sufficient number of antibody pairs.
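The Jaccard index of two epitopes A and B, each taken as a set of residues, is |A ∩ B| / |A ∪ B|. The Python sketch below applies it to antibody pairs binding the same antigen. The input layout (antibody identifier mapped to an antigen identifier and a set of residue identifiers), the function names, and the treatment of the 75% cutoff as inclusive are assumptions for illustration, not the study's code.

```python
from itertools import combinations


def jaccard(a, b) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B| of two residue collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_pairs(epitopes, cutoff=0.75):
    """Return antibody pairs whose epitopes on the same antigen overlap >= cutoff.

    `epitopes` (hypothetical layout) maps antibody id -> (antigen id, residue ids).
    """
    pairs = []
    for (ab1, (ag1, res1)), (ab2, (ag2, res2)) in combinations(epitopes.items(), 2):
        # Only epitopes located on the same antigen are compared.
        if ag1 == ag2 and jaccard(res1, res2) >= cutoff:
            pairs.append((ab1, ab2))
    return pairs
```

With a four-residue and a five-residue epitope that share four residues, the index is 4/5 = 0.8, so the pair passes a 75% cutoff; epitopes on different antigens are never compared.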

Discussion

Incorporating structural information into antibody data analysis has attracted increased attention in recent years. Multiple approaches have been proposed for structure-based clustering of repertoire data to substitute for or augment clonotyping. As novel methods become available, comparing their performance with the previous standard technique is essential. Crucially, the data used for evaluation should closely resemble the real-world datasets to which these methods are intended to be applied, a goal we aimed to achieve in this study.

Acknowledgments

We would like to acknowledge BAZIS, the Supercomputing cluster of the Vrije Universiteit Amsterdam.

Citation: Waury K, Lelieveld S, Abeln S, van den Ham H-J (2025) Comparison of sequence- and structure-based antibody clustering approaches on simulated repertoire sequencing data. PLoS Comput Biol 21(5): e1013057. https://doi.org/10.1371/journal.pcbi.1013057

Editor: Claude Loverdo, Sorbonne University, FRANCE

Received: June 25, 2024; Accepted: April 17, 2025; Published: May 30, 2025

Copyright: © 2025 Waury et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All data and code relevant to this study are available at https://github.com/kathiwaury/clustering-comparison.

Funding: K.W. and S.A. received funding from the European Union’s Horizon 2020 research and innovation program under the Marie Sklodowska-Curie grant agreement no. 860197, the MIRIADE project (https://miriade.eu/). The funding agency did not play any role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: I have read the journal’s policy and the authors of this manuscript have the following competing interests: S.L. declares that he is an employee of ENPICOM B.V. H.J.v.d.H. declares that he is an employee of The Hyve B.V. and a former employee of ENPICOM B.V. S.A. and K.W. state that they have no competing interests.