Memorization bias impacts modeling of alternative conformational states of solute carrier membrane proteins with methods from deep learning
G.V.T. Swapna, Namita Dube, Monica J. Roth, Gaetano T. Montelione
Abstract
Members of the Solute Carrier (SLC) superfamily of integral membrane proteins transport a wide array of small molecules across plasma and organelle membranes, and function as important drug transporters and as viral receptors. They populate different conformational states during the solute transport process, including outward-open, intermediate (occluded), and inward-open states.
Introduction
Proteins adopt multiple conformational states which are essential to their functions. While AlphaFold2/3 (AF2/3) [1], Evolutionary Scale Modeling (ESM) [2], and related machine-learning methods [3,4] can provide accurate structural models of proteins, for systems that adopt multiple conformational states, conventional AF2/3 and ESM calculations often deliver only one of the multiple states observed experimentally [5–13]. Recent advances have been reported using modified AF2 protocols and “enhanced sampling” methods to model multiple conformational states of proteins, including integral membrane proteins [14].
Methods
Evolutionary covariance (EC) - based contact predictions
EC-based contact predictions were performed using evolutionary covariance analysis with NeBcon (Neural-network and Bayes-classifier based contact prediction; https://seq2fun.dcmb.med.umich.edu/NeBcon/), a hierarchical algorithm for sequence-based protein contact map prediction [53], with a probability threshold of 0.7. A second server, EVcouplings [37] (https://evcouplings.org/), was also used to confirm these contact predictions.
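The thresholding and cross-confirmation step described above can be illustrated with a minimal sketch. It assumes the predicted contacts have already been parsed into (residue i, residue j, probability) records. The record type, the function names, the example values, and the minimum sequence separation filter are hypothetical and are not taken from the NeBcon or EVcouplings output formats. The sketch also assumes that the 0.7 threshold is inclusive.

```python
from typing import Iterable, NamedTuple, Set, Tuple


class Contact(NamedTuple):
    """One predicted residue-residue contact (hypothetical record layout)."""
    i: int        # index of the first residue
    j: int        # index of the second residue
    prob: float   # predicted contact probability


Pair = Tuple[int, int]


def filter_contacts(contacts: Iterable[Contact],
                    threshold: float = 0.7,
                    min_seq_sep: int = 1) -> Set[Pair]:
    """Keep residue pairs predicted at or above `threshold`.

    Pairs are stored as (low, high) so that (i, j) and (j, i) collapse
    to one entry. `min_seq_sep` is an illustrative extra filter on
    sequence separation and is not a parameter stated in the text.
    """
    kept: Set[Pair] = set()
    for c in contacts:
        if c.prob >= threshold and abs(c.i - c.j) >= min_seq_sep:
            kept.add((min(c.i, c.j), max(c.i, c.j)))
    return kept


def confirmed_contacts(primary: Set[Pair], secondary: Set[Pair]) -> Set[Pair]:
    """Return the pairs from `primary` that also occur in `secondary`."""
    return primary & secondary


# Made-up example values, for illustration only.
primary_predictions = [
    Contact(10, 50, 0.92),
    Contact(50, 10, 0.95),   # same pair in reverse order
    Contact(12, 48, 0.65),   # below threshold, dropped
    Contact(20, 80, 0.81),
    Contact(30, 33, 0.99),   # too close in sequence when min_seq_sep=6
]
secondary_pairs = {(10, 50), (15, 60)}

primary_pairs = filter_contacts(primary_predictions, threshold=0.7, min_seq_sep=6)
supported = confirmed_contacts(primary_pairs, secondary_pairs)
print(sorted(primary_pairs))
print(sorted(supported))
```

Storing each pair in a canonical (low, high) order makes the comparison between the two predictors a plain set intersection, regardless of how each server orders residue indices.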
Results
The challenge we encountered arises from the fact that conventional AF modeling protocols generally provided only one of the multiple conformations of SLC proteins, particularly when only one of these states was available as an experimental structure at the time of training. Even enhanced sampling methods successfully generate alternative conformational states for only some multistate proteins [6,8–13].
Discussion
We were very surprised to observe significant weaknesses in various published protocols using AF2 or AF3 for modeling alternative conformations of pseudo-symmetric SLC transporters. However, where conventional AF2/3 modeling (or even AF2 modeling with enhanced sampling) provides only one (either inward- or outward-open) conformational state, the alternative state can be modeled by the template-based ESM-AF (or ESM-MODELLER) protocol.
Acknowledgments
We thank Dr. Alberto Perez and Jokent Gaza for running inference using several structural templates with their locally-installed instance of AlphaFold3, and Dr. Davide Sala for providing scripts for running AF-alt. We also thank T.B. Acton, T. Benavides, A. De Falco, A. Gaur, R. Greene-Cramer, Y.J. Huang, T.A. Ramelot, B. Shurina, L. Spaman, and R. Tejero for helpful discussions and comments on the manuscript, and S. Collen for computer system administration support.
Citation: Swapna G, Dube N, Roth MJ, Montelione GT (2025) Memorization bias impacts modeling of alternative conformational states of solute carrier membrane proteins with methods from deep learning. PLoS Comput Biol 21(10): e1013590. https://doi.org/10.1371/journal.pcbi.1013590
Editor: Alex Peralvarez-Marin, Universitat Autònoma de Barcelona, SPAIN
Received: May 1, 2025; Accepted: October 6, 2025; Published: October 17, 2025
Copyright: © 2025 Swapna et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All scripts and key data generated in this study are available at https://doi.org/10.5281/zenodo.17386602
Funding: This work was supported financially by National Institutes of Health NIGMS (https://www.nigms.nih.gov/) grants R35 GM141818 (to G.T.M.) and R35 GM122518 (to M.J.R.), and by the Rensselaer Polytechnic Institute (RPI) Bio-computing and Bio-informatics Constellation Chair Fund (to G.T.M.). G.T.M. also acknowledges access to the RPI Center for Computational Innovations (CCI) computing infrastructure. Research costs and partial salaries of G.V.T.S. were supported by NIH grants R35 GM141818 and R35 GM122518. Research costs and partial salaries of N.D. were supported by NIH grant R35 GM141818. Research costs and partial salaries of M.J.R. were supported by NIH grant R35 GM122518. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: I have read the journal’s policy and the authors of this manuscript have the following competing interests: GTM is a founder of Nexomics Biosciences, Inc.
Abbreviations: AF2, AlphaFold2 Multimer; AF3, AlphaFold3; EC, evolutionary covariance; ESM, Evolutionary-Scale Modeling; LDDT, local-distance difference test; MD, molecular dynamics; ML, machine learning; mmCIF, macromolecular Crystallographic Information File; MSA, multiple sequence alignment; PDB, Protein Data Bank; pLDDT, predicted Local-Distance Difference Test, a confidence score predicted from ML; TM, Template Modeling score to assess similarity between two protein structures