Bayesian optimization and machine learning for vaccine formulation development
Lillian Li, Sung-In Back, Jian Ma, Yawen Guo, Thomas Galeandro-Diamant, Didier Clénet
Abstract
Improving vaccine stability remains an important goal in meeting the global health need to prevent infectious diseases. Advances in data science and artificial intelligence have enabled innovative approaches to this problem. This manuscript highlights applications of machine learning through two case studies in which Bayesian optimization was used to develop viral vaccine formulations. The case studies monitored critical quality attributes: infectious titer loss for virus A in liquid form, and glass transition temperature for virus B in freeze-dried form.
Introduction
Vaccines are an important class of pharmaceutical products that provide a simple and economical way of preventing infectious diseases and save millions of lives from pandemics worldwide [1]. However, developing a new vaccine raises a host of challenges, from designing and stabilizing a new antigen to distributing vaccine doses to target populations. After a vaccine product leaves the manufacturing site, the challenge remains to ensure its critical quality attributes (CQAs), stability, and effectiveness throughout its shelf life.
Materials and method
Vaccine candidates
The potential of Bayesian optimization (BO)-based modeling for formulation development was investigated with two types of vaccine candidates. The excipients screened in both case studies are commonly used compounds such as amino acids, antioxidants and chelating agents, sugars and polyols, mono- and bivalent salts, polymers and proteins, surfactants, and buffering agents [11].
Results
Model optimization for virus A formulation (case study 1)
ML model generation and optimization.
For case study 1, the ML approach consisted of three main stages: model training, model optimization, and model validation. Table 1 provides an overview of the key features used in the first two stages, presenting the model training and optimization process in five steps.
Discussion
Model generation and optimization for case 1
As illustrated in Fig 6, the general strategy for obtaining a high-accuracy model uses BO to iteratively suggest new experiments in the areas of highest uncertainty. The results of these new experiments are then integrated into the dataset, progressively improving the model accuracy.
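The iterative strategy described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a Gaussian process surrogate with a fixed RBF kernel, a purely synthetic response function standing in for a laboratory measurement, and two formulation variables scaled to [0, 1]. All function names, the length scale, and the candidate grid are hypothetical choices made for illustration. At each iteration the loop picks the candidate with the largest predictive standard deviation, "runs" the experiment, and adds the result to the dataset.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.2):
    """Squared-exponential kernel between two sets of points (one per row)."""
    sq_dist = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * sq_dist / length_scale**2)

def gp_posterior(x_train, y_train, x_query, noise=1e-6):
    """Posterior mean and standard deviation of a zero-mean, unit-variance GP."""
    k_tt = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_tq = rbf_kernel(x_train, x_query)
    solved = np.linalg.solve(k_tt, k_tq)        # K^-1 K_tq
    mean = solved.T @ y_train
    var = 1.0 - np.sum(k_tq * solved, axis=0)   # prior variance is 1
    return mean, np.sqrt(np.clip(var, 0.0, None))

def run_experiment(x):
    """Hypothetical stand-in for a lab measurement (synthetic, not real data)."""
    return np.sin(6.0 * x[:, 0]) * np.cos(4.0 * x[:, 1])

def active_learning_loop(n_initial=5, n_iterations=15, seed=0):
    """Repeatedly add the candidate where the model is most uncertain."""
    rng = np.random.default_rng(seed)
    # Hypothetical pool of candidate formulations (two scaled variables).
    candidates = rng.uniform(0.0, 1.0, size=(400, 2))
    chosen = list(rng.choice(len(candidates), size=n_initial, replace=False))
    max_std_history = []
    for _ in range(n_iterations):
        x_train = candidates[chosen]
        y_train = run_experiment(x_train)        # results so far
        _, std = gp_posterior(x_train, y_train, candidates)
        max_std_history.append(float(std.max()))
        chosen.append(int(np.argmax(std)))       # next suggested experiment
    return candidates[chosen], max_std_history

if __name__ == "__main__":
    points, history = active_learning_loop()
    print(f"experiments selected: {len(points)}")
    print(f"max predictive std: first {history[0]:.3f}, last {history[-1]:.3f}")
```

In this sketch the largest predictive uncertainty over the candidate pool shrinks as experiments accumulate, which mirrors the progressive improvement in model accuracy described in the text. A practical workflow would typically also fit the kernel hyperparameters to the data and could use a different acquisition function; pure uncertainty sampling is used here only because it matches the "areas of highest uncertainty" description.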
Acknowledgments
The authors would like to acknowledge Sanofi colleagues Chase Orsello, Nausheen Rahman, Rajarshi Roychoudhury, Anthony Sheung, and Joël Morand for their support on this project, and Jean-Sébastien Bolduc and Hassan Khan for excellent editorial assistance.
Citation: Li L, Back S-I, Ma J, Guo Y, Galeandro-Diamant T, Clénet D (2025) Bayesian optimization and machine learning for vaccine formulation development. PLoS One 20(6): e0324205. https://doi.org/10.1371/journal.pone.0324205
Editor: Satish Rojekar, Icahn School of Medicine at Mount Sinai Department of Pharmacological Sciences, UNITED STATES OF AMERICA
Received: December 5, 2024; Accepted: April 21, 2025; Published: June 11, 2025
Copyright: © 2025 Li et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the manuscript and its Supporting Information files that included case studies data used for Bayesian optimization model generation and validation.
Funding: The author(s) received no specific funding for this work.
Competing interests: This project is funded by Sanofi. All of the authors from Sanofi may hold shares and/or stock options in the company. TGD is affiliated with ChemAI Ltd. This does not alter our adherence to PLOS ONE policies on sharing data and materials. There are no patents, products in development or marketed products associated with this research to declare.
