GAN-enhanced machine learning and metabolic modeling identify reprogramming in pancreatic cancer

Tahereh Razmpour, Masoud Tabibian, Arman Roohi, Rajib Saha

Abstract

Pancreatic ductal adenocarcinoma is one of the deadliest forms of cancer, presenting significant clinical challenges due to poor prognosis and limited treatment options. Understanding the metabolic reprogramming that drives this disease is crucial for identifying new therapeutic targets and improving patient outcomes. We developed a novel computational framework integrating genome-scale metabolic modeling with machine learning to identify metabolic signatures and therapeutic vulnerabilities in pancreatic cancer. 

Introduction

Pancreatic ductal adenocarcinoma (PDAC) is a highly aggressive cancer with a poor prognosis, largely due to late-stage diagnosis and limited treatment options. The majority of PDAC patients (80–85%) present with locally advanced or metastatic disease at diagnosis, when curative surgical resection is no longer feasible [1]. This delayed detection dramatically impacts survival: 5-year survival rates fall from approximately 32% for localized disease to only 3% for metastatic disease [2].

Materials and Methods

Data collection and preprocessing

We obtained gene expression data for pancreatic ductal adenocarcinoma (PDAC) from The Cancer Genome Atlas (TCGA) database. To ensure specificity, we meticulously reviewed annotations and pathology reports for 183 cases, selecting only those explicitly classified as ductal adenocarcinoma. This rigorous process yielded 144 PDAC cases and 4 non-neoplastic pancreatic tissue samples, which served as our control group. 
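The selection step can be illustrated with a minimal sketch. It assumes each case has already been reduced to a small annotation record; the field names (`case_id`, `sample_type`, `histology`), the example records, and the string-matching rule are hypothetical stand-ins for the manual review of annotations and pathology reports, not the procedure actually used.

```python
from collections import Counter

# Hypothetical annotation records; field names and labels are illustrative only.
cases = [
    {"case_id": "case-001", "sample_type": "tumor",
     "histology": "Pancreatic ductal adenocarcinoma"},
    {"case_id": "case-002", "sample_type": "tumor",
     "histology": "Neuroendocrine carcinoma"},
    {"case_id": "case-003", "sample_type": "normal",
     "histology": "Non-neoplastic pancreatic tissue"},
    {"case_id": "case-004", "sample_type": "tumor",
     "histology": "Adenocarcinoma, NOS"},
]


def assign_group(record):
    """Return 'PDAC', 'control', or None (excluded) for one annotated case."""
    if record["sample_type"] == "normal":
        return "control"
    if "ductal adenocarcinoma" in record["histology"].lower():
        return "PDAC"
    # Not explicitly classified as ductal adenocarcinoma: excluded.
    return None


groups = {r["case_id"]: assign_group(r) for r in cases}
counts = Counter(g for g in groups.values() if g is not None)
print(counts)
```

Cases whose annotation does not explicitly name ductal adenocarcinoma are excluded rather than assigned to a group, which mirrors the conservative selection described above.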

Results

Generation and validation of synthetic healthy data using GANs

To address the data imbalance between cancerous (n = 144) and healthy (n = 4) samples in our TCGA dataset, we employed a Wasserstein GAN with Gradient Penalty (WGAN-GP) to generate 251 synthetic healthy gene expression profiles. These synthetic profiles underwent rigorous biological validation through genome-scale metabolic modeling and multi-step filtration processes to ensure their biological relevance.
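For readers unfamiliar with WGAN-GP, the following is a minimal sketch of the gradient penalty that distinguishes it from a standard Wasserstein GAN. It is not the model used in this study: the tiny one-hidden-layer critic, the analytic gradient, the toy two-feature profiles, and the penalty weight `lam = 10.0` (the value commonly used in the WGAN-GP literature) are all assumptions chosen so the example runs with the Python standard library alone. A practical implementation would rely on an automatic differentiation framework.

```python
import math
import random


def critic(x, W1, b1, w2):
    """Toy critic: f(x) = w2 . tanh(W1 x + b1)."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    return sum(v * h for v, h in zip(w2, hidden))


def critic_grad(x, W1, b1, w2):
    """Analytic gradient of the critic output with respect to the input x."""
    pre = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W1, b1)]
    scale = [v * (1.0 - math.tanh(p) ** 2) for v, p in zip(w2, pre)]
    return [sum(s * row[j] for s, row in zip(scale, W1)) for j in range(len(x))]


def gradient_penalty(real, fake, W1, b1, w2, lam=10.0, rng=random):
    """lam * mean((||grad f(x_hat)||_2 - 1)^2) over random interpolates x_hat."""
    total = 0.0
    for xr, xf in zip(real, fake):
        eps = rng.random()  # interpolation weight drawn uniformly from [0, 1)
        x_hat = [eps * r + (1.0 - eps) * f for r, f in zip(xr, xf)]
        grad = critic_grad(x_hat, W1, b1, w2)
        norm = math.sqrt(sum(g * g for g in grad))
        total += (norm - 1.0) ** 2
    return lam * total / len(real)


def critic_loss(real, fake, W1, b1, w2, lam=10.0, rng=random):
    """WGAN-GP critic loss: E[f(fake)] - E[f(real)] + gradient penalty."""
    mean_fake = sum(critic(x, W1, b1, w2) for x in fake) / len(fake)
    mean_real = sum(critic(x, W1, b1, w2) for x in real) / len(real)
    return mean_fake - mean_real + gradient_penalty(real, fake, W1, b1, w2, lam, rng)


# Toy two-feature "expression profiles" (illustrative values only).
real_batch = [[0.9, 0.1], [0.8, 0.3]]
fake_batch = [[0.2, 0.7], [0.4, 0.5]]
W1 = [[0.3, -0.2], [0.1, 0.4]]
b1 = [0.05, -0.1]
w2 = [0.7, -0.5]
print(critic_loss(real_batch, fake_batch, W1, b1, w2, rng=random.Random(0)))
```

The penalty pushes the norm of the critic's gradient toward 1 at points interpolated between real and generated samples, a soft way of enforcing the Lipschitz constraint that the Wasserstein formulation requires.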

Discussion

Our study presents a novel integrated approach combining genome-scale metabolic modeling with machine learning techniques to investigate metabolic reprogramming in pancreatic ductal adenocarcinoma (PDAC). This comprehensive analysis has revealed several key insights into PDAC metabolism and highlighted potential therapeutic targets.

Acknowledgments

We gratefully acknowledge Dr. Adil Alsiyabi and Andrea Goertzen for their invaluable guidance and support. We also thank the High-Performance Computing Center (HCC) at the University of Nebraska-Lincoln for providing essential computational resources for this work.

Citation: Razmpour T, Tabibian M, Roohi A, Saha R (2026) GAN-enhanced machine learning and metabolic modeling identify reprogramming in pancreatic cancer. PLoS Comput Biol 22(1): e1013862. https://doi.org/10.1371/journal.pcbi.1013862

Editor: Sunil Laxman, Institute for Stem Cell Science and Regenerative Medicine, INDIA

Received: July 30, 2025; Accepted: December 21, 2025; Published: January 2, 2026

Copyright: © 2026 Razmpour et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All code and materials used in this study are available at https://github.com/ssbio/PDAC.

Funding: This study was supported by an NIGMS MIRA Award 5R35GM143009 to RS. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.