GAN-ML: Advancing anticancer peptide prediction through innovative Deep Convolution Generative Adversarial Network data augmentation technique

  • Sadik Bhattarai
  • Kil To Chong*
  • Hilal Tayara

*Corresponding author for this work

Research output: Contribution to journal › Journal article › Peer-review

Abstract

Limited and imbalanced data hinder anticancer peptide (ACP) prediction, often resulting in overfitting and poor performance on unseen peptides. To address these challenges, we propose a data augmentation method based on a Deep Convolution Generative Adversarial Network (DC-GAN). This approach expands the training dataset by generating peptides with anticancer properties, particularly for underrepresented classes such as N+ type ACPs, which are characterized by abundant positively charged residues at the N-terminus and remain an overlooked problem in anticancer peptide prediction. Compared with traditional methods such as the Synthetic Minority Over-sampling Technique (SMOTE) and SMOTE with Edited Nearest Neighbors (SMOTEENN), DC-GAN achieves superior performance by addressing both the limited number of training samples and within-class imbalances, such as that between C+ and N+ type peptides. The proposed framework, GAN-ML, cascades a linear model and an ensemble model and integrates ACP motif-based authentication and physicochemical property-based validation. Across various datasets, it achieves accuracies of 82.96% (independent test), 96.06% (independent test), and 94.06% (5-fold cross-validation) in classifying peptides as anticancer, antimicrobial, or non-anticancer. These results highlight the efficacy of DC-GAN-based data augmentation in improving model generalization and performance by generating samples for minority classes, and position it as a powerful tool for generative anticancer drug discovery.
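The abstract does not state how peptides are represented for the DC-GAN. A common choice for convolutional generators is a fixed-size one-hot matrix (positions × residue channels). The minimal sketch below assumes that representation, with a hypothetical maximum length of 50 and an extra padding channel, and shows how a generator's continuous output could be decoded back into a sequence by taking the highest-scoring residue at each position. The names `encode`, `decode`, `ALPHABET`, and `PAD` are illustrative, not taken from the paper.

```python
# Assumed representation: 20 standard residues plus one padding channel ("-").
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "-"
ALPHABET = AMINO_ACIDS + PAD  # 21 channels in total


def encode(seq: str, max_len: int = 50) -> list[list[float]]:
    """One-hot encode a peptide as a max_len x 21 matrix, right-padded with PAD."""
    if len(seq) > max_len:
        raise ValueError(f"sequence longer than max_len={max_len}")
    if any(ch not in AMINO_ACIDS for ch in seq):
        raise ValueError("sequence contains a non-standard residue")
    padded = seq + PAD * (max_len - len(seq))
    return [[1.0 if ch == a else 0.0 for a in ALPHABET] for ch in padded]


def decode(matrix: list[list[float]]) -> str:
    """Map a (possibly continuous) generator output back to a sequence.

    Each row is read by argmax; decoding stops at the first padding position.
    """
    residues = []
    for row in matrix:
        ch = ALPHABET[max(range(len(row)), key=row.__getitem__)]
        if ch == PAD:
            break
        residues.append(ch)
    return "".join(residues)
```

Argmax decoding is deterministic; sampling each position from the row's normalized scores is a common alternative when more diverse sequences are wanted.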
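The N+ subclass is described only as having abundant positively charged residues at the N-terminus. To inspect the within-class imbalance between N+ and C+ ACPs in a dataset, one could label sequences with a simple terminal-charge rule such as the one below. The window size of five residues, the choice of K and R as the positive residues (histidine ignored), and the tie label `"balanced"` are all assumptions for illustration, not the paper's definition; `terminal_type` and `subtype_counts` are hypothetical helpers.

```python
from collections import Counter

# Assumption: lysine (K) and arginine (R) count as positively charged; His is ignored.
POSITIVE = set("KR")


def terminal_type(seq: str, window: int = 5) -> str:
    """Hypothetical rule: compare positive residues in the N- and C-terminal windows."""
    if window < 1:
        raise ValueError("window must be at least 1")
    n_pos = sum(ch in POSITIVE for ch in seq[:window])   # N-terminal window
    c_pos = sum(ch in POSITIVE for ch in seq[-window:])  # C-terminal window
    if n_pos > c_pos:
        return "N+"
    if c_pos > n_pos:
        return "C+"
    return "balanced"


def subtype_counts(peptides: list[str], window: int = 5) -> Counter:
    """Count how many peptides fall into each terminal-charge subtype."""
    return Counter(terminal_type(s, window) for s in peptides)
```

A skewed result from `subtype_counts` on the ACP class (for example, far fewer `"N+"` than `"C+"` sequences) is the kind of within-class imbalance that targeted augmentation aims to correct.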
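For comparison, SMOTE, one of the baselines named in the abstract, creates synthetic minority samples by interpolating in feature space between a minority sample and one of its nearest minority-class neighbours; SMOTEENN additionally cleans the resampled set with Edited Nearest Neighbours. The pure-Python sketch below implements only the SMOTE interpolation step on already-computed feature vectors; the function name `smote` and its defaults are illustrative. In practice, the `imbalanced-learn` package provides `SMOTE` and `SMOTEENN` classes.

```python
import random


def smote(minority: list[list[float]], n_new: int, k: int = 3, seed: int = 0) -> list[list[float]]:
    """SMOTE-style oversampling: interpolate between a random minority sample
    and one of its k nearest minority-class neighbours (squared Euclidean distance)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)

    def sq_dist(a: list[float], b: list[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by distance to `base` and keep the k closest.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: sq_dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic
```

Because every synthetic point lies on a segment between two existing minority samples, SMOTE cannot produce points outside the region spanned by the minority data, whereas a generative model such as a DC-GAN produces new peptide sequences that are then featurized.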
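The abstract mentions physicochemical property-based validation without listing the properties used. As an illustration only, a candidate peptide could be screened with two crude descriptors, approximate net charge and hydrophobic residue fraction, since many known ACPs are cationic and amphipathic. The residue sets, thresholds, and function names below are hypothetical; a real pipeline would more likely rely on established scales, for example Biopython's `ProteinAnalysis`.

```python
# Crude sequence-level descriptors (assumptions, not the paper's validation criteria).
POSITIVE = set("KR")
NEGATIVE = set("DE")
HYDROPHOBIC = set("AILMFVW")


def net_charge(seq: str) -> int:
    """Approximate net charge near neutral pH: (K + R) - (D + E); His and termini ignored."""
    return sum(ch in POSITIVE for ch in seq) - sum(ch in NEGATIVE for ch in seq)


def hydrophobic_fraction(seq: str) -> float:
    """Fraction of residues drawn from an assumed hydrophobic set."""
    return sum(ch in HYDROPHOBIC for ch in seq) / len(seq) if seq else 0.0


def passes_screen(seq: str, min_charge: int = 2,
                  hydro_range: tuple[float, float] = (0.3, 0.7)) -> bool:
    """Keep peptides that are cationic and moderately hydrophobic (hypothetical thresholds)."""
    lo, hi = hydro_range
    return net_charge(seq) >= min_charge and lo <= hydrophobic_fraction(seq) <= hi
```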

Original language: English
Article number: 105390
Journal: Chemometrics and Intelligent Laboratory Systems
Volume: 262
Status: Published, 15 July 2025

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs):

  1. SDG 3 - Good Health and Well-being

Keywords

  • Accuracy
  • Anticancer peptides
  • Data augmentation
  • Deep Convolution Generative Adversarial Network
  • Machine learning

Quacquarelli Symonds (QS) Subject Topics

  • Computer Science & Information Systems
  • Engineering - Petroleum
  • Data Science
  • Engineering - Chemical
  • Chemistry
