Abstract
Limited and imbalanced data hinder anticancer peptide (ACP) prediction, often resulting in overfitting and poor performance on unseen peptides. To address these challenges, we propose a data augmentation method based on a Deep Convolution Generative Adversarial Network (DC-GAN). This approach expands the training dataset by generating peptides with anticancer properties, particularly for underrepresented classes such as N+ type ACPs, which are characterized by abundant positive residues at the N-terminus and remain an overlooked problem in anticancer peptide prediction. Compared to traditional methods such as the Synthetic Minority Over-sampling Technique (SMOTE) and SMOTE with Edited Nearest Neighbors (SMOTEENN), DC-GAN demonstrates superior performance by addressing both limited training samples and within-class imbalances, such as those between C+ and N+ type peptides. The proposed framework, GAN-ML, cascades a linear model and an ensemble model, achieving accuracy rates of 82.96% (independent test), 96.06% (independent test), and 94.06% (5-fold cross-validation) for classifying peptides as anticancer, antimicrobial, or non-anticancer across various datasets, integrating ACP motif-based authentication and physicochemical property-based validation. These results highlight the efficacy of DC-GAN-based data augmentation in enhancing model generalization, improving performance by generating samples with minority representation, and serving as a powerful tool for generative anticancer drug discovery.
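For readers unfamiliar with the SMOTE baseline named in the abstract, the following is a minimal, standard-library sketch of SMOTE-style oversampling: each synthetic point is a random interpolation between a minority sample and one of its nearest minority neighbours. It is an illustration only, not the authors' pipeline; the function name `smote_like`, the parameter `k`, and the three-dimensional feature vectors are all hypothetical, and the abstract does not state how peptides are encoded.

```python
import math
import random


def smote_like(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (SMOTE-style)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Pick a random point on the segment between base and neighbour.
        u = rng.random()
        synthetic.append([b + u * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical feature vectors for a minority class (assumed encoding).
minority = [[0.1, 0.5, 0.2], [0.2, 0.4, 0.3], [0.15, 0.6, 0.25], [0.3, 0.5, 0.1]]
new_points = smote_like(minority, n_new=5)
```

Interpolation of this kind produces new feature vectors that lie between existing minority samples; it does not produce new peptide sequences, which is one way it differs from a generative model.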
| Original language | English |
|---|---|
| Article number | 105390 |
| Journal | Chemometrics and Intelligent Laboratory Systems |
| Volume | 262 |
| DOIs | |
| State | Published - 2025.07.15 |
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs)
- SDG 3: Good Health and Well-being
Keywords
- Accuracy
- Anticancer peptides
- Data augmentation
- Deep Convolution Generative Adversarial Network
- Machine learning
Quacquarelli Symonds (QS) Subject Topics
- Computer Science & Information Systems
- Engineering - Petroleum
- Data Science
- Engineering - Chemical
- Chemistry
Fingerprint
Dive into the research topics of 'GAN-ML: Advancing anticancer peptide prediction through innovative Deep Convolution Generative Adversarial Network data augmentation technique'. Together they form a unique fingerprint.