Enhancing Sequence-Only PPI Site Predictors: Integrating Attention Modules and Imbalance-Aware Training in a CNN Platform

  • Mahmoud Elsamadony*
  • Mohamed Saif-Eldeen
  • Ahmed Elsaadani
  • Abdulrahman Mezar
  • Mohamad Amr Shehab
  • Ibrahim Abdelbaky
  • Hilal Tayara

* Corresponding author for this work

Research output: Contribution to conference, Conference paper, peer-review

Abstract

Protein-protein interactions (PPIs) play essential roles in cellular functions, yet accurate computational identification of PPI binding sites remains hampered by severe class imbalance and limited feature representations. This study reimplemented the ProB-site framework and evaluated three enhancements independently: autoencoder-based dimensionality reduction, self-attention modules to capture both local and long-range sequence features, and advanced training strategies (random mixed sampling, class-weighted loss, and L2/kernel normalization) to address data imbalance. On the Test_60 benchmark, the configuration employing class-weighted loss and L2 normalization achieved an F1-score of 0.422, a precision of 0.333, a Matthews correlation coefficient (MCC) of 0.275, and an AUPRC of 0.468, compared to the original implementation's F1-score of 0.241 and MCC of 0.201. These relative gains of approximately 75% in F1-score and 37% in MCC demonstrate the effectiveness of imbalance-aware training and feature reweighting. The resulting sequence-based model offers a more robust and scalable tool for PPI site prediction, with potential applications in functional annotation and drug design.
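As a rough illustration of two points in the abstract, the sketch below shows one common form of class-weighted binary cross-entropy and reproduces the reported relative gains from the stated scores. The inverse-frequency ("balanced") weighting, the function names, and the toy labels are assumptions made for illustration; the abstract does not specify which weighting scheme or framework the authors used.

```python
import math

def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count).

    Assumption: a 'balanced' heuristic, not necessarily the paper's scheme.
    """
    n = len(labels)
    n_pos = sum(labels)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    return {0: n / (2 * n_neg), 1: n / (2 * n_pos)}

def weighted_bce(labels, probs, weights, eps=1e-7):
    """Mean binary cross-entropy with a per-class weight on each residue."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
        total += weights[y] * loss
    return total / len(labels)

def relative_gain(new, old):
    """Relative improvement of `new` over `old`, as a fraction."""
    return (new - old) / old

# Hypothetical toy labels: 1 interface residue among 10 (not from the paper).
labels = [0] * 9 + [1]
weights = balanced_class_weights(labels)

# A model that predicts 0.1 everywhere is penalised mostly for the missed positive.
loss = weighted_bce(labels, [0.1] * 10, weights)

# Scores quoted in the abstract (Test_60 benchmark).
f1_gain = relative_gain(0.422, 0.241)
mcc_gain = relative_gain(0.275, 0.201)
print(f"F1 gain: {f1_gain:.1%}, MCC gain: {mcc_gain:.1%}")
```

With weights of this kind, an error on the rare interface class contributes more to the loss than an error on the abundant non-interface class, which is the general idea behind imbalance-aware training.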

Original language: English
Title of host publication: 3rd International Conference on Intelligent Methods, Systems and Applications, IMSA 2025
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 428-433
Number of pages: 6
ISBN (Electronic): 9798331501860
DOIs
State: Published - 2025
Event: 3rd International Conference on Intelligent Methods, Systems and Applications, IMSA 2025 - Giza, Egypt
Duration: 2025.07.12 to 2025.07.13

Publication series

Name: 3rd International Conference on Intelligent Methods, Systems and Applications, IMSA 2025

Conference

Conference: 3rd International Conference on Intelligent Methods, Systems and Applications, IMSA 2025
Country/Territory: Egypt
City: Giza
Period: 25.07.12 to 25.07.13

Quacquarelli Symonds (QS) Subject Topics

  • Computer Science & Information Systems
  • Mathematics
  • Engineering - Petroleum
  • Data Science
