TY - GEN
T1 - Enhancing Sequence-Only PPI Site Predictors
T2 - 3rd International Conference on Intelligent Methods, Systems and Applications, IMSA 2025
AU - Elsamadony, Mahmoud
AU - Saif-Eldeen, Mohamed
AU - Elsaadani, Ahmed
AU - Mezar, Abdulrahman
AU - Shehab, Mohamad Amr
AU - Abdelbaky, Ibrahim
AU - Tayara, Hilal
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - Protein-protein interactions (PPIs) play essential roles in cellular functions, yet accurate computational identification of PPI binding sites remains hampered by severe class imbalance and limited feature representations. This study reimplemented the ProB-site framework and evaluated three enhancements independently: autoencoder-based dimensionality reduction, self-attention modules to capture both local and long-range sequence features, and advanced training strategies (random mixed sampling, class-weighted loss, and L2/kernel normalization) to address data imbalance. On the Test_60 benchmark, the configuration employing class-weighted loss and L2 normalization achieved an F1-score of 0.422, precision of 0.333, Matthews correlation coefficient (MCC) of 0.275, and an AUPRC of 0.468, compared to the original implementation's F1-score of 0.241 and MCC of 0.201. These relative gains of approximately 75% in F1-score and 37% in MCC demonstrate the effectiveness of imbalance-aware training and feature reweighting. The resulting sequence-based model offers a more robust and scalable tool for PPI site prediction, with potential applications in functional annotation and drug design.
AB - Protein-protein interactions (PPIs) play essential roles in cellular functions, yet accurate computational identification of PPI binding sites remains hampered by severe class imbalance and limited feature representations. This study reimplemented the ProB-site framework and evaluated three enhancements independently: autoencoder-based dimensionality reduction, self-attention modules to capture both local and long-range sequence features, and advanced training strategies (random mixed sampling, class-weighted loss, and L2/kernel normalization) to address data imbalance. On the Test_60 benchmark, the configuration employing class-weighted loss and L2 normalization achieved an F1-score of 0.422, precision of 0.333, Matthews correlation coefficient (MCC) of 0.275, and an AUPRC of 0.468, compared to the original implementation's F1-score of 0.241 and MCC of 0.201. These relative gains of approximately 75% in F1-score and 37% in MCC demonstrate the effectiveness of imbalance-aware training and feature reweighting. The resulting sequence-based model offers a more robust and scalable tool for PPI site prediction, with potential applications in functional annotation and drug design.
UR - https://www.scopus.com/pages/publications/105018463716
U2 - 10.1109/IMSA65733.2025.11167514
DO - 10.1109/IMSA65733.2025.11167514
M3 - Conference paper
AN - SCOPUS:105018463716
T3 - 3rd International Conference on Intelligent Methods, Systems and Applications, IMSA 2025
SP - 428
EP - 433
BT - 3rd International Conference on Intelligent Methods, Systems and Applications, IMSA 2025
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 12 July 2025 through 13 July 2025
ER -