
DL-SPhos: Prediction of serine phosphorylation sites using transformer language model

  • Palistha Shrestha
  • Jeevan Kandel
  • Hilal Tayara*
  • Kil To Chong

*Corresponding author for this work

Jeonbuk National University

Research output: Contribution to journal › Journal article › peer-review

Abstract

Serine phosphorylation plays a pivotal role in numerous cellular processes and in the pathogenesis of various diseases. Roughly 81% of human diseases have links to phosphorylation, and 86.4% of protein phosphorylation takes place at serine residues. In eukaryotes, over a quarter of proteins undergo phosphorylation, with more than half implicated in numerous disorders, notably cancer and reproductive system diseases. This study focuses on serine-phosphorylation-driven pathogenesis and the critical role of conserved motif identification. Traditional wet-lab experiments for identifying serine phosphorylation sites are resource-intensive, which has motivated numerous computational prediction techniques. Our paper introduces a deep learning tool for predicting serine (S) phosphorylation sites that integrates a transformer language model, deep neural network components, and explainable AI for motif identification. We trained our model on protein sequences from UniProt, validated it against the dbPTM benchmark dataset, and employed the PTMD dataset to explore motifs related to mammalian disorders. Our results show that our model surpasses other deep learning predictors by 3%. Furthermore, we used the local interpretable model-agnostic explanations (LIME) approach to explain the predictions and highlight the amino acid residues crucial for S phosphorylation. Our model also outperformed competitors in kinase-specific serine phosphorylation prediction on benchmark datasets.

Original language: English
Article number: 107925
Journal: Computers in Biology and Medicine
Volume: 169
State: Published - 2024.02

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 3 - Good Health and Well-being

Keywords

  • Mammalian diseases
  • Motif identification
  • Protein sequence
  • Serine phosphorylation
  • Transformer language model

Quacquarelli Symonds (QS) Subject Topics

  • Computer Science & Information Systems
  • Medicine
  • Data Science
