TFProtBert: Detection of Transcription Factors Binding to Methylated DNA Using ProtBert Latent Space Representation

  • Saima Gaffar
  • Kil To Chong*
  • Hilal Tayara*

*Corresponding author for this work

Research output: Contribution to journal › Journal article › peer-review

Abstract

Transcription factors (TFs) are fundamental regulators of gene expression and perform diverse functions in cellular processes. The management of three-dimensional (3D) genome conformation and gene expression relies primarily on TFs. They recruit transcriptional machinery to the enhancers or promoters of specific genes, thereby activating or inhibiting transcription. Identifying TFs is therefore a significant step towards understanding cellular gene expression mechanisms. Because experimental methods are time-consuming and labor-intensive, computational models are essential. In this work, we introduce a two-layer prediction framework based on a support vector machine (SVM) that uses the latent space representation of a protein language model, ProtBert. The first layer identifies TFs, and the second layer identifies TFs that prefer binding to methylated deoxyribonucleic acid (TFPMs). We also tested the proposed method on an imbalanced database. In detecting TFs and TFPMs, the proposed model consistently outperformed state-of-the-art approaches, as shown by performance comparisons in cross-validation analysis and independent tests.
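
The two-layer idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes scikit-learn's `SVC` with a linear kernel (the abstract does not state the kernel), uses random Gaussian vectors as stand-ins for ProtBert embeddings, and every name in it (`EMBED_DIM`, `make_split`, `predict_cascade`, the label strings) is hypothetical. It only shows how a second classifier can be applied to the sequences that the first classifier flags as TFs.

```python
import numpy as np
from sklearn.svm import SVC

# Toy embedding size (assumption); real ProtBert embeddings would be used instead.
EMBED_DIM = 32
rng = np.random.default_rng(0)

def make_split(n_per_class):
    """Synthetic stand-in data: 0 = non-TF, 1 = TF (other), 2 = TFPM."""
    centers = [-3.0, 0.0, 3.0]  # arbitrary, well-separated cluster centers
    X = np.vstack([
        rng.normal(loc=c, scale=1.0, size=(n_per_class, EMBED_DIM))
        for c in centers
    ])
    y = np.repeat([0, 1, 2], n_per_class)
    return X, y

def predict_cascade(layer1, layer2, X):
    """Layer 1 decides TF vs non-TF; layer 2 runs only on predicted TFs."""
    labels = np.full(len(X), "non-TF", dtype=object)
    is_tf = layer1.predict(X) == 1
    if is_tf.any():
        prefers_methylated = layer2.predict(X[is_tf]) == 1
        labels[is_tf] = np.where(prefers_methylated, "TFPM", "TF-other")
    return labels

X_train, y_train = make_split(50)

# Layer 1: TF (classes 1 and 2) versus non-TF (class 0).
layer1 = SVC(kernel="linear")
layer1.fit(X_train, (y_train > 0).astype(int))

# Layer 2: trained on TFs only, TFPM (class 2) versus other TFs (class 1).
tf_mask = y_train > 0
layer2 = SVC(kernel="linear")
layer2.fit(X_train[tf_mask], (y_train[tf_mask] == 2).astype(int))

X_new, y_new = make_split(20)
predictions = predict_cascade(layer1, layer2, X_new)
print(dict(zip(*np.unique(predictions.astype(str), return_counts=True))))
```

Training the second layer only on TFs mirrors the cascade at prediction time, where a sequence reaches the second classifier only after the first has labeled it a TF.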

Original language: English
Article number: 4234
Journal: International Journal of Molecular Sciences
Volume: 26
Issue number: 9
State: Published - May 2025

Keywords

  • bidirectional encoder representations from transformers
  • machine learning
  • methylated deoxyribonucleic acid
  • non-methylated deoxyribonucleic acid
  • protein language model
  • transcription factors

Quacquarelli Symonds (QS) Subject Topics

  • Computer Science & Information Systems
  • Engineering - Petroleum
  • Data Science
  • Engineering - Chemical
  • Chemistry
  • Biological Sciences
