iProm-Zea: A two-layer model to identify plant promoters and their types using convolutional neural network

  • Jeehong Kim
  • Muhammad Shujaat
  • Hilal Tayara*
  • *Corresponding author for this work

Research output: Contribution to journal › Journal article › peer-review

Abstract

A promoter is a short DNA sequence near the start codon that is responsible for initiating the transcription of a specific gene in the genome. Accurate recognition of promoters is important for a better understanding of transcriptional regulation. Because of their importance in biological transcriptional regulation, there is an urgent need for in silico tools that identify promoters and their types in a timely and accurate manner. A number of prediction methods have been developed in this regard; however, almost all of them only identify promoters and their strength or sigma types. The TATA box region of TATA promoters influences post-transcriptional processes; therefore, in the current study, we developed a two-layer predictor called “iProm-Zea” that uses a convolutional neural network (CNN) to identify TATA and TATA-less promoters. The first layer identifies a given DNA sequence as a promoter or non-promoter. The second layer identifies whether the recognized promoter is a TATA promoter. To find an optimal feature encoding scheme and model, we evaluated four feature encoding schemes with different machine learning and CNN algorithms and, based on the evaluation results, selected a one-hot encoding scheme and a CNN model for iProm-Zea. The 5-fold cross-validation results demonstrated that the constructed predictor showed great potential for identifying promoters and classifying them as TATA or TATA-less promoters. Furthermore, we performed a cross-species analysis of iProm-Zea to evaluate its performance in other species. Moreover, to make it easier for experimental scientists to obtain the results they need, we established a freely accessible and user-friendly web server at http://nsclbio.jbnu.ac.kr/tools/iProm-Zea/.
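
The one-hot encoding and the two-layer decision flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the column order `ACGT`, the all-zero handling of unknown bases, the 0.5 threshold, and the names `one_hot_encode`, `two_layer_predict`, `promoter_score`, and `tata_score` are assumptions made for illustration, and the two scorers stand in for trained CNN models that are not reproduced here.

```python
from typing import Callable, List

# Column order A, C, G, T is an assumption made for this illustration.
ALPHABET = "ACGT"

Matrix = List[List[int]]
Scorer = Callable[[Matrix], float]  # hypothetical stand-in for a trained CNN


def one_hot_encode(seq: str) -> Matrix:
    """Encode a DNA sequence as a len(seq) x 4 binary matrix.

    Bases outside ACGT (for example N) become all-zero rows; this
    handling is an assumption, not taken from the paper.
    """
    return [[1 if base == letter else 0 for letter in ALPHABET]
            for base in seq.upper()]


def two_layer_predict(seq: str,
                      promoter_score: Scorer,
                      tata_score: Scorer,
                      threshold: float = 0.5) -> str:
    """Cascade two binary classifiers over the same encoded sequence.

    Layer 1 decides promoter vs. non-promoter. Layer 2 runs only on
    sequences accepted by layer 1 and decides TATA vs. TATA-less.
    """
    encoded = one_hot_encode(seq)
    if promoter_score(encoded) < threshold:
        return "non-promoter"
    if tata_score(encoded) >= threshold:
        return "TATA promoter"
    return "TATA-less promoter"
```

In use, each scorer would wrap a trained model that returns a probability for the encoded sequence; for a quick check, constant functions such as `lambda m: 0.9` can be passed in their place.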

Original language: English
Article number: 110384
Journal: Genomics
Volume: 114
Issue number: 3
State: Published - 2022.05

Keywords

  • Bioinformatics
  • Computational biology
  • Convolution neural network
  • Plant genome
  • Promoter

Quacquarelli Symonds(QS) Subject Topics

  • Biological Sciences
