i6mA-stack: A stacking ensemble-based computational prediction of DNA N6-methyladenine (6mA) sites in the Rosaceae genome

  • Jhabindra Khanal
  • Dae Young Lim
  • Hilal Tayara*
  • Kil To Chong
  • *Corresponding author for this work

Research output: Contribution to journal › Journal article › peer-review

Abstract

DNA N6-methyladenine (6mA) is an epigenetic modification that plays a vital role in a variety of cellular processes in both eukaryotes and prokaryotes. Accurate information on 6mA sites in the Rosaceae genome may assist in understanding genomic 6mA distributions and various biological functions such as epigenetic inheritance. Various studies have shown that 6mA sites can be identified experimentally, but the procedures are time-consuming and costly. To overcome the drawbacks of experimental methods, we propose an accurate computational paradigm based on a machine learning (ML) technique to identify 6mA sites in Rosa chinensis (R. chinensis) and Fragaria vesca (F. vesca). To improve the performance of the proposed model and to avoid overfitting, a recursive feature elimination with cross-validation (RFECV) strategy is used to select the optimal number of features (ONF) subset from five different DNA sequence encoding schemes, i.e., Binary Encoding (BE), Ring-Function-Hydrogen-Chemical Properties (RFHC), Electron-Ion-Interaction Pseudo Potentials of Nucleotides (EIIP), Dinucleotide Physicochemical Properties (DPCP), and Trinucleotide Physicochemical Properties (TPCP). Subsequently, we use the ONF subset to train a two-layer ML-based stacking model to create a bioinformatics tool named ‘i6mA-stack’. This tool generally outperforms its peer tool and is currently available at http://nsclbio.jbnu.ac.kr/tools/i6mA-stack/
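
As an illustration of the per-nucleotide encoding schemes named in the abstract, the following is a minimal sketch of how BE, RFHC, and EIIP features might be concatenated for one DNA sequence. The lookup tables use values commonly seen in the sequence-encoding literature and are assumptions here, not mappings confirmed by the abstract; the function name `encode_sequence` is hypothetical.

```python
# Hypothetical sketch: per-nucleotide encodings for a DNA sequence.
# All three tables are assumed, commonly used values; the abstract does not
# state the exact mappings used by i6mA-stack.

# Binary Encoding (BE): one-hot vector per nucleotide.
BINARY = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "T": (0, 0, 0, 1),
}

# RFHC: (ring structure, functional group, hydrogen bonding).
# purine = 1 / pyrimidine = 0; amino = 1 / keto = 0; weak = 1 / strong = 0.
RFHC = {
    "A": (1, 1, 1),
    "C": (0, 1, 0),
    "G": (1, 0, 0),
    "T": (0, 0, 1),
}

# EIIP: electron-ion interaction pseudopotential per nucleotide.
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}


def encode_sequence(seq):
    """Return a flat feature list: 4 BE + 3 RFHC + 1 EIIP values per base."""
    features = []
    for base in seq.upper():
        if base not in BINARY:
            raise ValueError(f"unsupported nucleotide: {base!r}")
        features.extend(BINARY[base])
        features.extend(RFHC[base])
        features.append(EIIP[base])
    return features


vector = encode_sequence("GATC")
print(len(vector))  # 8 features per base, 4 bases
```

A feature vector of this kind would be the input to a feature-selection step such as RFECV; the dinucleotide and trinucleotide schemes (DPCP, TPCP) need property tables that the abstract does not give, so they are left out of this sketch.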

Original language: English
Pages (from-to): 582-592
Number of pages: 11
Journal: Genomics
Volume: 113
Issue number: 1P2
State: Published - 2021.01

Keywords

  • DNA N6-methyladenine
  • Machine learning
  • RFECV
  • Sequence analysis
  • Stacking

Quacquarelli Symonds (QS) Subject Topics

  • Biological Sciences
