Weighted edit distance optimized using genetic algorithm for SMILES-based compound similarity

  • In Hyuk Choi
  • Il Seok Oh*
  • *Corresponding author for this work

    Research output: Contribution to journal, Journal article (peer-reviewed)

    Abstract

    One approach to developing new drugs is the ligand-based approach, which requires computing the similarity between molecules. The simplified molecular input line entry system (SMILES) is widely used to represent molecular structure as a one-dimensional string. Because a SMILES string encodes the molecular structure, changing even a single character can produce a molecule with completely different properties. The conventional edit distance has difficulty giving optimal results, because it assigns the same cost to insertion, deletion, and substitution when computing the distance. This study proposes a novel edit distance that uses an optimal set of weights for these three operations. To determine the optimal weight set, we present a genetic algorithm with suitable hyperparameters. To demonstrate the effectiveness of the proposed genetic algorithm, we compare it with exhaustive search. Experiments on four well-known datasets showed that the weighted edit distance optimized with the genetic algorithm yielded an average performance improvement of approximately 20%.
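    The abstract does not give the algorithm's details, so the following is only a minimal sketch under stated assumptions. It shows an edit distance with separate costs for insertion, deletion, and substitution, and a simple genetic algorithm that searches for the weight triple. Everything beyond that idea is hypothetical and not taken from the paper: the fitness function (leave-one-out 1-nearest-neighbour accuracy on labelled SMILES), the weight bounds [0.1, 2.0], the GA operators (truncation selection, uniform crossover, Gaussian mutation), the hyperparameter values, and the toy molecules and class labels.

```python
import random
from typing import Callable, List, Sequence, Tuple

Weights = Tuple[float, float, float]  # (insertion, deletion, substitution)
LOW, HIGH = 0.1, 2.0                  # hypothetical bounds on each weight


def weighted_edit_distance(a: str, b: str, w: Weights) -> float:
    """Cost of turning SMILES a into b with per-operation costs.

    Assumption of this sketch: characters are compared one by one, so
    multi-character tokens such as 'Cl' or '[nH]' are not treated as units.
    """
    w_ins, w_del, w_sub = w
    prev = [j * w_ins for j in range(len(b) + 1)]  # "" -> b[:j] by insertions
    for i in range(1, len(a) + 1):
        cur = [i * w_del] + [0.0] * len(b)         # a[:i] -> "" by deletions
        for j in range(1, len(b) + 1):
            sub = 0.0 if a[i - 1] == b[j - 1] else w_sub
            cur[j] = min(prev[j] + w_del,          # delete a[i-1]
                         cur[j - 1] + w_ins,       # insert b[j-1]
                         prev[j - 1] + sub)        # match or substitute
        prev = cur
    return prev[len(b)]


def loo_1nn_accuracy(data: Sequence[Tuple[str, int]], w: Weights) -> float:
    """Hypothetical fitness: leave-one-out 1-nearest-neighbour accuracy."""
    correct = 0
    for i, (smiles, label) in enumerate(data):
        others = [d for j, d in enumerate(data) if j != i]
        nearest = min(others, key=lambda d: weighted_edit_distance(smiles, d[0], w))
        correct += nearest[1] == label
    return correct / len(data)


def genetic_search(fitness: Callable[[Weights], float], pop_size: int = 20,
                   generations: int = 30, mutation_rate: float = 0.2,
                   seed: int = 0) -> Weights:
    """Simple elitist GA over weight triples (illustrative operators only)."""
    rng = random.Random(seed)
    pop: List[Weights] = [tuple(rng.uniform(LOW, HIGH) for _ in range(3))
                          for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: the better half survives unchanged.
        parents = sorted(pop, key=fitness, reverse=True)[: pop_size // 2]
        children: List[Weights] = []
        while len(parents) + len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(p1, p2)]  # uniform crossover
            for k in range(3):
                if rng.random() < mutation_rate:               # Gaussian mutation
                    child[k] = min(HIGH, max(LOW, child[k] + rng.gauss(0.0, 0.2)))
            children.append(tuple(child))
        pop = parents + children
    return max(pop, key=fitness)


if __name__ == "__main__":
    # Toy data with made-up classes (0 = simple alcohols, 1 = carboxylic acids).
    toy = [("CCO", 0), ("CCCO", 0), ("CCCCO", 0),
           ("CC(=O)O", 1), ("CCC(=O)O", 1), ("CCCC(=O)O", 1)]
    best = genetic_search(lambda w: loo_1nn_accuracy(toy, w))
    print("best weights (ins, del, sub):", best)
```

    Replacing `genetic_search` with a loop over a fixed grid of weight triples would give an exhaustive-search baseline of the kind the abstract compares against. With k candidate values per weight, that loop needs k³ fitness evaluations, which is the cost a genetic algorithm aims to avoid.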

    Original language: English
    Pages (from-to): 1161-1170
    Number of pages: 10
    Journal: Pattern Analysis and Applications
    Volume: 26
    Issue number: 3
    Status: Published (August 2023)

    Keywords

    • Drug-target interaction prediction
    • Edit distance
    • Genetic algorithm
    • Similarity-based DTI
    • SMILES
    • SMILES similarity
    • Weighted edit distance

    Quacquarelli Symonds (QS) Subject Topics

    • Computer Science & Information Systems
    • Data Science
