Abstract
The ligand-based approach to drug development requires computing intermolecular similarity. The simplified molecular input line entry system (SMILES) is the primary way of representing molecular structure in one dimension. Because a SMILES string encodes molecular structure, changing even a single character can yield a compound with completely different properties. The conventional edit distance makes it difficult to obtain optimal results, because it treats insertion, deletion, and substitution as equally costly when calculating the distance. This study proposes a novel edit distance that uses an optimal weight set for the three operations. To determine the optimal weight set, we present a genetic algorithm with suitable hyperparameters. To demonstrate the effectiveness of the proposed genetic algorithm, we compare it with an exhaustive search algorithm. Experiments on four well-known datasets showed that the weighted edit distance optimized with the genetic algorithm yielded an average performance improvement of approximately 20%.
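As an illustration of the idea described in the abstract, the following is a minimal sketch of an edit distance with separate weights for insertion, deletion, and substitution. It is not the authors' implementation: the function name `weighted_edit_distance`, the parameter names `w_ins`, `w_del`, and `w_sub`, and the weight values used in the example are all assumptions chosen for illustration, and the sketch compares SMILES strings character by character (so a two-letter atom such as `Cl` counts as two characters).

```python
def weighted_edit_distance(a: str, b: str,
                           w_ins: float = 1.0,
                           w_del: float = 1.0,
                           w_sub: float = 1.0) -> float:
    """Levenshtein-style distance with a separate cost per operation.

    Hypothetical sketch: the weights are free parameters here, not the
    optimized values reported in the paper.
    """
    # prev[j] = cost of turning the empty prefix of `a` into b[:j] (j insertions)
    prev = [j * w_ins for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        # Turning a[:i] into the empty string takes i deletions
        curr = [i * w_del]
        for j, cb in enumerate(b, start=1):
            cost_sub = prev[j - 1] + (0.0 if ca == cb else w_sub)
            cost_del = prev[j] + w_del
            cost_ins = curr[j - 1] + w_ins
            curr.append(min(cost_sub, cost_del, cost_ins))
        prev = curr
    return prev[-1]


# Ethanol vs. ethylamine: one substituted character
print(weighted_edit_distance("CCO", "CCN"))
# Same pair with an (assumed) higher substitution weight
print(weighted_edit_distance("CCO", "CCN", w_ins=1.0, w_del=1.0, w_sub=3.0))
```

With equal weights this reduces to the conventional edit distance. A weight search such as the genetic algorithm mentioned in the abstract would treat the triple `(w_ins, w_del, w_sub)` as the candidate solution and score it by downstream similarity performance.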
| Original language | English |
|---|---|
| Pages (from-to) | 1161-1170 |
| Number of pages | 10 |
| Journal | Pattern Analysis and Applications |
| Volume | 26 |
| Issue number | 3 |
| State | Published - 2023.08 |
Keywords
- Drug-target interaction prediction
- Edit distance
- Genetic algorithm
- Similarity-based DTI
- SMILES
- SMILES similarity
- Weighted edit distance
Quacquarelli Symonds (QS) Subject Topics
- Computer Science & Information Systems
- Data Science
Title
- Weighted edit distance optimized using genetic algorithm for SMILES-based compound similarity