RPSubAlign: a novel sequence-based molecular representation method for retrosynthesis prediction with improved validity and robustness
| Published in | Briefings in bioinformatics Vol. 26; no. 3 |
|---|---|
| Format | Journal Article |
| Language | English |
| Published | England: Oxford University Press, 01.05.2025; Oxford Publishing Limited (England) |
| ISSN | 1467-5463, 1477-4054 |
| DOI | 10.1093/bib/bbaf257 |
| Summary: | Retrosynthetic route planning is essential for designing efficient pathways to synthesize complex molecules and serves as a cornerstone of drug discovery and organic synthesis. Sequence-based models have become a predominant approach to retrosynthetic route planning, yet their validity and robustness remain limited by challenges in molecular representation. Current methods typically treat reactants and products as independent molecules, overlooking structural relationships crucial for accurate synthesis predictions. Herein, we introduce RPSubAlign, a molecular sequence representation method specifically tailored to retrosynthetic tasks, which aligns common substructures between reactants and products to enhance the validity and robustness of sequence-based models. Compared with conventional random and root-alignment representations, RPSubAlign achieves better performance on the USPTO-50K and USPTO-MIT datasets, with up to a 34.8% increase in Top-N accuracy (with the Self-Referencing Embedded Strings representation) and enhanced stability across various data augmentation scenarios. RPSubAlign also significantly improves syntactic validity, reaching 86.64% on USPTO-50K and 96.45% on USPTO-MIT (with the Simplified Molecular Input Line Entry System representation), outperforming baseline methods. These results highlight RPSubAlign as a robust and effective molecular characterization method for retrosynthesis prediction. Code for RPSubAlign is available at https://github.com/Aminoacid1226/RPSubAlign. |
|---|---|
| Bibliography: | Yuting Hu, Feng Hu and Hongwen Zhang contributed equally to this work. |