EFGs: A Complete and Accurate Implementation of Ertl’s Functional Group Detection Algorithm in RDKit
| Published in | Journal of chemical information and modeling Vol. 65; no. 3; pp. 1061 - 1066 |
|---|---|
| Main Author | Colmenarejo, Gonzalo |
| Format | Journal Article |
| Language | English |
| Published | United States: American Chemical Society, 10.02.2025 |
| Subjects | |
| ISSN | 1549-9596 (print), 1549-960X (electronic) |
| DOI | 10.1021/acs.jcim.4c02268 |
| Abstract | Functional groups are widely used in organic chemistry because they provide a rationale for analyzing physicochemical and reactivity properties. In medicinal chemistry, they are the basis for analyzing ligand–biomacromolecule interactions. Ertl’s algorithm extracts functional groups from arbitrary organic molecules without depending on predefined libraries of functional groups. However, a complete and accurate implementation of Ertl’s algorithm has been lacking in the widely used RDKit cheminformatics toolkit. This paper describes a new RDKit/Python implementation of the algorithm that is both accurate and complete. For an RDKit molecule, it provides (i) a PNG binary string with an image of the molecule with color-highlighted functional groups; (ii) a list of sets of atom indices (idx), each set corresponding to a functional group; (iii) a list of canonicalized pseudo-SMILES strings for the full functional groups; and (iv) a list of labeled RDKit mol objects, one for each full functional group. The code is freely available at https://github.com/bbu-imdea/efgs and is part of the RDKit Contrib directory (https://github.com/rdkit/rdkit/tree/master/Contrib/efgs). |
|---|---|
| Author | Colmenarejo, Gonzalo |
| AuthorAffiliation | IMDEA Food Biostatistics and Bioinformatics Unit |
| Author_xml | Colmenarejo, Gonzalo (ORCID: 0000-0002-8249-4547; email: gonzalo.colmenarejo@imdea.org) |
| BackLink | https://www.ncbi.nlm.nih.gov/pubmed/39876492 (MEDLINE/PubMed record) |
| ContentType | Journal Article |
| Copyright | 2025 American Chemical Society Copyright American Chemical Society Feb 10, 2025 |
| Discipline | Chemistry |
| EISSN | 1549-960X |
| EndPage | 1066 |
| ExternalDocumentID | 39876492 10_1021_acs_jcim_4c02268 i17731097 |
| Genre | Journal Article |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 3 |
| Language | English |
| License | https://doi.org/10.15223/policy-029 https://doi.org/10.15223/policy-037 https://doi.org/10.15223/policy-045 |
| ORCID | 0000-0002-8249-4547 |
| PMID | 39876492 |
| PageCount | 6 |
| PublicationDate | 2025-Feb-10 |
| PublicationPlace | United States |
| PublicationPlace_xml | – name: United States – name: Washington |
| PublicationTitle | Journal of chemical information and modeling |
| PublicationTitleAlternate | J. Chem. Inf. Model |
| PublicationYear | 2025 |
| Publisher | American Chemical Society |
| StartPage | 1061 |
| SubjectTerms | Algorithms Cheminformatics - methods Detection Algorithms Functional groups Group theory Organic chemistry Strings |
| URI | http://dx.doi.org/10.1021/acs.jcim.4c02268 https://www.ncbi.nlm.nih.gov/pubmed/39876492 https://www.proquest.com/docview/3168046270 https://www.proquest.com/docview/3160940446 |
| Volume | 65 |
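
Output (ii) in the abstract, a list of sets of atom indices with one set per functional group, can be illustrated with a deliberately simplified sketch. The code below does not use the efgs package or RDKit, and it is not the published implementation. It assumes only the basic marking-and-merging idea behind Ertl's approach: heteroatoms, and carbons multiply bonded to a heteroatom, are marked, and connected marked atoms are merged into one group. The molecule encoding and the function name `toy_functional_groups` are hypothetical, and the full algorithm includes further rules that are omitted here.

```python
from collections import defaultdict


def toy_functional_groups(symbols, bonds):
    """Group marked atoms into connected sets of atom indices.

    symbols: list of element symbols, indexed by atom index (hypothetical encoding).
    bonds: list of (atom_a, atom_b, bond_order) tuples.
    Simplified illustration only; not the full set of Ertl's rules.
    """
    def is_hetero(i):
        return symbols[i] not in ("C", "H")

    neighbors = defaultdict(set)
    for a, b, _order in bonds:
        neighbors[a].add(b)
        neighbors[b].add(a)

    # Mark every heteroatom.
    marked = {i for i in range(len(symbols)) if is_hetero(i)}

    # Mark carbons joined to a heteroatom by a double or triple bond.
    for a, b, order in bonds:
        if order > 1:
            if symbols[a] == "C" and is_hetero(b):
                marked.add(a)
            if symbols[b] == "C" and is_hetero(a):
                marked.add(b)

    # Merge connected marked atoms with a depth-first search.
    groups, seen = [], set()
    for start in sorted(marked):
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:
            atom = stack.pop()
            if atom in group:
                continue
            group.add(atom)
            stack.extend(n for n in neighbors[atom] if n in marked and n not in group)
        seen |= group
        groups.append(group)
    return groups


# 4-aminobutanoic acid, heavy atoms only: N-C-C-C-C(=O)-O
symbols = ["N", "C", "C", "C", "C", "O", "O"]
bonds = [(0, 1, 1), (1, 2, 1), (2, 3, 1), (3, 4, 1), (4, 5, 2), (4, 6, 1)]
groups = toy_functional_groups(symbols, bonds)
print([sorted(g) for g in groups])  # [[0], [4, 5, 6]]
```

In this toy molecule the amine nitrogen forms one group and the carboxylic acid atoms form another, which mirrors the shape of the atom-index output described in the abstract. For real molecules, the implementation in the repositories named in the abstract should be used instead.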