EFGs: A Complete and Accurate Implementation of Ertl’s Functional Group Detection Algorithm in RDKit

Bibliographic Details
Published in: Journal of Chemical Information and Modeling, Vol. 65, no. 3, pp. 1061-1066
Main author: Colmenarejo, Gonzalo
Format: Journal article
Language: English
Published: United States, American Chemical Society, 10 February 2025
ISSN: 1549-9596
EISSN: 1549-960X
DOI: 10.1021/acs.jcim.4c02268

Abstract

Functional groups are widely used in organic chemistry because they provide a rationale for analyzing physicochemical and reactivity properties. In medicinal chemistry, they are the basis for analyzing ligand–biomacromolecule interactions. Ertl’s algorithm extracts functional groups from arbitrary organic molecules without relying on predefined libraries of functional groups. However, no complete and accurate implementation of Ertl’s algorithm has been available in the widely used RDKit cheminformatics toolkit. This paper describes a new RDKit/Python implementation of the algorithm that is both accurate and complete. For an RDKit molecule, it provides (i) a PNG binary string with an image of the molecule in which the functional groups are highlighted in color; (ii) a list of sets of atom indices (idx), each set corresponding to one functional group; (iii) a list of canonicalized pseudo-SMILES strings for the full functional groups; and (iv) a list of labeled RDKit mol objects, one for each full functional group. The code is freely available at https://github.com/bbu-imdea/efgs and is part of the RDKit Contrib directory (https://github.com/rdkit/rdkit/tree/master/Contrib/efgs).
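Ertl's algorithm, as it is commonly summarized, first marks atoms using a few local rules (all heteroatoms; carbons in non-aromatic double or triple bonds; acetal-like sp3 carbons; atoms of oxirane, aziridine and thiirane rings) and then merges bonded marked atoms into groups. The resulting atom-index sets are analogous to output (ii) above. The following is a minimal, toolkit-agnostic sketch of only those marking and merging steps, under explicit assumptions: the toy graph format (element symbols plus `(i, j, order, aromatic)` bond tuples) and the helper names `neighbor_map`, `mark_atoms` and `functional_groups` are hypothetical and are not part of the EFGs or RDKit APIs; "sp3" is approximated as "only single bonds"; and the environment-extraction step behind the full functional groups and pseudo-SMILES of outputs (iii) and (iv), together with any refinements in the EFGs implementation, is omitted.

```python
from collections import deque

# Toy molecular graph (hypothetical, not the EFGs or RDKit data model):
# `elements` holds heavy-atom symbols; `bonds` holds
# (atom_i, atom_j, order, aromatic) tuples. Hydrogens are implicit.
ONS = {"O", "N", "S"}


def neighbor_map(n_atoms, bonds):
    """Map each atom index to {neighbor index: (bond order, aromatic flag)}."""
    nbrs = {i: {} for i in range(n_atoms)}
    for i, j, order, aromatic in bonds:
        nbrs[i][j] = (order, aromatic)
        nbrs[j][i] = (order, aromatic)
    return nbrs


def mark_atoms(elements, nbrs):
    """Apply the marking rules (steps 1-2) and return marked atom indices."""

    def only_single_bonds(i):
        # True if every bond of atom i is a non-aromatic single bond.
        return all(order == 1 and not arom for order, arom in nbrs[i].values())

    # Step 1: mark every heteroatom (any heavy atom that is not carbon).
    marked = {i for i, el in enumerate(elements) if el != "C"}

    # Steps 2a/2b: both ends of any non-aromatic double or triple bond
    # (C=X, C#X, C=C, C#C); heteroatom ends are already marked anyway.
    for i, bonded in nbrs.items():
        for j, (order, aromatic) in bonded.items():
            if order >= 2 and not aromatic:
                marked.update((i, j))

    # Step 2c: acetal-like carbons. "sp3" is approximated as a carbon with
    # only single bonds, bound to >= 2 O/N/S atoms that also have only
    # single bonds.
    for c, el in enumerate(elements):
        if el == "C" and only_single_bonds(c):
            ons = [a for a in nbrs[c] if elements[a] in ONS and only_single_bonds(a)]
            if len(ons) >= 2:
                marked.add(c)

    # Step 2d: oxirane/aziridine/thiirane-like rings, found as two bonded
    # carbons that are both attached to the same O, N or S atom.
    for x, el in enumerate(elements):
        if el not in ONS:
            continue
        carbons = [a for a in nbrs[x] if elements[a] == "C"]
        for k, a in enumerate(carbons):
            for b in carbons[k + 1:]:
                if b in nbrs[a]:
                    marked.update((a, b))
    return marked


def functional_groups(elements, bonds):
    """Step 3: merge bonded marked atoms into groups (sets of atom indices)."""
    nbrs = neighbor_map(len(elements), bonds)
    marked = mark_atoms(elements, nbrs)
    groups, seen = [], set()
    for start in sorted(marked):
        if start in seen:
            continue
        seen.add(start)
        group, queue = set(), deque([start])
        while queue:  # breadth-first search restricted to marked atoms
            atom = queue.popleft()
            group.add(atom)
            for nb in nbrs[atom]:
                if nb in marked and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        groups.append(group)
    return groups


# Methyl 3-hydroxypropanoate, HOCH2CH2C(=O)OCH3, atoms numbered
# O0 C1 C2 C3 O4 O5 C6.
elements = ["O", "C", "C", "C", "O", "O", "C"]
bonds = [(0, 1, 1, False), (1, 2, 1, False), (2, 3, 1, False),
         (3, 4, 2, False), (3, 5, 1, False), (5, 6, 1, False)]
print(functional_groups(elements, bonds))  # [{0}, {3, 4, 5}]
```

The hydroxyl oxygen forms a single-atom group, while the carbonyl carbon and both ester oxygens merge into one group because they are bonded marked atoms. In an RDKit-based version, the same element, bond-order and aromaticity information could be read from the molecule's atoms and bonds (for example with `Atom.GetSymbol()`, `Bond.GetBondTypeAsDouble()` and `Bond.GetIsAromatic()`), so that the returned indices would correspond to RDKit atom idx values.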
Author: Colmenarejo, Gonzalo (ORCID 0000-0002-8249-4547; gonzalo.colmenarejo@imdea.org)
Affiliation: IMDEA Food, Biostatistics and Bioinformatics Unit
Copyright: 2025 American Chemical Society
Peer reviewed: Yes
PMID: 39876492
Subjects: Algorithms; Cheminformatics - methods; Detection Algorithms; Functional groups; Group theory; Organic chemistry; Strings