graphenv: a Python library for reinforcement learning on graph search spaces

Bibliographic Details
Published in	Journal of Open Source Software, Vol. 7, no. 77, p. 4621
Main Authors	Biagioni, David; Tripp, Charles Edison; Clark, Struan; Duplyakin, Dmitry; Law, Jeffrey; St. John, Peter C.
Format	Journal Article
Language	English
Published	United States: Open Source Initiative - NumFOCUS, 2022-09-05
Subjects	machine learning; MATHEMATICS AND COMPUTING; reinforcement learning; software engineering
ISSN	2475-9066
DOI	10.21105/joss.04621

Abstract	Many important and challenging problems in combinatorial optimization (CO) can be expressed as graph search problems, in which graph vertices represent full or partial solutions and edges represent decisions that connect them. Graph structure not only introduces strong relational inductive biases for learning (Battaglia et al., 2018), in this context by providing a way to explicitly model the value of transitioning (along edges) between one search state (vertex) and the next, but also lends itself to problems both with and without clearly defined algebraic structure. For example, classic CO problems on graphs such as the Traveling Salesman Problem (TSP) can be expressed as either pure graph search or integer programs. Other problems, however, such as molecular optimization, do not have concise algebraic formulations and yet are readily implemented as a graph search (V. et al., 2022; Zhou et al., 2019). Such "model-free" problems constitute a large fraction of modern reinforcement learning (RL) research because it is often much easier to write a forward simulation that expresses all of the state transitions and rewards than to write down the precise mathematical expression of the full optimization problem. In the case of molecular optimization, for example, one can use domain knowledge alongside existing software libraries to model the effect of adding a single bond or atom to an existing but incomplete molecule, and let the RL algorithm build a model of how good a given decision is by "experiencing" the simulated environment many times. In contrast, a model-based mathematical formulation that fully expresses all the chemical and physical constraints is intractable. In recent years, RL has emerged as an effective paradigm for optimizing searches over graphs and has led to state-of-the-art heuristics for games like Go and chess, as well as for classical CO problems such as the TSP.
This combination of graph search and RL, while powerful, requires non-trivial software to execute, especially when combining advanced state representations such as Graph Neural Networks (GNN) with scalable RL algorithms.
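To make the graph-search framing in the abstract concrete, the following is a minimal sketch of the TSP written as a forward simulation: each vertex is a partial tour, each outgoing edge is the decision to visit one more city, and the value of a decision is estimated by repeatedly "experiencing" random rollouts. It does not use or reproduce the graphenv API; the names `TourVertex`, `rollout`, and `estimate_child_values`, the four-city instance, and the plain Monte Carlo averaging (standing in for a learned RL value model) are all assumptions made for illustration.

```python
import math
import random
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical 4-city instance (corners of a unit square); not from the paper.
CITIES: List[Tuple[float, float]] = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0)]


def distance(a: int, b: int) -> float:
    """Euclidean distance between two cities given by index."""
    return math.dist(CITIES[a], CITIES[b])


@dataclass(frozen=True)
class TourVertex:
    """A search state (graph vertex): the cities visited so far, in order."""

    tour: Tuple[int, ...]

    @property
    def terminal(self) -> bool:
        """A vertex is a full solution once every city has been visited."""
        return len(self.tour) == len(CITIES)

    def children(self) -> List["TourVertex"]:
        """Outgoing edges: one child vertex per city not yet visited."""
        visited = set(self.tour)
        return [
            TourVertex(self.tour + (c,))
            for c in range(len(CITIES))
            if c not in visited
        ]

    def reward(self) -> float:
        """Reward for arriving here: negative length of the last leg.

        At a terminal vertex the leg that closes the tour is charged as well.
        """
        if len(self.tour) < 2:
            return 0.0
        r = -distance(self.tour[-2], self.tour[-1])
        if self.terminal:
            r -= distance(self.tour[-1], self.tour[0])
        return r


def rollout(start: TourVertex, rng: random.Random) -> float:
    """Follow random edges from `start` to a terminal vertex.

    Returns the sum of rewards collected after leaving `start`.
    """
    total, vertex = 0.0, start
    while not vertex.terminal:
        vertex = rng.choice(vertex.children())
        total += vertex.reward()
    return total


def estimate_child_values(
    vertex: TourVertex, n_rollouts: int = 200, seed: int = 0
) -> Dict[Tuple[int, ...], float]:
    """Monte Carlo estimate of the value of each decision available at `vertex`."""
    rng = random.Random(seed)
    values: Dict[Tuple[int, ...], float] = {}
    for child in vertex.children():
        returns = [child.reward() + rollout(child, rng) for _ in range(n_rollouts)]
        values[child.tour] = sum(returns) / n_rollouts
    return values


if __name__ == "__main__":
    root = TourVertex((0,))
    for tour, value in estimate_child_values(root).items():
        print(tour, round(value, 3))
```

In this toy instance, moving first to an adjacent corner should score better on average than cutting across the diagonal, since only the adjacent move can still lead to the shortest tour. An RL method would replace the random rollout policy and the averaged returns with a learned policy or value model, for example one built on a graph neural network as the abstract suggests.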
CorporateAuthor	National Renewable Energy Lab. (NREL), Golden, CO (United States)
Discipline	Computer Science
License	http://creativecommons.org/licenses/by/4.0 cc-by
Notes	USDOE Advanced Research Projects Agency - Energy (ARPA-E) NREL/JA-2700-83459 AC36-08GO28308; AR0001205
References
Battaglia (2018). Relational inductive biases, deep learning, and graph networks. arXiv preprint arXiv:1806.01261. doi:10.48550/arXiv.1806.01261
Brockman (2016). OpenAI Gym. arXiv preprint arXiv:1606.01540. doi:10.48550/arXiv.1606.01540
Liang (2018). RLlib: Abstractions for distributed reinforcement learning. International conference on machine learning. doi:10.48550/arXiv.1712.09381
Pandey (2021). Predicting energy and stability of known and hypothetical crystals using graph neural network. Patterns, 2(11). doi:10.1016/j.patter.2021.100361
Prouvost (2020). Ecole: A gym-like library for machine learning in combinatorial optimization solvers. Learning meets combinatorial algorithms at NeurIPS2020. doi:10.48550/arXiv.2011.06069
S. V. (2021). A quantitative metric for organic radical stability and persistence using thermodynamic and kinetic features. Chemical Science, 12(39). doi:10.1039/d1sc02770k
St. John (2020a). Quantum chemical calculations for over 200,000 organic radical species and 40,000 associated closed-shell molecules. Scientific Data, 7(1). doi:10.1038/s41597-020-00588-x
St. John (2020b). Prediction of organic homolytic bond dissociation enthalpies at near chemical accuracy with sub-second computational cost. Nature Communications, 11(1). doi:10.1038/s41467-020-16201-z
V. (2022). Multi-objective goal-directed optimization of de novo stable organic radicals for aqueous redox flow batteries. Nature Machine Intelligence. doi:10.1038/s42256-022-00506-3
Zheng (2020). OpenGraphGym: A parallel reinforcement learning framework for graph optimization problems. Lecture notes in computer science. doi:10.1007/978-3-030-50426-7_33
Zhou (2019). Optimization of molecules via deep reinforcement learning. Scientific Reports, 9(1). doi:10.1038/s41598-019-47148-x