graphenv: a Python library for reinforcement learning on graph search spaces

Bibliographic Details
Published in Journal of Open Source Software, Vol. 7, no. 77, p. 4621
Main Authors Biagioni, David; Tripp, Charles Edison; Clark, Struan; Duplyakin, Dmitry; Law, Jeffrey; St. John, Peter C.
Format Journal Article
Language English
Published United States: Open Source Initiative - NumFOCUS, 2022-09-05
ISSN 2475-9066
DOI 10.21105/joss.04621

Abstract Many important and challenging problems in combinatorial optimization (CO) can be expressed as graph search problems, in which graph vertices represent full or partial solutions and edges represent the decisions that connect them. Graph structure not only introduces strong relational inductive biases for learning (Battaglia et al., 2018), in this context by providing a way to explicitly model the value of transitioning (along edges) from one search state (vertex) to the next, but also lends itself to problems both with and without clearly defined algebraic structure. For example, classic CO problems on graphs such as the Traveling Salesman Problem (TSP) can be expressed as either pure graph search or integer programs. Other problems, however, such as molecular optimization, do not have concise algebraic formulations and yet are readily implemented as a graph search (V. et al., 2022; Zhou et al., 2019). Such "model-free" problems constitute a large fraction of modern reinforcement learning (RL) research, because it is often much easier to write a forward simulation that expresses all of the state transitions and rewards than to write down the precise mathematical expression of the full optimization problem. In the case of molecular optimization, for example, one can use domain knowledge alongside existing software libraries to model the effect of adding a single bond or atom to an existing but incomplete molecule, and let the RL algorithm build a model of how good a given decision is by "experiencing" the simulated environment many times. In contrast, a model-based mathematical formulation that fully expresses all the chemical and physical constraints is intractable. In recent years, RL has emerged as an effective paradigm for optimizing searches over graphs and has led to state-of-the-art heuristics for games like Go and chess, as well as for classical CO problems such as the TSP.
This combination of graph search and RL, while powerful, requires non-trivial software to execute, especially when combining advanced state representations such as Graph Neural Networks (GNNs) with scalable RL algorithms.
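The graph search framing in the abstract can be illustrated with a minimal sketch. Everything below is a hypothetical toy and is not taken from the article or from the graphenv API: the `TourState` class, the `DIST` matrix, and the `greedy_rollout` helper are assumed names chosen for illustration. Each vertex is a partial TSP tour, each edge appends one unvisited city, and a greedy rule stands in for the policy an RL algorithm would learn.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical 4-city distance matrix (illustrative values only).
DIST = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 3],
    [10, 4, 3, 0],
]


@dataclass(frozen=True)
class TourState:
    """A vertex in the search graph: a partial or complete tour."""

    tour: Tuple[int, ...] = (0,)

    @property
    def terminal(self) -> bool:
        # The tour is complete once every city has been visited.
        return len(self.tour) == len(DIST)

    def children(self) -> List["TourState"]:
        """Edges out of this vertex: append one unvisited city."""
        if self.terminal:
            return []
        return [
            TourState(self.tour + (city,))
            for city in range(len(DIST))
            if city not in self.tour
        ]

    def reward(self) -> float:
        """Negative length of the latest leg, plus the return leg at the end."""
        if len(self.tour) < 2:
            return 0.0
        value = -DIST[self.tour[-2]][self.tour[-1]]
        if self.terminal:
            value -= DIST[self.tour[-1]][self.tour[0]]
        return float(value)


def greedy_rollout(state: TourState) -> Tuple[TourState, float]:
    """Forward-simulate one episode, always taking the best immediate reward.

    A learned policy would replace the greedy `max` below.
    """
    total = 0.0
    while not state.terminal:
        state = max(state.children(), key=lambda child: child.reward())
        total += state.reward()
    return state, total


final_state, total_reward = greedy_rollout(TourState())
```

The forward simulation only needs `children` and `reward`; no algebraic formulation of the full problem is written down, which is the "model-free" property the abstract describes.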
Authors (with ORCID)
1. Biagioni, David (ORCID 0000-0001-6140-1957)
2. Tripp, Charles Edison (ORCID 0000-0002-5867-3561)
3. Clark, Struan (ORCID 0000-0003-0078-6560)
4. Duplyakin, Dmitry (ORCID 0000-0001-5132-0168)
5. Law, Jeffrey (ORCID 0000-0003-2828-1273)
6. St. John, Peter C. (ORCID 0000-0002-7928-3722)
BackLink https://www.osti.gov/servlets/purl/1886880 (OSTI.GOV record)
CitedBy_id crossref_primary_10_1021_jacsau_2c00540
ContentType Journal Article
CorporateAuthor National Renewable Energy Lab. (NREL), Golden, CO (United States)
Discipline Computer Science
EISSN 2475-9066
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed false
IsScholarly true
Issue 77
Language English
License http://creativecommons.org/licenses/by/4.0
cc-by
Notes USDOE Advanced Research Projects Agency - Energy (ARPA-E)
NREL/JA-2700-83459
AC36-08GO28308; AR0001205
OpenAccessLink https://proxy.k.utb.cz/login?url=https://joss.theoj.org/papers/10.21105/joss.04621.pdf
PublicationDate 2022-09-05
PublicationPlace United States
PublicationTitle Journal of open source software
PublicationYear 2022
Publisher Open Source Initiative - NumFOCUS
References
Battaglia et al. (2018). Relational inductive biases, deep learning, and graph networks. arXiv preprint arXiv:1806.01261. doi:10.48550/arXiv.1806.01261
Brockman et al. (2016). OpenAI Gym. arXiv preprint arXiv:1606.01540. doi:10.48550/arXiv.1606.01540
Liang et al. (2018). RLlib: Abstractions for distributed reinforcement learning. International Conference on Machine Learning. doi:10.48550/arXiv.1712.09381
Pandey et al. (2021). Predicting energy and stability of known and hypothetical crystals using graph neural network. Patterns, 2(11). doi:10.1016/j.patter.2021.100361
Prouvost et al. (2020). Ecole: A gym-like library for machine learning in combinatorial optimization solvers. Learning Meets Combinatorial Algorithms at NeurIPS2020. doi:10.48550/arXiv.2011.06069
S. V. et al. (2021). A quantitative metric for organic radical stability and persistence using thermodynamic and kinetic features. Chemical Science, 12(39). doi:10.1039/d1sc02770k
St. John et al. (2020a). Quantum chemical calculations for over 200,000 organic radical species and 40,000 associated closed-shell molecules. Scientific Data, 7(1). doi:10.1038/s41597-020-00588-x
St. John et al. (2020b). Prediction of organic homolytic bond dissociation enthalpies at near chemical accuracy with sub-second computational cost. Nature Communications, 11(1). doi:10.1038/s41467-020-16201-z
V. et al. (2022). Multi-objective goal-directed optimization of de novo stable organic radicals for aqueous redox flow batteries. Nature Machine Intelligence. doi:10.1038/s42256-022-00506-3
Zheng et al. (2020). OpenGraphGym: A parallel reinforcement learning framework for graph optimization problems. Lecture Notes in Computer Science. doi:10.1007/978-3-030-50426-7_33
Zhou et al. (2019). Optimization of molecules via deep reinforcement learning. Scientific Reports, 9(1). doi:10.1038/s41598-019-47148-x
StartPage 4621
SubjectTerms machine learning
MATHEMATICS AND COMPUTING
reinforcement learning
software engineering
Title graphenv: a Python library for reinforcement learning on graph search spaces
URI https://www.osti.gov/servlets/purl/1886880
https://joss.theoj.org/papers/10.21105/joss.04621.pdf