Toward Generalization of Automated Temporal Abstraction to Partially Observable Reinforcement Learning
Temporal abstraction for reinforcement learning (RL) aims to decrease learning time by making use of repeated sub-policy patterns in the learning task. Automatic extraction of abstractions during RL process is difficult but has many challenges such as dealing with the curse of dimensionality. Various studies have explored the subject under the assumption that the problem domain is fully observable by the learning agent. Learning abstractions for partially observable RL is a relatively less explored area. In this paper, we adapt an existing automatic abstraction method, namely extended sequence tree, originally designed for fully observable problems. The modified method covers a certain family of model-based partially observable RL settings. We also introduce belief state discretization methods that can be used with this new abstraction mechanism. The effectiveness of the proposed abstraction method is shown empirically by experimenting on well-known benchmark problems.
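The abstract's two technical ingredients are a Bayesian belief update over hidden states and a discretization of the resulting belief vector, so that abstraction machinery built for finite state spaces can be reused. The sketch below is only an illustration of that idea, not the paper's implementation: the function names, the toy transition/observation model, and the fixed-resolution rounding scheme are assumptions made here; the belief state discretization methods introduced in the paper may differ.

```python
# Illustrative sketch only (not the authors' method): a POMDP belief update
# followed by a naive fixed-resolution discretization of the belief vector.
# The model below is a made-up 2-state/2-action/2-observation POMDP.
import numpy as np

def belief_update(b, a, o, T, O):
    """Bayes filter: b'(s') is proportional to O[a, s', o] * sum_s T[a, s, s'] * b(s)."""
    predicted = b @ T[a]                 # sum_s b(s) * T[a, s, s']
    unnormalized = O[a, :, o] * predicted
    total = unnormalized.sum()
    if total == 0.0:
        raise ValueError("observation has zero probability under current belief")
    return unnormalized / total

def discretize_belief(b, resolution=10):
    """Map a belief vector to a coarse grid cell by rounding each coordinate,
    yielding a finite set of 'belief states' that tabular methods can index."""
    cell = np.floor(b * resolution).astype(int)
    cell = np.minimum(cell, resolution - 1)   # keep coordinates equal to 1.0 in range
    return tuple(cell)

if __name__ == "__main__":
    T = np.array([[[0.9, 0.1], [0.2, 0.8]],   # T[a, s, s']
                  [[0.5, 0.5], [0.5, 0.5]]])
    O = np.array([[[0.8, 0.2], [0.3, 0.7]],   # O[a, s', o]
                  [[0.5, 0.5], [0.5, 0.5]]])
    b = np.array([0.5, 0.5])
    b = belief_update(b, a=0, o=1, T=T, O=O)
    print(b, discretize_belief(b))
```

Rounding each belief coordinate to a grid is the crudest possible scheme; the point is only that such a mapping turns the continuous belief simplex into a finite index set over which repeated sub-policy patterns can be detected and reused.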
| Published in | IEEE transactions on cybernetics Vol. 45; no. 8; pp. 1414 - 1425 |
|---|---|
| Main Authors | Cilden, Erkin; Polat, Faruk |
| Format | Journal Article |
| Language | English |
| Published | United States: IEEE, 01.08.2015 |
| Subjects | Approximation algorithms; Approximation methods; Entropy; History; Learning (artificial intelligence); Learning abstractions; Mathematical model; partially observable Markov decision process (POMDP); reinforcement learning (RL); Vectors |
| Online Access | Get full text |
| ISSN | 2168-2267 2168-2275 |
| DOI | 10.1109/TCYB.2014.2352038 |
| Abstract | Temporal abstraction for reinforcement learning (RL) aims to decrease learning time by making use of repeated sub-policy patterns in the learning task. Automatic extraction of abstractions during RL process is difficult but has many challenges such as dealing with the curse of dimensionality. Various studies have explored the subject under the assumption that the problem domain is fully observable by the learning agent. Learning abstractions for partially observable RL is a relatively less explored area. In this paper, we adapt an existing automatic abstraction method, namely extended sequence tree, originally designed for fully observable problems. The modified method covers a certain family of model-based partially observable RL settings. We also introduce belief state discretization methods that can be used with this new abstraction mechanism. The effectiveness of the proposed abstraction method is shown empirically by experimenting on well-known benchmark problems. |
|---|---|
| AbstractList | Temporal abstraction for reinforcement learning (RL) aims to decrease learning time by making use of repeated sub-policy patterns in the learning task. Automatic extraction of abstractions during RL process is difficult but has many challenges such as dealing with the curse of dimensionality. Various studies have explored the subject under the assumption that the problem domain is fully observable by the learning agent. Learning abstractions for partially observable RL is a relatively less explored area. In this paper, we adapt an existing automatic abstraction method, namely extended sequence tree, originally designed for fully observable problems. The modified method covers a certain family of model-based partially observable RL settings. We also introduce belief state discretization methods that can be used with this new abstraction mechanism. The effectiveness of the proposed abstraction method is shown empirically by experimenting on well-known benchmark problems. |
| Author | Polat, Faruk Cilden, Erkin |
| Author_xml | – sequence: 1 givenname: Erkin surname: Cilden fullname: Cilden, Erkin email: ecilden@ceng.metu.edu.tr organization: Department of Computer Engineering, Middle East Technical University, Ankara, Turkey – sequence: 2 givenname: Faruk surname: Polat fullname: Polat, Faruk email: polat@ceng.metu.edu.tr organization: Department of Computer Engineering, Middle East Technical University, Ankara, Turkey |
| BackLink | https://www.ncbi.nlm.nih.gov/pubmed/25216494 (View this record in MEDLINE/PubMed) |
| CODEN | ITCEB8 |
| CitedBy_id | crossref_primary_10_1109_TCYB_2021_3079149 crossref_primary_10_1007_s13042_022_01713_5 crossref_primary_10_1016_j_robot_2017_09_001 crossref_primary_10_1016_j_neucom_2024_128797 crossref_primary_10_1016_j_ins_2022_07_052 crossref_primary_10_1109_TCYB_2021_3102510 crossref_primary_10_1109_TCYB_2021_3107202 |
| Cites_doi | 10.1007/3-540-45622-8_16 10.1023/A:1025696116075 10.1109/3477.846230 10.1109/TSMCB.2007.899419 10.1016/S0004-3702(98)00023-X 10.1109/SICE.2007.4421430 10.1016/S0004-3702(99)00052-1 10.1007/BF00115009 10.1613/jair.301 10.1109/3477.499796 10.1007/s10994-010-5182-y 10.1287/opre.39.1.162 10.1613/jair.639 10.1109/IROS.1996.571080 |
| ContentType | Journal Article |
| DBID | 97E RIA RIE AAYXX CITATION NPM 7X8 ADTOC UNPAY |
| DOI | 10.1109/TCYB.2014.2352038 |
| DatabaseName | IEEE Xplore (IEEE) IEEE All-Society Periodicals Package (ASPP) 1998–Present IEEE Electronic Library (IEL) CrossRef PubMed MEDLINE - Academic Unpaywall for CDI: Periodical Content Unpaywall |
| DatabaseTitle | CrossRef PubMed MEDLINE - Academic |
| DatabaseTitleList | PubMed MEDLINE - Academic |
| Database_xml | – sequence: 1 dbid: NPM name: PubMed url: https://proxy.k.utb.cz/login?url=http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed sourceTypes: Index Database – sequence: 2 dbid: RIE name: IEEE Electronic Library (IEL) url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/ sourceTypes: Publisher – sequence: 3 dbid: UNPAY name: Unpaywall url: https://proxy.k.utb.cz/login?url=https://unpaywall.org/ sourceTypes: Open Access Repository |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Sciences (General) |
| EISSN | 2168-2275 |
| EndPage | 1425 |
| ExternalDocumentID | oai:https://open.metu.edu.tr:11511/46018 25216494 10_1109_TCYB_2014_2352038 6894577 |
| Genre | orig-research Research Support, Non-U.S. Gov't Journal Article |
| GrantInformation_xml | – fundername: Türkiye Bilimsel ve Teknolojik Araştırma Kurumu; Scientific and Technological Research Council of Turkey grantid: 113E239 funderid: 10.13039/501100004410 |
| GroupedDBID | 0R~ 4.4 6IK 97E AAJGR AARMG AASAJ AAWTH ABAZT ABQJQ ABVLG ACIWK AENEX AGQYO AGSQL AHBIQ AKJIK AKQYR ALMA_UNASSIGNED_HOLDINGS ATWAV BEFXN BFFAM BGNUA BKEBE BPEOZ EBS EJD HZ~ IFIPE IPLJI JAVBF M43 O9- OCL PQQKQ RIA RIE RNS AAYXX CITATION NPM RIG 7X8 ADTOC UNPAY |
| IEDL.DBID | RIE |
| ISSN | 2168-2267 2168-2275 |
| IngestDate | Sun Oct 26 04:07:21 EDT 2025 Sun Sep 28 09:22:07 EDT 2025 Thu Apr 03 07:04:03 EDT 2025 Wed Oct 01 05:14:31 EDT 2025 Thu Apr 24 23:10:57 EDT 2025 Wed Aug 27 08:36:58 EDT 2025 |
| IsDoiOpenAccess | false |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 8 |
| Keywords | partially observable Markov decision process (POMDP); reinforcement learning (RL); Learning abstractions |
| Language | English |
| License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html cc-by-nc-nd |
| LinkModel | DirectLink |
| Notes | ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 |
| OpenAccessLink | https://proxy.k.utb.cz/login?url=https://hdl.handle.net/11511/46018 |
| PMID | 25216494 |
| PQID | 1697220688 |
| PQPubID | 23479 |
| PageCount | 12 |
| ParticipantIDs | proquest_miscellaneous_1697220688 crossref_primary_10_1109_TCYB_2014_2352038 crossref_citationtrail_10_1109_TCYB_2014_2352038 ieee_primary_6894577 pubmed_primary_25216494 unpaywall_primary_10_1109_tcyb_2014_2352038 |
| ProviderPackageCode | CITATION AAYXX |
| PublicationCentury | 2000 |
| PublicationDate | 2015-08-01 |
| PublicationDateYYYYMMDD | 2015-08-01 |
| PublicationDate_xml | – month: 08 year: 2015 text: 2015-08-01 day: 01 |
| PublicationDecade | 2010 |
| PublicationPlace | United States |
| PublicationPlace_xml | – name: United States |
| PublicationTitle | IEEE transactions on cybernetics |
| PublicationTitleAbbrev | TCYB |
| PublicationTitleAlternate | IEEE Trans Cybern |
| PublicationYear | 2015 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| References | ref13 sutton (ref1) 1998 ref15 ref30 bradtke (ref17) 1994; 7 (ref31) 2012; 2 ref10 mcgovern (ref3) 2001 parr (ref37) 1995; 2 pineau (ref33) 2004 cassandra (ref34) 1998 ref19 menache (ref5) 2002 theocharous (ref12) 2004; 16 chrisman (ref21) 1992 hengst (ref2) 2002 watkins (ref16) 1989 mccallum (ref22) 1996 ref23 ref20 smith (ref35) 2004 kaelbling (ref11) 1996; 4 lin (ref24) 1993 bellman (ref14) 1957 şimşek (ref6) 2004 dietterich (ref36) 2000; 13 ref27 roy (ref32) 2003 ref29 ref8 parr (ref18) 1998 ref9 ref4 mcgovern (ref7) 1998 zhou (ref28) 2001; 1 littman (ref25) 1998 charlin (ref26) 2007; 19 |
| References_xml | – ref1: Sutton (1998). Reinforcement Learning: An Introduction.
– ref2: Hengst (2002). Discovering hierarchy in reinforcement learning with HEXQ. Proc. 19th Int. Conf. Mach. Learn., p. 243.
– ref3: McGovern (2001). Automatic discovery of subgoals in reinforcement learning using diverse density. Proc. 18th Int. Conf. Mach. Learn., p. 361.
– ref4: doi: 10.1007/3-540-45622-8_16
– ref5: Menache (2002). Q-cut-Dynamic discovery of sub-goals in reinforcement learning. Proc. 13th Eur. Conf. Mach. Learn., p. 295.
– ref6: Şimşek (2004). Using relative novelty to identify useful temporal abstractions in reinforcement learning. Proc. 21st Int. Conf. Mach. Learn., p. 95.
– ref7: McGovern (1998). acQuire-macros: An algorithm for automatically learning macro-actions. Proc. Neural Inf. Process. Syst. Conf. Workshop Abstraction Hierarchy Reinforcement Learn.
– ref8: doi: 10.1109/TSMCB.2007.899419
– ref9: doi: 10.1007/s10994-010-5182-y
– ref10: doi: 10.1016/S0004-3702(98)00023-X
– ref11: Kaelbling (1996). Reinforcement learning: A survey. J. Artif. Intell. Res., vol. 4, p. 237. doi: 10.1613/jair.301
– ref12: Theocharous (2004). Approximate planning in POMDPs with macro-actions. Proc. Adv. Neural Inf. Process. Syst., vol. 16, p. 775.
– ref13: doi: 10.1109/SICE.2007.4421430
– ref14: Bellman (1957). Dynamic Programming.
– ref15: doi: 10.1007/BF00115009
– ref16: Watkins (1989). Learning from delayed rewards.
– ref17: Bradtke (1994). Reinforcement learning methods for continuous-time Markov decision problems. Proc. Adv. Neural Inf. Process. Syst., vol. 7, p. 393.
– ref18: Parr (1998). Hierarchical control and learning for Markov decision processes.
– ref19: doi: 10.1016/S0004-3702(99)00052-1
– ref20: doi: 10.1023/A:1025696116075
– ref21: Chrisman (1992). Reinforcement learning with perceptual aliasing: The perceptual distinctions approach. Proc. Nat. Conf. Artif. Intell., p. 183.
– ref22: McCallum (1996). Reinforcement learning with selective perception and hidden state.
– ref23: doi: 10.1109/3477.499796
– ref24: Lin (1993). Reinforcement learning for robots using neural networks.
– ref25: Littman (1998). Learning policies for partially observable environments: Scaling up. Readings in Agents, p. 495.
– ref26: Charlin (2007). Automated hierarchy discovery for planning in partially observable environments. Proc. Adv. Neural Inf. Process. Syst., vol. 19, p. 225.
– ref27: doi: 10.1109/3477.846230
– ref28: Zhou (2001). An improved grid-based approximation algorithm for POMDPs. Proc. 17th Int. Joint Conf. Artif. Intell., vol. 1, p. 707.
– ref29: doi: 10.1287/opre.39.1.162
– ref30: doi: 10.1109/IROS.1996.571080
– ref31: (2012). Abstraction in model based partially observable reinforcement learning using extended sequence trees. Proc. IEEE/WIC/ACM Int. Conf. Web Intell. Intell. Agent Technol., vol. 2, p. 348.
– ref32: Roy (2003). Finding approximate POMDP solutions through belief compression.
– ref33: Pineau (2004). Tractable planning under uncertainty: Exploiting structure.
– ref34: Cassandra (1998). Exact and approximate algorithms for partially observable Markov decision processes.
– ref35: Smith (2004). Heuristic search value iteration for POMDPs. Proc. 20th Conf. Uncertainty Artif. Intell., p. 520.
– ref36: Dietterich (2000). Hierarchical reinforcement learning with the MAXQ value function decomposition. J. Artif. Intell. Res., vol. 13, p. 227. doi: 10.1613/jair.639
– ref37: Parr (1995). Approximating optimal policies for partially observable stochastic domains. Proc. 14th Int. Joint Conf. Artif. Intell., vol. 2, p. 1088. |
| SSID | ssj0000816898 |
| Score | 2.1247952 |
| Snippet | Temporal abstraction for reinforcement learning (RL) aims to decrease learning time by making use of repeated sub-policy patterns in the learning task.... |
| SourceID | unpaywall proquest pubmed crossref ieee |
| SourceType | Open Access Repository Aggregation Database Index Database Enrichment Source Publisher |
| StartPage | 1414 |
| SubjectTerms | Approximation algorithms; Approximation methods; Entropy; History; Learning (artificial intelligence); Learning abstractions; Mathematical model; partially observable Markov decision process (POMDP); reinforcement learning (RL); Vectors |
| SummonAdditionalLinks | – databaseName: Unpaywall dbid: UNPAY priority: 102 providerName: Unpaywall |
| Title | Toward Generalization of Automated Temporal Abstraction to Partially Observable Reinforcement Learning |
| URI | https://ieeexplore.ieee.org/document/6894577 https://www.ncbi.nlm.nih.gov/pubmed/25216494 https://www.proquest.com/docview/1697220688 https://hdl.handle.net/11511/46018 |
| UnpaywallVersion | submittedVersion |
| Volume | 45 |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
| journalDatabaseRights | – providerCode: PRVIEE databaseName: IEEE Electronic Library (IEL) customDbUrl: eissn: 2168-2275 dateEnd: 99991231 omitProxy: false ssIdentifier: ssj0000816898 issn: 2168-2267 databaseCode: RIE dateStart: 20130101 isFulltext: true titleUrlDefault: https://ieeexplore.ieee.org/ providerName: IEEE |
| linkProvider | IEEE |
| openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Toward+Generalization+of+Automated+Temporal+Abstraction+to+Partially+Observable+Reinforcement+Learning&rft.jtitle=IEEE+transactions+on+cybernetics&rft.au=Cilden%2C+Erkin&rft.au=Polat%2C+Faruk&rft.date=2015-08-01&rft.issn=2168-2267&rft.eissn=2168-2275&rft.volume=45&rft.issue=8&rft.spage=1414&rft.epage=1425&rft_id=info:doi/10.1109%2FTCYB.2014.2352038&rft.externalDBID=n%2Fa&rft.externalDocID=10_1109_TCYB_2014_2352038 |