Toward Generalization of Automated Temporal Abstraction to Partially Observable Reinforcement Learning

Bibliographic Details
Published in IEEE Transactions on Cybernetics, Vol. 45, No. 8, pp. 1414-1425
Main Authors Cilden, Erkin; Polat, Faruk
Format Journal Article
Language English
Published United States: IEEE, 01.08.2015
ISSN 2168-2267
EISSN 2168-2275
DOI 10.1109/TCYB.2014.2352038


Abstract Temporal abstraction for reinforcement learning (RL) aims to decrease learning time by making use of repeated sub-policy patterns in the learning task. Automatic extraction of abstractions during the RL process is difficult, with challenges such as dealing with the curse of dimensionality. Various studies have explored the subject under the assumption that the problem domain is fully observable by the learning agent. Learning abstractions for partially observable RL is a relatively less explored area. In this paper, we adapt an existing automatic abstraction method, namely the extended sequence tree, originally designed for fully observable problems. The modified method covers a certain family of model-based partially observable RL settings. We also introduce belief state discretization methods that can be used with this new abstraction mechanism. The effectiveness of the proposed abstraction method is shown empirically through experiments on well-known benchmark problems.
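The belief state discretization mentioned in the abstract is only named in this record; the method details are in the full text. As a rough, non-authoritative sketch of the general idea (not the authors' algorithm), the Python snippet below performs a standard Bayes-filter belief update for a toy POMDP and snaps the resulting belief vector onto a regular grid so it can serve as a discrete state for tabular learning or sequence-tree-style abstraction. All names (belief_update, discretize_belief) and the transition/observation tensors T and O are illustrative assumptions.

# Illustrative only: a generic grid discretization of POMDP beliefs with a
# Bayes-filter update. The paper's actual discretization schemes and the
# extended sequence tree algorithm are not reproduced here.
import numpy as np

def belief_update(belief, action, observation, T, O):
    # One Bayes-filter step: b'(s') is proportional to O[a, s', o] * sum_s T[a, s, s'] * b(s).
    predicted = belief @ T[action]
    updated = O[action, :, observation] * predicted
    return updated / updated.sum()

def discretize_belief(belief, resolution=4):
    # Snap a belief vector to the nearest point of a regular grid on the simplex,
    # yielding a hashable key that tabular machinery can treat as a discrete state.
    scaled = np.round(belief * resolution).astype(int)
    scaled[np.argmax(belief)] += resolution - scaled.sum()  # repair rounding drift
    return tuple(int(x) for x in scaled)

# Toy 2-state, 2-action, 2-observation model (made-up numbers, for shape only).
T = np.array([[[0.9, 0.1], [0.2, 0.8]],   # T[a, s, s']
              [[0.5, 0.5], [0.5, 0.5]]])
O = np.array([[[0.8, 0.2], [0.3, 0.7]],   # O[a, s', o]
              [[0.6, 0.4], [0.4, 0.6]]])
b = belief_update(np.array([0.5, 0.5]), action=0, observation=1, T=T, O=O)
print(discretize_belief(b))               # (1, 3) at resolution 4

Any real implementation would have to choose the grid resolution (or a different compression of the belief simplex) to balance abstraction quality against the growth of the discretized state space.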
Author Polat, Faruk
Cilden, Erkin
Author_xml – sequence: 1
  givenname: Erkin
  surname: Cilden
  fullname: Cilden, Erkin
  email: ecilden@ceng.metu.edu.tr
  organization: Department of Computer Engineering, Middle East Technical University, Ankara, Turkey
– sequence: 2
  givenname: Faruk
  surname: Polat
  fullname: Polat, Faruk
  email: polat@ceng.metu.edu.tr
  organization: Department of Computer Engineering, Middle East Technical University, Ankara, Turkey
CODEN ITCEB8
ContentType Journal Article
DOI 10.1109/TCYB.2014.2352038
Discipline Sciences (General)
EISSN 2168-2275
EndPage 1425
ExternalDocumentID oai:https://open.metu.edu.tr:11511/46018
25216494
10_1109_TCYB_2014_2352038
6894577
Genre orig-research
Research Support, Non-U.S. Gov't
Journal Article
GrantInformation_xml – fundername: Türkiye Bilimsel ve Teknolojik Araştırma Kurumu; Scientific and Technological Research Council of Turkey
  grantid: 113E239
  funderid: 10.13039/501100004410
ISSN 2168-2267
IsDoiOpenAccess false
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 8
Keywords partially observable Markov decision process (POMDP)
reinforcement learning (RL)
Learning abstractions
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
cc-by-nc-nd
OpenAccessLink https://hdl.handle.net/11511/46018
PMID 25216494
PageCount 12
PublicationCentury 2000
PublicationDate 2015-08-01
PublicationDateYYYYMMDD 2015-08-01
PublicationDate_xml – month: 08
  year: 2015
  text: 2015-08-01
  day: 01
PublicationDecade 2010
PublicationPlace United States
PublicationPlace_xml – name: United States
PublicationTitle IEEE transactions on cybernetics
PublicationTitleAbbrev TCYB
PublicationTitleAlternate IEEE Trans Cybern
PublicationYear 2015
Publisher IEEE
Publisher_xml – name: IEEE
StartPage 1414
SubjectTerms Approximation algorithms
Approximation methods
Entropy
History
Learning (artificial intelligence)
Learning abstractions
Mathematical model
partially observable Markov decision process (POMDP)
reinforcement learning (RL)
Vectors
Title Toward Generalization of Automated Temporal Abstraction to Partially Observable Reinforcement Learning
URI https://ieeexplore.ieee.org/document/6894577
https://www.ncbi.nlm.nih.gov/pubmed/25216494
https://www.proquest.com/docview/1697220688
https://hdl.handle.net/11511/46018
UnpaywallVersion submittedVersion
Volume 45