Toward Generalization of Automated Temporal Abstraction to Partially Observable Reinforcement Learning

Bibliographic Details
Published in IEEE Transactions on Cybernetics, Vol. 45, No. 8, pp. 1414-1425
Main Authors Cilden, Erkin; Polat, Faruk
Format Journal Article
Language English
Published United States: IEEE, 01.08.2015
ISSN 2168-2267
EISSN 2168-2275
DOI 10.1109/TCYB.2014.2352038


Abstract Temporal abstraction for reinforcement learning (RL) aims to decrease learning time by making use of repeated sub-policy patterns in the learning task. Automatic extraction of abstractions during the RL process is difficult, with challenges such as dealing with the curse of dimensionality. Various studies have explored the subject under the assumption that the problem domain is fully observable by the learning agent. Learning abstractions for partially observable RL is a relatively less explored area. In this paper, we adapt an existing automatic abstraction method, namely the extended sequence tree, originally designed for fully observable problems. The modified method covers a certain family of model-based partially observable RL settings. We also introduce belief state discretization methods that can be used with this new abstraction mechanism. The effectiveness of the proposed abstraction method is shown empirically through experiments on well-known benchmark problems.
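The belief state discretization mentioned in the abstract is only named in this record; the method details are in the full text. As a rough, non-authoritative sketch of the general idea (not the authors' algorithm), the Python snippet below performs a standard Bayes-filter belief update for a toy POMDP and snaps the resulting belief vector onto a regular grid so it can serve as a discrete state for tabular learning or sequence-tree-style abstraction. All names (belief_update, discretize_belief) and the transition/observation tensors T and O are illustrative assumptions.

# Illustrative only: a generic grid discretization of POMDP beliefs with a
# Bayes-filter update. The paper's actual discretization schemes and the
# extended sequence tree algorithm are not reproduced here.
import numpy as np

def belief_update(belief, action, observation, T, O):
    # One Bayes-filter step: b'(s') is proportional to O[a, s', o] * sum_s T[a, s, s'] * b(s).
    predicted = belief @ T[action]
    updated = O[action, :, observation] * predicted
    return updated / updated.sum()

def discretize_belief(belief, resolution=4):
    # Snap a belief vector to the nearest point of a regular grid on the simplex,
    # yielding a hashable key that tabular machinery can treat as a discrete state.
    scaled = np.round(belief * resolution).astype(int)
    scaled[np.argmax(belief)] += resolution - scaled.sum()  # repair rounding drift
    return tuple(int(x) for x in scaled)

# Toy 2-state, 2-action, 2-observation model (made-up numbers, for shape only).
T = np.array([[[0.9, 0.1], [0.2, 0.8]],   # T[a, s, s']
              [[0.5, 0.5], [0.5, 0.5]]])
O = np.array([[[0.8, 0.2], [0.3, 0.7]],   # O[a, s', o]
              [[0.6, 0.4], [0.4, 0.6]]])
b = belief_update(np.array([0.5, 0.5]), action=0, observation=1, T=T, O=O)
print(discretize_belief(b))               # (1, 3) at resolution 4

Any real implementation would have to choose the grid resolution (or a different compression of the belief simplex) to balance abstraction quality against the growth of the discretized state space.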
Author Polat, Faruk
Cilden, Erkin
Author_xml – sequence: 1
  givenname: Erkin
  surname: Cilden
  fullname: Cilden, Erkin
  email: ecilden@ceng.metu.edu.tr
  organization: Department of Computer Engineering, Middle East Technical University, Ankara, Turkey
– sequence: 2
  givenname: Faruk
  surname: Polat
  fullname: Polat, Faruk
  email: polat@ceng.metu.edu.tr
  organization: Department of Computer Engineering, Middle East Technical University, Ankara, Turkey
CODEN ITCEB8
ContentType Journal Article
DOI 10.1109/TCYB.2014.2352038
Discipline Sciences (General)
EISSN 2168-2275
EndPage 1425
ExternalDocumentID oai:https://open.metu.edu.tr:11511/46018
25216494
10_1109_TCYB_2014_2352038
6894577
Genre orig-research
Research Support, Non-U.S. Gov't
Journal Article
GrantInformation_xml – fundername: Türkiye Bilimsel ve Teknolojik Araştırma Kurumu; Scientific and Technological Research Council of Turkey
  grantid: 113E239
  funderid: 10.13039/501100004410
ISSN 2168-2267
IsDoiOpenAccess false
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 8
Keywords partially observable Markov decision process (POMDP)
reinforcement learning (RL)
Learning abstractions
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
cc-by-nc-nd
OpenAccessLink https://hdl.handle.net/11511/46018
PMID 25216494
PageCount 12
PublicationCentury 2000
PublicationDate 2015-08-01
PublicationDateYYYYMMDD 2015-08-01
PublicationDate_xml – month: 08
  year: 2015
  text: 2015-08-01
  day: 01
PublicationDecade 2010
PublicationPlace United States
PublicationPlace_xml – name: United States
PublicationTitle IEEE transactions on cybernetics
PublicationTitleAbbrev TCYB
PublicationTitleAlternate IEEE Trans Cybern
PublicationYear 2015
Publisher IEEE
Publisher_xml – name: IEEE
StartPage 1414
SubjectTerms Approximation algorithms
Approximation methods
Entropy
History
Learning (artificial intelligence)
Learning abstractions
Mathematical model
partially observable Markov decision process (POMDP)
reinforcement learning (RL)
Vectors
Title Toward Generalization of Automated Temporal Abstraction to Partially Observable Reinforcement Learning
URI https://ieeexplore.ieee.org/document/6894577
https://www.ncbi.nlm.nih.gov/pubmed/25216494
https://www.proquest.com/docview/1697220688
https://hdl.handle.net/11511/46018
UnpaywallVersion submittedVersion
Volume 45