An Adaptive Q-Learning Algorithm Developed for Agent-Based Computational Modeling of Electricity Market

Bibliographic Details
Published in IEEE transactions on systems, man and cybernetics. Part C, Applications and reviews Vol. 40; no. 5; pp. 547-556
Main Authors Rahimiyan, Morteza; Mashhadi, Habib Rajabi
Format Journal Article
Language English
Published New York, NY: IEEE, 01.09.2010
Institute of Electrical and Electronics Engineers
ISSN 1094-6977, 1558-2442
DOI 10.1109/TSMCC.2010.2044174

Summary: Balancing exploration and exploitation by adapting the Q-learning (QL) parameters to the conditions of a dynamic, uncertain environment has long been a significant subject of interest in reinforcement learning. The peculiarities of the electricity market create such a complex, dynamic economic environment and have consequently increased the need for more advanced learning methods. In this economic system, the agent's market power plays a vital role in the bidding decision-making problem. To improve the QL method, the main idea proposed is to adapt its parameters to the agent's market power in order to strike a good balance between exploration and exploitation. Because human decision-making is fuzzy in nature, a fuzzy system is designed to implement this adaptation by mapping each agent's market power into the QL parameters. The resulting fuzzy QL method is used to model the power supplier's strategic bidding behavior in a computational electricity market. In the simulation framework, the QL algorithm selects the power supplier's bidding strategy according to past experience and the values of the parameters, which reflect the human's risk characteristics. Applying the proposed methodology to a power supplier in a multiarea power system shows improved performance compared with QL with fixed parameters.
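
The adaptation described in the summary can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the record does not give the fuzzy rule base, the market power index, or the action-selection rule. The membership functions, the rule outputs in `ALPHA_RULES` and `EPSILON_RULES`, the direction of the mapping, the scaling of market power to [0, 1], and the use of epsilon-greedy selection are all assumptions made for the example. Only the tabular Q-learning update itself is the standard one.

```python
import random

# Hypothetical rule outputs for the fuzzy sets (low, medium, high) market power.
# These numbers and their ordering are illustrative assumptions, not from the paper.
ALPHA_RULES = (0.1, 0.3, 0.5)      # learning rate per fuzzy set
EPSILON_RULES = (0.05, 0.2, 0.4)   # exploration rate per fuzzy set


def memberships(market_power):
    """Degrees of membership in (low, medium, high) for a value in [0, 1]."""
    mp = min(max(market_power, 0.0), 1.0)  # clamp to the assumed range
    low = max(0.0, 1.0 - 2.0 * mp)
    medium = max(0.0, 1.0 - abs(2.0 * mp - 1.0))
    high = max(0.0, 2.0 * mp - 1.0)
    return low, medium, high


def fuzzy_parameters(market_power):
    """Map market power to (alpha, epsilon) by a weighted average of rule outputs."""
    weights = memberships(market_power)
    total = sum(weights)  # equals 1 for these membership functions
    alpha = sum(w * v for w, v in zip(weights, ALPHA_RULES)) / total
    epsilon = sum(w * v for w, v in zip(weights, EPSILON_RULES)) / total
    return alpha, epsilon


def choose_action(q_row, epsilon, rng):
    """Epsilon-greedy choice over one row of the Q-table (an assumed policy)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=q_row.__getitem__)


def q_update(q, state, action, reward, next_state, alpha, gamma=0.9):
    """Standard tabular Q-learning update, applied in place."""
    target = reward + gamma * max(q[next_state])
    q[state][action] += alpha * (target - q[state][action])


if __name__ == "__main__":
    rng = random.Random(0)
    q_table = [[0.0, 0.0], [0.0, 0.0]]  # 2 states x 2 bidding actions (toy sizes)
    alpha, epsilon = fuzzy_parameters(0.25)  # hypothetical market power value
    action = choose_action(q_table[0], epsilon, rng)
    q_update(q_table, 0, action, reward=1.0, next_state=1, alpha=alpha)
    print(alpha, epsilon, q_table)
```

In a fixed-parameter QL agent, `alpha` and `epsilon` would be constants; here they are recomputed from the agent's current market power before each decision, which is the only structural difference the sketch is meant to show.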