An Adaptive Q-Learning Algorithm Developed for Agent-Based Computational Modeling of Electricity Market
| Published in | IEEE transactions on systems, man and cybernetics. Part C, Applications and reviews Vol. 40; no. 5; pp. 547 - 556 |
|---|---|
| Main Authors | , |
| Format | Journal Article |
| Language | English |
| Published | New York, NY: IEEE (Institute of Electrical and Electronics Engineers), 01.09.2010 |
| Subjects | |
| ISSN | 1094-6977, 1558-2442 |
| DOI | 10.1109/TSMCC.2010.2044174 |
| Summary: | Balancing exploration and exploitation by adapting the Q-learning (QL) parameters to the conditions of a dynamic, uncertain environment has long been a significant subject of interest in reinforcement learning. The peculiarities of the electricity market provide such a complex, dynamic economic environment and have consequently increased the need for more advanced learning methods. In this economic system, an agent's market power plays a vital role in the bidding decision-making problem. To improve the QL method, the main idea proposed is to adapt its parameters to the market power in order to strike a good balance between exploration and exploitation. Because human decision making is fuzzy in nature, a fuzzy system is designed to implement this adaptation by mapping each agent's market power into the QL parameters. The resulting fuzzy QL method is used to model the power supplier's strategic bidding behavior in a computational electricity market. In the simulation framework, the QL algorithm selects the power supplier's bidding strategy according to past experience and the values of the parameters, which reflect the human's risk characteristics. Applying the proposed methodology to a power supplier in a multiarea power system shows improved performance compared with QL with fixed parameters. |
|---|---|
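
The summary describes mapping an agent's market power into the QL parameters through a fuzzy system. The following is a minimal sketch of that general idea, not the authors' design: the three membership functions, the rule consequents in `RULES`, the choice of adapted parameters (learning rate `alpha` and exploration rate `epsilon`), the direction of the mapping, and the assumption that market power is normalized to [0, 1] are all hypothetical, since this record does not give those details.

```python
import random

# Hypothetical rule consequents (alpha, epsilon) for LOW, MEDIUM, and HIGH
# market power. These values and their direction are illustrative assumptions.
RULES = {"low": (0.1, 0.05), "medium": (0.3, 0.2), "high": (0.5, 0.4)}


def memberships(market_power):
    """Degrees of membership in three assumed triangular fuzzy sets on [0, 1]."""
    x = min(max(market_power, 0.0), 1.0)  # clamp to the assumed range
    return {
        "low": max(0.0, 1.0 - 2.0 * x),
        "medium": max(0.0, 1.0 - abs(2.0 * x - 1.0)),
        "high": max(0.0, 2.0 * x - 1.0),
    }


def fuzzy_parameters(market_power):
    """Weighted-average (zero-order Sugeno style) defuzzification."""
    mu = memberships(market_power)
    total = sum(mu.values())
    alpha = sum(mu[k] * RULES[k][0] for k in mu) / total
    epsilon = sum(mu[k] * RULES[k][1] for k in mu) / total
    return alpha, epsilon


def choose_action(q_row, epsilon, rng):
    """Epsilon-greedy selection over one row of the Q-table."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))  # explore
    return max(range(len(q_row)), key=q_row.__getitem__)  # exploit


def q_update(q, state, action, reward, next_state, alpha, gamma=0.9):
    """Standard tabular Q-learning update, using the adapted learning rate."""
    target = reward + gamma * max(q[next_state])
    q[state][action] += alpha * (target - q[state][action])


if __name__ == "__main__":
    rng = random.Random(0)
    q = [[0.0, 0.0], [0.0, 0.0]]
    alpha, epsilon = fuzzy_parameters(0.25)
    action = choose_action(q[0], epsilon, rng)
    q_update(q, 0, action, reward=1.0, next_state=1, alpha=alpha)
    print(alpha, epsilon, q)
```

In a fixed-parameter QL baseline, `alpha` and `epsilon` would be constants; here they are recomputed from the market power before each decision, which is the only difference this sketch is meant to show.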