Position Control and Production of Various Strategies for Game of Go Using Deep Learning Methods

Bibliographic Details
Published in: Journal of Information Science and Engineering, Vol. 37, No. 3, pp. 553-573
Main Authors: Shi, Yuan; Fan, Tianwen; Li, Wanxiang; Hsueh, Chu-Hsuan; Ikeda, Kokolo
Format: Journal Article
Language: English
Published: Taipei: The Association for Computational Linguistics and Chinese Language Processing (社團法人中華民國計算語言學學會); Institute of Information Science, Academia Sinica, 01.05.2021
ISSN: 1016-2364
DOI: 10.6688/JISE.202105_37(3).0004

Summary: Computer Go programs have surpassed top-level human players by using deep learning and reinforcement learning techniques. Beyond playing strength, entertaining Go AIs and AI coaches are also interesting directions but have not been well investigated. Some researchers have worked on entertaining beginners or intermediate players. One topic is position control, which aims to make strong programs play close games against weak players. In such a scenario, the naturalness of the moves is likely to influence weaker players' enjoyment. Another topic is producing the various strategies (or preferences) that human players usually have. Methods for both topics have been proposed and evaluated for a traditional Monte-Carlo tree search (MCTS) program. However, there are critical differences between traditional MCTS programs and recent programs based on AlphaGo Zero, such as LeelaZero and KataGo. For example, recent programs do not run random simulations to the ends of games in MCTS, making the existing method for producing various strategies inapplicable. In this paper, we first summarize these differences and the problems they cause. We then adapt existing methods and propose new ones to solve the problems, obtaining promising results. For position control, the modified LeelaZero can play gently against a weaker player (a 48% win rate against a weaker program, Ray). A human subject experiment shows that the average number of unnatural moves per game is 1.22, compared with 2.29 for a simple method that does not consider naturalness. We also propose a new position control method specifically for endgames. Finally, for producing various strategies, two methods are introduced. In our experiments, both methods produce center- and edge/corner-oriented strategies, and human players can successfully identify them.
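
The abstract names the two techniques but not their algorithms, so the following is only a minimal, hypothetical Python sketch of the ideas as described: keeping the game close while avoiding unnatural moves, and biasing play toward the center or the edges/corners. The function names, the candidate format, and the use of a policy-prior floor as a proxy for naturalness are assumptions for illustration, not the paper's method; it is assumed only that a LeelaZero/KataGo-style search exposes, for each candidate move, a policy prior and a post-move win-rate estimate.

import math

def choose_gentle_move(candidates, target_winrate=0.5, policy_floor=0.05):
    # Hypothetical position-control sketch, not the paper's algorithm.
    # candidates: list of (move, policy_prior, winrate_after_move), e.g. the
    # root children of an MCTS search in a LeelaZero/KataGo-style program.
    # Keep only moves the policy network considers reasonably natural, then
    # pick the one whose estimated win rate is closest to the target, so the
    # stronger side neither crushes nor obviously throws the game.
    natural = [c for c in candidates if c[1] >= policy_floor]
    pool = natural or candidates  # fall back if no move passes the floor
    return min(pool, key=lambda c: abs(c[2] - target_winrate))[0]

def center_biased_priors(candidates, board_size=19, strength=1.0):
    # Hypothetical strategy-production sketch: reweight policy priors by
    # distance from the board center to induce a center-oriented preference;
    # a negative strength instead favors edges and corners.
    cx = cy = (board_size - 1) / 2
    max_d = math.hypot(cx, cy)
    biased = []
    for (x, y), prior, winrate in candidates:
        d = math.hypot(x - cx, y - cy) / max_d  # 0 at center, 1 at a corner
        biased.append(((x, y), prior * math.exp(-strength * d), winrate))
    total = sum(p for _, p, _ in biased)
    return [(move, p / total, w) for move, p, w in biased]

In this sketch the two ideas compose naturally: biased priors can be fed back into the search to shape the candidate set, after which the gentle-move selection keeps the resulting game close.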