Control of Ski Robot Based on Deep Reinforcement Learning

Bibliographic Details
Published in	2021 International Conference on Security, Pattern Analysis, and Cybernetics (SPAC), pp. 211-215
Main Authors	Wu, Zegui; Ye, Junting; Wang, Xinran; Li, Fusheng
Format	Conference Proceeding
Language	English
Published	IEEE 18.06.2021
Subjects	Analytical models; DDPG; Heuristic algorithms; Humanoid robots; Kinematics; Reinforcement learning; skiing robot; Torque; Training; ZMP
DOI	10.1109/SPAC53836.2021.9539926

Summary:	This paper describes a humanoid robot developed for the 2020 Beijing Ski Robot Challenge, whose goal is a skiing robot that can independently perform skiing movements and reach a designated destination. For a biped alpine skiing robot, we propose a skiing control algorithm based on DDPG reinforcement learning. An approximate method is used to establish the relationship between the robot's tilt angle, the ski edging (cutting) angle, and the turning radius. To reduce the dimension of the control algorithm's output, we also establish the relationship between the turning radius, the foot length, and the distance between the toes, so that the turning radius is controlled through the foot length and the foot spacing. The angle of each joint is then obtained from a kinematics model of the humanoid robot. The control algorithm uses a critic network to evaluate the state-action value and an actor network to generate the foot length and foot spacing parameters in real time. During DDPG training, the zero moment point (ZMP) of the robot is used to judge whether the robot has fallen, which determines when each training sequence terminates. Training is carried out on the Gym simulation platform.
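
The ZMP-based termination test mentioned in the summary can be illustrated with a minimal sketch. The summary does not give the paper's ZMP formula or its support region, so everything below is an assumption: the ZMP is computed with the standard cart-table (linear inverted pendulum) approximation on flat ground, the support region is simplified to an axis-aligned rectangle, and the names `SupportRegion`, `zmp_cart_table`, and `episode_done` are hypothetical.

```python
from dataclasses import dataclass

G = 9.81  # gravitational acceleration, m/s^2


@dataclass
class SupportRegion:
    """Axis-aligned rectangle standing in for the support area (assumed simplification)."""
    x_min: float
    x_max: float
    y_min: float
    y_max: float

    def contains(self, x: float, y: float) -> bool:
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max


def zmp_cart_table(com_pos, com_acc):
    """ZMP under the cart-table approximation on flat ground (an assumption,
    not necessarily the model used in the paper).

    com_pos: (x, y, z) center-of-mass position in meters
    com_acc: (ax, ay) horizontal center-of-mass acceleration in m/s^2
    """
    x, y, z = com_pos
    ax, ay = com_acc
    return (x - z / G * ax, y - z / G * ay)


def episode_done(com_pos, com_acc, support: SupportRegion) -> bool:
    """Treat the robot as fallen, and end the episode, once the ZMP leaves the support region."""
    zx, zy = zmp_cart_table(com_pos, com_acc)
    return not support.contains(zx, zy)
```

In a Gym-style environment, a check of this kind would typically feed the done flag returned by `step`, so that a training sequence ends as soon as the robot is judged to have fallen.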