Control of Ski Robot Based on Deep Reinforcement Learning
| Published in | 2021 International Conference on Security, Pattern Analysis, and Cybernetics (SPAC), pp. 211-215 |
|---|---|
| Main Authors | , , , |
| Format | Conference Proceeding |
| Language | English |
| Published | IEEE, 18.06.2021 |
| DOI | 10.1109/SPAC53836.2021.9539926 |
| Summary: | This paper describes a humanoid robot developed for the 2020 Beijing Ski Robot Challenge. The goal is to design a skiing robot that can independently perform skiing movements and reach a designated destination. For a biped alpine skiing robot, we propose a skiing control algorithm based on DDPG (deep deterministic policy gradient) reinforcement learning. An approximate method is used to establish the relationship between the robot's tilt angle, the ski edge (cutting) angle, and the turning radius. To reduce the dimension of the control algorithm's output, we also establish the relationship between the turning radius, the foot length, and the distance between the toes, so that the turning radius is controlled through the foot length and the foot spacing. The joint angles of the humanoid robot are then obtained from its kinematics model. The control algorithm uses a critic network to evaluate the state-action value and an actor network to generate the foot length and foot spacing parameters in real time. During DDPG training, the robot's zero moment point (ZMP) is used to judge whether the robot has fallen, which determines when each training episode terminates. Training is carried out on the Gym simulation platform. |
|---|---|
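
The summary describes ending each training episode when the ZMP indicates a fall, and training a critic to score state-action pairs. The sketch below illustrates how those two pieces might fit together. It is not the authors' implementation: the cart-table ZMP approximation, the rectangular support region, and every name and default value (`SupportRect`, `zmp_cart_table`, `has_fallen`, `td_target`, `soft_update`, `gamma`, `tau`) are assumptions introduced here for illustration. The critic target and the soft target-network update follow the standard DDPG formulation.

```python
from dataclasses import dataclass

GRAVITY = 9.81  # m/s^2


@dataclass
class SupportRect:
    """Axis-aligned support region in metres (hypothetical simplification)."""
    x_min: float
    x_max: float
    y_min: float
    y_max: float


def zmp_cart_table(com_pos, com_acc, com_height):
    """ZMP along one axis under the cart-table approximation (an assumption).

    p = x_com - (z_com / g) * x_ddot_com
    """
    return com_pos - (com_height / GRAVITY) * com_acc


def has_fallen(zmp_x, zmp_y, support):
    """Treat the robot as fallen once the ZMP leaves the support region."""
    inside = (support.x_min <= zmp_x <= support.x_max
              and support.y_min <= zmp_y <= support.y_max)
    return not inside


def td_target(reward, next_q, done, gamma=0.99):
    """Standard DDPG critic target; no bootstrapping on a terminal step."""
    return reward + (0.0 if done else gamma * next_q)


def soft_update(target_params, online_params, tau=0.005):
    """Polyak averaging of target-network parameters toward the online ones."""
    return [(1.0 - tau) * t + tau * o
            for t, o in zip(target_params, online_params)]
```

In a Gym-style training loop, `has_fallen` would supply the episode-termination flag returned by the environment's step function, and the same flag would be passed to `td_target` as `done` so that the critic does not bootstrap past a fall.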