Trapezoidal Gradient Descent for Effective Reinforcement Learning in Spiking Networks

Bibliographic Details
Main Authors	Pan, Yuhao; Wang, Xiucheng; Cheng, Nan; Qiu, Qi
Format	Journal Article
Language	English
Published	19.06.2024
Subjects	Computer Science - Artificial Intelligence
DOI	10.48550/arxiv.2406.13568

More Information
Summary:	With the rapid development of artificial intelligence, reinforcement learning has continued to achieve breakthroughs in both theory and practice. However, traditional reinforcement learning algorithms often incur high energy consumption while interacting with the environment. Spiking Neural Networks (SNNs), with their low energy consumption and performance comparable to deep neural networks, have attracted widespread attention. To reduce the energy consumption of reinforcement learning in practical applications, researchers have successively proposed the Pop-SAN and MDC-SAN algorithms. Nonetheless, these algorithms use a rectangular function to approximate the gradient of the non-differentiable spike function during training, which results in low sensitivity and leaves room to improve the training effectiveness of SNNs. We therefore propose a trapezoidal approximate gradient to replace the spike function's gradient. It preserves the original stable learning behavior while improving the model's adaptability and response sensitivity under various signal dynamics. Simulation results show that the improved algorithm, using the trapezoidal approximate gradient, converges faster and performs better than the original algorithms while maintaining good training stability.
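To make the rectangular-versus-trapezoidal distinction concrete, here is a minimal sketch of the two surrogate ("pseudo") gradients for a hard-threshold spike. It assumes a rectangle of height 1 over the window |v - v_th| < a, and a trapezoid of height h on a flat top |v - v_th| <= b that decays linearly to zero at |v - v_th| = c. The threshold `v_th` and the parameters `a`, `b`, `c`, `h` and their default values are hypothetical choices for illustration; the abstract does not give the paper's actual parameterization.

```python
def heaviside_spike(v: float, v_th: float = 0.5) -> float:
    """Forward pass: emit a spike (1.0) when membrane potential v reaches threshold v_th."""
    return 1.0 if v >= v_th else 0.0


def rect_surrogate_grad(v: float, v_th: float = 0.5, a: float = 0.5) -> float:
    """Rectangular pseudo-gradient (assumed form): 1.0 inside |v - v_th| < a, else 0.0.

    Every potential inside the window gets the same gradient, regardless of
    how close it is to the threshold.
    """
    return 1.0 if abs(v - v_th) < a else 0.0


def trap_surrogate_grad(v: float, v_th: float = 0.5, b: float = 0.2,
                        c: float = 0.6, h: float = 1.0) -> float:
    """Trapezoidal pseudo-gradient (hypothetical parameterization).

    Height h on the flat top |v - v_th| <= b, then a linear ramp down to 0
    at |v - v_th| = c, and 0 beyond that.
    """
    if not 0.0 <= b < c:
        raise ValueError("require 0 <= b < c")
    d = abs(v - v_th)          # distance from the firing threshold
    if d <= b:
        return h               # flat top near the threshold
    if d < c:
        return h * (c - d) / (c - b)  # linear ramp on the sloped sides
    return 0.0                 # far from threshold: no gradient


if __name__ == "__main__":
    for v in (0.6, 0.8, 0.95, 1.2):
        print(v, heaviside_spike(v), rect_surrogate_grad(v), trap_surrogate_grad(v))
```

With these defaults, potentials of 0.6 and 0.8 receive the same rectangular gradient (1.0), while the trapezoid gives 1.0 and roughly 0.75, so the backward signal reflects how far the neuron is from firing. In a framework such as PyTorch, functions like these would typically go in the backward pass of a custom autograd function while the forward pass keeps the hard threshold.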