Robotic Object Sorting via Deep Reinforcement Learning: a generalized approach

Bibliographic Details
Published in	IEEE RO-MAN, pp. 1266-1273
Main Authors	Nicola, Giorgio; Tagliapietra, Luca; Tosello, Elisa; Navarin, Nicolo; Ghidoni, Stefano; Menegatti, Emanuele
Format	Conference Proceeding
Language	English
Published	IEEE 01.08.2020
Subjects	Computer architecture; Learning systems; Markov processes; Microprocessors; Reinforcement learning; Three-dimensional displays; Training
ISSN	1944-9437
DOI	10.1109/RO-MAN47096.2020.9223484

Summary:	This work proposes a general formulation of the Object Sorting problem, suitable for describing any non-deterministic environment characterized by friendly and adversarial interference. Coupled with a Deep Reinforcement Learning algorithm, this formulation allows training policies that solve different sorting tasks without adjusting the architecture or modifying the learning method. The environment is subdivided into a clutter, where objects are freely located, and a set of clusters, where objects should be placed according to predefined ordering and classification rules. A 3D grid discretizes the environment, and the state of each cell is given by the properties of the object it contains, including object category and order. The problem is formulated as a Markov Decision Process: at each time step, the state of the cells fully defines the state of the environment. Users can define custom object classes, ordering priorities, and failure rules, the latter by assigning a non-uniform risk probability to each cell. In the experiments, a Deep Reinforcement Learning model was successfully trained and validated on several sorting tasks while minimizing the number of moves and the failure probability. The results demonstrate that the system can handle non-deterministic events, such as failures, and unpredictable external disturbances, such as human interventions.
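The grid-based state description in the summary can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `ObjectState`, `GridSortingEnv`, `risk`, and `step` are hypothetical, and two behaviours are assumptions made only for illustration, namely that a move fails with the risk probability of its target cell and that a failed move leaves the object where it was.

```python
import random
from dataclasses import dataclass
from typing import Dict, Tuple

Cell = Tuple[int, int, int]  # (x, y, z) index into the 3D grid


@dataclass(frozen=True)
class ObjectState:
    category: int  # user-defined object class
    order: int     # user-defined ordering priority


class GridSortingEnv:
    """Toy grid environment; every name here is hypothetical."""

    def __init__(self, shape: Cell, risk: Dict[Cell, float],
                 objects: Dict[Cell, ObjectState], seed: int = 0):
        self.shape = shape
        self.risk = risk            # cell -> failure probability (0.0 if absent)
        self.cells = dict(objects)  # only occupied cells are stored
        self.rng = random.Random(seed)

    def state(self):
        # The environment state is fully defined by the state of its cells.
        return tuple(sorted((c, o.category, o.order)
                            for c, o in self.cells.items()))

    def step(self, src: Cell, dst: Cell) -> bool:
        """Try to move the object in src to dst; return True on success."""
        if src not in self.cells or dst in self.cells:
            return False  # nothing to pick, or target already occupied
        if not all(0 <= i < n for i, n in zip(dst, self.shape)):
            return False  # target lies outside the grid
        # Assumption: the move fails with the target cell's risk probability.
        if self.rng.random() < self.risk.get(dst, 0.0):
            return False  # assumed failure mode: object stays in place
        self.cells[dst] = self.cells.pop(src)
        return True
```

A reward that penalizes each move and each failure could be layered on top of `step` to reflect the stated goal of minimizing moves and failure probability; its exact form is not given in the summary, so it is left out here.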