Modeling social interaction and intention for pedestrian trajectory prediction

Bibliographic Details
Published in	Physica A Vol. 570; p. 125790
Main Authors	Chen, Kai; Song, Xiao; Ren, Xiaoxiang
Format	Journal Article
Language	English
Published	Elsevier B.V., 15.05.2021
Subjects	Attention; Convolutional long–short-term memory; Encoder–decoder; Pedestrian intention; Social-interaction
ISSN	0378-4371 1873-2119
DOI	10.1016/j.physa.2021.125790

Summary:	Pedestrian trajectory prediction has many practical applications. Most existing methods model social interaction among pedestrians only, ignoring other kinds of objects (cars, dogs, bicycles, motorcycles, etc.) that strongly influence the subject pedestrian’s future trajectory. They also neglect the pedestrian’s intention, which can be inferred from the key points of the subject pedestrian’s face. Rich category information about the subject pedestrian’s surroundings, together with face key points, can therefore improve the modeling of pedestrian movement. Motivated by this idea, this paper predicts a pedestrian’s future trajectory by jointly using the categories and relative positions of the objects surrounding the subject pedestrian and the key points of the pedestrian’s face. We propose a data modeling method that unifies rich visual features about categories, interaction and face key points into a multi-channel tensor, and we build an end-to-end fully convolutional encoder–decoder attention model, based on convolutional long–short-term memory, that takes this tensor as input. We evaluate our method against several existing methods on five crowded video sequences from the public multi-object tracking dataset MOT16. Experimental results show that our method outperforms state-of-the-art approaches, with lower prediction error.
•A pedestrian’s future trajectory is predicted from the rich categories of the surroundings.
•An end-to-end fully convolutional encoder–decoder attention model is proposed.
•Experimental results show that our method yields lower prediction error.
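The summary does not specify how the multi-channel tensor is laid out, so the following is only a minimal sketch of one plausible encoding for a single frame, not the authors' implementation. The category list, grid size, cell width, the function name `build_frame_tensor`, and the use of one occupancy channel per category plus one channel for face key points are all assumptions made for illustration.

```python
# Illustrative label set; the paper's actual categories are not listed in this record.
CATEGORIES = ("pedestrian", "car", "bicycle", "motorcycle", "dog")


def build_frame_tensor(subject_xy, neighbors, face_keypoints, grid=9, cell=1.0):
    """Return a [channels][grid][grid] nested list for one video frame.

    Assumed layout (hypothetical):
      - channels 0..len(CATEGORIES)-1: occupancy of surrounding objects,
        one channel per category, on a grid centred on the subject pedestrian
      - last channel: face key points, given as (u, v) pairs normalised
        to [0, 1) within the subject's face bounding box
    """
    n_channels = len(CATEGORIES) + 1
    tensor = [[[0.0] * grid for _ in range(grid)] for _ in range(n_channels)]
    half = grid // 2
    sx, sy = subject_xy

    # Mark each surrounding object at its position relative to the subject.
    for category, (x, y) in neighbors:
        if category not in CATEGORIES:
            continue  # unknown category: ignored in this sketch
        col = half + round((x - sx) / cell)
        row = half + round((y - sy) / cell)
        if 0 <= row < grid and 0 <= col < grid:
            tensor[CATEGORIES.index(category)][row][col] = 1.0

    # Rasterise the normalised face key points into the last channel.
    for u, v in face_keypoints:
        if 0.0 <= u < 1.0 and 0.0 <= v < 1.0:
            tensor[-1][int(v * grid)][int(u * grid)] = 1.0

    return tensor
```

Under these assumptions, calling the function once per observed frame and stacking the results in time order would give the kind of sequence of multi-channel grids that a convolutional LSTM encoder can consume.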