A Primer on Seq2Seq Models for Generative Chatbots

Bibliographic Details
Published in: ACM Computing Surveys, Vol. 56, No. 3, pp. 1–58
Main Authors: Scotti, Vincenzo; Sbattella, Licia; Tedesco, Roberto
Format: Journal Article
Language: English
Published: New York, NY: Association for Computing Machinery (ACM), 31.03.2024
ISSN: 0360-0300
1557-7341
DOI: 10.1145/3604281

Summary: The recent spread of Deep Learning-based solutions for Artificial Intelligence and the development of Large Language Models have significantly advanced the field of Natural Language Processing (NLP). The approach has evolved rapidly over the last ten years, deeply affecting NLP, from low-level text pre-processing tasks (such as tokenisation or POS tagging) to high-level, complex applications like machine translation and chatbots. This article examines recent trends in the development of open-domain, data-driven generative chatbots, focusing on Seq2Seq architectures. Such architectures are compatible with multiple learning approaches, ranging from supervised to reinforcement learning, and in recent years have enabled very engaging open-domain chatbots. Not only do these architectures allow a model to directly output the next turn in a conversation but, to some extent, they also allow control over the style or content of the response. To offer a complete view of the subject, we examine possible architecture implementations as well as training and evaluation approaches. Additionally, we provide information about openly available corpora for training and evaluating such models, and about current and past chatbot competitions. Finally, we present some insights on possible future directions, given the current state of research.
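The abstract describes Seq2Seq chatbots that directly output the next turn in a conversation. The generation side of such a model is an autoregressive decoding loop: given the encoded input turn, the decoder emits one token at a time until an end-of-sequence marker. A minimal, purely illustrative sketch of greedy decoding follows; the `toy_next_token_scores` function is a hypothetical hand-written stand-in for a trained encoder-decoder, not part of any real model.

```python
# Illustrative sketch of Seq2Seq-style greedy decoding for a chatbot turn.
# The "model" here is a hypothetical toy scorer, not a trained network.

def toy_next_token_scores(context, prefix):
    """Hypothetical stand-in for a decoder: scores each vocabulary token
    given the encoded input turn (context) and the tokens emitted so far."""
    if not prefix:
        return {"hello": 1.0, "there": 0.1, "<eos>": 0.0}
    if prefix[-1] == "hello":
        return {"hello": 0.0, "there": 1.0, "<eos>": 0.1}
    return {"hello": 0.0, "there": 0.0, "<eos>": 1.0}

def greedy_decode(context, max_len=10):
    """Autoregressive greedy decoding: at each step, append the
    highest-scoring token until <eos> or the length limit."""
    prefix = []
    for _ in range(max_len):
        scores = toy_next_token_scores(context, prefix)
        token = max(scores, key=scores.get)
        if token == "<eos>":
            break
        prefix.append(token)
    return prefix

print(greedy_decode(["hi"]))  # ['hello', 'there']
```

In a real Seq2Seq chatbot the scorer would be a neural decoder conditioned on the encoder's representation of the dialogue history, and greedy selection is often replaced by beam search or sampling to improve response diversity.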