Deep Learning Based Speaker Recognition System with CNN and LSTM Techniques

Bibliographic Details
Published in	2022 Interdisciplinary Research in Technology and Management (IRTM), pp. 1-6
Main Authors	Prachi, Noshin Nirvana; Nahiyan, Faisal Mahmud; Habibullah, Md; Khan, Riasat
Format	Conference Proceeding
Language	English
Published	IEEE 24.02.2022
Subjects	convolutional neural network; Convolutional neural networks; Deep learning; long short-term memory; mel-frequency cepstral coefficient; speaker identification; Speaker recognition; Speech recognition; Time series analysis; Training; Transforms
DOI	10.1109/IRTM54583.2022.9791766

Summary:	Speaker recognition identifies a person from the biometric characteristics of voice samples. It has become a popular and useful research subject with many applications in security, assistance, replication, authentication, automation, and verification. Many techniques for speaker verification and identification have been implemented using deep learning and neural network concepts on various datasets. The primary goal of this work is to create improved, robust speaker recognition techniques that identify audio and bring accuracy closer to human levels of comprehension. The TIMIT and LibriSpeech datasets are used in this paper to develop an efficient automatic speaker recognition system. This work focuses on using MFCC to transform audio to spectrograms without losing the essential features of the audio file. Both a closed-set and an open-set implementation procedure are applied to these datasets. The closed-set implementation follows the standard machine learning convention of using the same dataset for training and testing, which leads to higher accuracy. The open-set implementation trains on one dataset and tests on the other in each run, and its accuracy turned out to be relatively lower. On each dataset, CNN and LSTM deep learning techniques have been used to identify the sound, and the CNN yielded the higher accuracy.
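
The MFCC step named in the summary can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the frame length (25 ms), hop (10 ms), 26 mel filters, 13 coefficients, and 0.97 pre-emphasis factor are common textbook defaults assumed here, and the function names are hypothetical. The result is a two-dimensional array (frames by coefficients) of the kind that could be fed to a CNN as an image-like input or to an LSTM as a sequence.

```python
import numpy as np


def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):      # rising edge of the triangle
            fbank[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge of the triangle
            fbank[i - 1, k] = (right - k) / (right - center)
    return fbank


def mfcc(signal, sample_rate=16000, frame_len=400, hop=160,
         n_fft=512, n_filters=26, n_coeffs=13):
    """Return an array of shape (n_frames, n_coeffs).

    Assumes len(signal) >= frame_len. All defaults are illustrative
    assumptions, not values taken from the paper.
    """
    # Pre-emphasis boosts high frequencies.
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Slice the signal into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(frame_len)
    # Power spectrum of each frame (zero-padded to n_fft samples).
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    # Mel filterbank energies, then log compression.
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_energies = np.log(energies + 1e-10)
    # Unnormalized DCT-II over the filter axis; keep the first n_coeffs terms.
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.arange(n_coeffs)[:, None] * (2 * n + 1) / (2 * n_filters))
    return log_energies @ basis.T


# Usage: one second of synthetic noise standing in for a 16 kHz utterance.
rng = np.random.default_rng(0)
features = mfcc(rng.standard_normal(16000))
print(features.shape)  # (98, 13)
```

In practice a tested audio library would normally replace this hand-rolled version; the sketch only makes the sequence of steps (framing, power spectrum, mel filtering, log, DCT) explicit.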