One Voice is All You Need: A One-Shot Approach to Recognize Your Voice
Published in | 2022 7th International Conference on Data Science and Machine Learning Applications (CDMA), pp. 103–108
---|---
Main Authors | , , , , , , ,
Format | Conference Proceeding
Language | English
Publisher | IEEE
Published | 01.03.2022
DOI | 10.1109/CDMA54072.2022.00022
Summary: In computer vision, one-shot learning has proven effective: a model must make accurate predictions from a single labeled example of each new class and a small number of training samples. In this paper, we study a strategy for training Siamese neural networks, which use a twin architecture to automatically score the similarity between two inputs. The goal is to apply one-shot learning to audio classification by extracting distinctive features: the Siamese network is trained with a triplet loss, and at test time similarity scores are computed between a support set and a query set. We ran our experiment on the LibriSpeech ASR corpus and evaluated N-way one-shot learning, obtaining strong results for 2-way (100%), 3-way (95%), 4-way (84%), and 5-way (74%) that outperform existing machine learning models by a large margin. To the best of our knowledge, this may be the first paper to investigate one-shot human speech recognition on the LibriSpeech ASR corpus using a Siamese network.
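The summary names two mechanisms: a triplet loss used to train the Siamese embedding network, and N-way one-shot evaluation in which a query is matched against a support set holding one example per class. The following is a minimal sketch of both, under stated assumptions: the function names, the margin of 0.2, the embedding size, and the use of cosine similarity at test time are illustrative choices, not details taken from the paper.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet loss, max(0, d(a,p) - d(a,n) + margin),
    with squared Euclidean distance between embeddings."""
    d_pos = np.sum((anchor - positive) ** 2, axis=-1)
    d_neg = np.sum((anchor - negative) ** 2, axis=-1)
    return np.maximum(0.0, d_pos - d_neg + margin)

def n_way_one_shot(query_emb, support_embs):
    """N-way one-shot decision: return the index of the support-set
    embedding most similar (cosine) to the query embedding.
    support_embs has shape (N, dim), one row per candidate class."""
    q = query_emb / np.linalg.norm(query_emb)
    s = support_embs / np.linalg.norm(support_embs, axis=1, keepdims=True)
    return int(np.argmax(s @ q))

# Toy 3-way task with made-up 8-dim embeddings: the query is a slightly
# perturbed copy of support class 1, so it should be matched to index 1.
rng = np.random.default_rng(0)
support = rng.normal(size=(3, 8))
query = support[1] + 0.01 * rng.normal(size=8)
assert n_way_one_shot(query, support) == 1
```

In practice the embeddings would come from the trained Siamese encoder applied to audio features; the N-way accuracy figures in the summary correspond to repeating this support/query comparison over many sampled tasks.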