One Voice is All You Need: A One-Shot Approach to Recognize Your Voice
Published in | 2022 7th International Conference on Data Science and Machine Learning Applications (CDMA), pp. 103–108
---|---
Main Authors | , , , , , , ,
Format | Conference Proceeding
Language | English
Publisher | IEEE
Published | 01.03.2022
DOI | 10.1109/CDMA54072.2022.00022
Summary: In computer vision, one-shot learning has proven effective: a model must make accurate predictions from a single labeled example of each new class and a small number of training samples. In this paper, we study a strategy for training Siamese neural networks, which use a twin architecture to automatically score the similarity between two inputs. The goal is to apply one-shot learning to audio classification by extracting distinctive features: the Siamese network is trained with a triplet loss, and at test time similarity scores are computed between a support set and a query set. We ran our experiment on the LibriSpeech ASR corpus and evaluated N-way one-shot learning, obtaining strong results for 2-way (100%), 3-way (95%), 4-way (84%), and 5-way (74%) that outperform existing machine learning models by a large margin. To the best of our knowledge, this may be the first paper to investigate one-shot human speech recognition on the LibriSpeech ASR corpus using a Siamese network.
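The summary names two mechanisms: a triplet loss used to train the Siamese embedding network, and N-way one-shot evaluation in which a query is matched against a support set holding one example per class. The following is a minimal sketch of both, under stated assumptions: the function names, the margin of 0.2, the embedding size, and the use of cosine similarity at test time are illustrative choices, not details taken from the paper.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet loss, max(0, d(a,p) - d(a,n) + margin),
    with squared Euclidean distance between embeddings."""
    d_pos = np.sum((anchor - positive) ** 2, axis=-1)
    d_neg = np.sum((anchor - negative) ** 2, axis=-1)
    return np.maximum(0.0, d_pos - d_neg + margin)

def n_way_one_shot(query_emb, support_embs):
    """N-way one-shot decision: return the index of the support-set
    embedding most similar (cosine) to the query embedding.
    support_embs has shape (N, dim), one row per candidate class."""
    q = query_emb / np.linalg.norm(query_emb)
    s = support_embs / np.linalg.norm(support_embs, axis=1, keepdims=True)
    return int(np.argmax(s @ q))

# Toy 3-way task with made-up 8-dim embeddings: the query is a slightly
# perturbed copy of support class 1, so it should be matched to index 1.
rng = np.random.default_rng(0)
support = rng.normal(size=(3, 8))
query = support[1] + 0.01 * rng.normal(size=8)
assert n_way_one_shot(query, support) == 1
```

In practice the embeddings would come from the trained Siamese encoder applied to audio features; the N-way accuracy figures in the summary correspond to repeating this support/query comparison over many sampled tasks.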