A unique approach in text independent speaker recognition using MFCC feature sets and probabilistic neural network

Bibliographic Details
Published in: 2015 Eighth International Conference on Advances in Pattern Recognition (ICAPR), pp. 1-6
Main Authors: Ahmad, Khan Suhail; Thosar, Anil S.; Nirmal, Jagannath H.; Pande, Vinay S.
Format: Conference Proceeding
Language: English
Published: IEEE, 01.01.2015
DOI: 10.1109/ICAPR.2015.7050669

Summary: This paper motivates the use of a combination of mel frequency cepstral coefficients (MFCC) and their delta derivatives (DMFCC and DDMFCC), calculated using mel-spaced Gaussian filter banks, for text-independent speaker recognition. MFCC, modeled on the human auditory system, shows robustness against noise and session changes and has hence become synonymous with speaker recognition. Our main aim is to test the accuracy of the proposed feature set for different values of frame overlap and MFCC feature vector size, in order to identify the configuration with the highest accuracy. Principal component analysis (PCA) is applied before the training and testing stages to reduce feature dimensionality, thereby increasing computing speed and lowering the memory required for processing. Using a probabilistic neural network (PNN) for speaker modeling gave the advantage of lower operational times during the training stage. The experiments examined the percentage identification accuracy (PIA) of MFCC alone, of the combination of MFCC and DMFCC, and of the combination of all three feature sets (MFCC, DMFCC and DDMFCC). The proposed feature set attains an identification accuracy of 94% for a frame overlap of 90% and an MFCC feature size of 18 coefficients, outperforming the identification rates of the other two feature sets. The experiments were carried out on the Voxforge database.
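
The pipeline outlined in the summary (stacking MFCC with its delta and delta-delta features, reducing dimensionality with PCA, then classifying with a PNN) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes the MFCC matrix has already been computed (the Gaussian filter bank extraction is not shown), and the function names, the regression window `width`, the number of PCA components, and the PNN smoothing parameter `sigma` are all hypothetical choices that the summary does not specify.

```python
import numpy as np

def delta(features, width=2):
    """Regression-based time derivative of a (frames, coeffs) array.
    The window half-width is an assumption, not taken from the paper."""
    n_frames = features.shape[0]
    # Repeat the edge frames so every frame has `width` neighbours on each side.
    padded = np.pad(features, ((width, width), (0, 0)), mode="edge")
    denom = 2 * sum(k * k for k in range(1, width + 1))
    out = np.zeros_like(features, dtype=float)
    for k in range(1, width + 1):
        ahead = padded[width + k : width + k + n_frames]
        behind = padded[width - k : width - k + n_frames]
        out += k * (ahead - behind)
    return out / denom

def stack_features(mfcc):
    """Concatenate MFCC, DMFCC and DDMFCC column-wise."""
    d = delta(mfcc)
    dd = delta(d)
    return np.hstack([mfcc, d, dd])

def pca_fit(x, n_components):
    """Return the mean and the top principal directions of x."""
    mean = x.mean(axis=0)
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    return mean, vt[:n_components]

def pca_apply(x, mean, components):
    """Project x onto previously fitted principal directions."""
    return (x - mean) @ components.T

def pnn_predict(train_x, train_y, test_x, sigma=1.0):
    """Probabilistic neural network: one Gaussian kernel per training
    vector, averaged per class; the highest class score wins."""
    classes = np.unique(train_y)
    # Squared Euclidean distances, shape (n_test, n_train).
    d2 = ((test_x[:, None, :] - train_x[None, :, :]) ** 2).sum(axis=2)
    kernel = np.exp(-d2 / (2.0 * sigma ** 2))
    scores = np.stack(
        [kernel[:, train_y == c].mean(axis=1) for c in classes], axis=1
    )
    return classes[scores.argmax(axis=1)]
```

With 18 MFCC coefficients per frame, `stack_features` yields 54-dimensional vectors, which `pca_fit` and `pca_apply` would then shrink before `pnn_predict` is called. A PNN has no iterative weight training, since the training vectors themselves act as the pattern layer, which is consistent with the short training times mentioned in the summary.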