Speech feature extraction method representing periodicity and aperiodicity in sub bands for robust speech recognition

Bibliographic Details
Published in 2004 IEEE International Conference on Acoustics, Speech and Signal Processing, Vol. 1; pp. I-141
Main Authors Ishizuka, K., Miyazaki, N.
Format Conference Proceeding
Language English, Japanese
Published Piscataway, N.J.: IEEE, 28.09.2004
ISBN 9780780384842, 0780384849
ISSN 1520-6149
DOI 10.1109/ICASSP.2004.1325942

Abstract This paper proposes a feature extraction method that represents both the periodicity and aperiodicity of speech for robust speech recognition. The development of this feature extraction method was motivated by findings in speech perception research. With this method, the speech sound is filtered by Gammatone filter banks, and then the output of each filter is comb filtered. Individual comb filters designed for each output signal of the Gammatone filter are used to divide the output of each filter into its periodic and aperiodic features in the sub band. The power suppressed by comb filtering is considered to be a periodic feature, whereas the power of the residue after comb filtering is considered to be an aperiodic feature. This method uses both features as the feature parameters for automatic speech recognition. A preliminary experiment using a five vowel recognition task designed to compare the proposed approach with the conventional MFCC-based feature extraction method shows that the proposed method improves vowel recognition rates by as much as 14.7 % in the presence of pink noise or a harmonic complex tone interferer. An evaluation experiment undertaken using the Aurora-2J database (Japanese noisy digit recognition database) to compare the proposed approach with the MFCC-based conventional (baseline) feature extraction method shows that the proposed method reduces the word error rate by as much as 59.62 %, with an average value of 18.21 %.
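
The processing chain described in the abstract (Gammatone filtering, then comb filtering of each sub-band, then periodic and aperiodic power) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the ERB bandwidth formula, the FIR approximation of the Gammatone filter, the 1/sqrt(2) comb gain, and the assumption that the fundamental period is already known are all choices made here for illustration only.

```python
import math
import random


def gammatone_ir(fc, fs, duration=0.025, order=4):
    """Impulse response of a gammatone filter centred at fc (Hz).

    The ERB formula and the 1.019 bandwidth factor are common textbook
    choices and are assumptions of this sketch, not values from the paper.
    """
    erb = 24.7 * (4.37 * fc / 1000.0 + 1.0)
    b = 1.019 * erb
    ir = []
    for k in range(int(duration * fs)):
        t = k / fs
        ir.append(t ** (order - 1) * math.exp(-2 * math.pi * b * t)
                  * math.cos(2 * math.pi * fc * t))
    return ir


def fir_filter(x, ir):
    """Direct-form convolution of x with ir, truncated to len(x) samples."""
    return [sum(ir[k] * x[n - k] for k in range(min(n + 1, len(ir))))
            for n in range(len(x))]


def band_features(band, period):
    """Return (periodic, aperiodic) power for one sub-band signal.

    `period` is the fundamental period in samples and is assumed known;
    how it is estimated is outside the scope of this sketch.
    """
    cur, prev = band[period:], band[:-period]
    total = sum(v * v for v in cur) / len(cur)
    # Notch comb filter: cancels components that repeat every `period`
    # samples. The 1/sqrt(2) gain (an assumption) keeps the average power
    # of uncorrelated noise unchanged.
    residue = [(a - b) / math.sqrt(2) for a, b in zip(cur, prev)]
    aperiodic = sum(v * v for v in residue) / len(residue)
    # Power suppressed by the comb filter is taken as the periodic feature.
    periodic = max(total - aperiodic, 0.0)
    return periodic, aperiodic


# Hypothetical test signals: a harmonic complex tone and white noise.
fs, f0 = 8000, 100
period = fs // f0
ir = gammatone_ir(500.0, fs)
tone = [sum(math.cos(2 * math.pi * f0 * h * n / fs) for h in range(1, 11))
        for n in range(1600)]
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(1600)]

# Drop the filter's onset transient before measuring power.
tone_feat = band_features(fir_filter(tone, ir)[len(ir):], period)
noise_feat = band_features(fir_filter(noise, ir)[len(ir):], period)
```

In this sketch the harmonic tone is cancelled almost completely by the comb filter, so nearly all of its sub-band power lands in the periodic feature, while filtered noise leaves a substantial residue. A full front end would repeat `band_features` for every Gammatone channel and every analysis frame.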
Author Ishizuka, K.
Miyazaki, N.
Author_xml – sequence: 1
  givenname: K.
  surname: Ishizuka
  fullname: Ishizuka, K.
  organization: NTT Commun. Sci. Labs., NTT Corp., Tokyo, Japan
– sequence: 2
  givenname: N.
  surname: Miyazaki
  fullname: Miyazaki, N.
  organization: NTT Commun. Sci. Labs., NTT Corp., Tokyo, Japan
BackLink http://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&idt=17565876 (View record in Pascal Francis)
Copyright 2006 INIST-CNRS
Discipline Engineering
Applied Sciences
EndPage 141
ExternalDocumentID 17565876
1325942
Genre orig-research
IsPeerReviewed false
IsScholarly true
Keywords Harmonic
Speech analysis
Filtering
Acoustic signal
Error rate
Comb filters
Verbal perception
Output signal
Japanese
Cepstral analysis
Audio signal
Vowel
Speech recognition
Database
Signal processing
Feature extraction
Filter bank
Automatic recognition
Speech processing
License CC BY 4.0
MeetingName 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing (proceedings)
PublicationTitle 2004 IEEE International Conference on Acoustics, Speech and Signal Processing
PublicationTitleAbbrev ICASSP
StartPage I
SubjectTerms 1f noise
Applied sciences
Automatic speech recognition
Detection, estimation, filtering, equalization, prediction
Exact sciences and technology
Feature extraction
Filter bank
Filtering
Information, signal and communications theory
Power harmonic filters
Robustness
Signal and communications theory
Signal design
Signal processing
Signal, noise
Spatial databases
Speech processing
Speech recognition
Telecommunications and information theory
Title Speech feature extraction method representing periodicity and aperiodicity in sub bands for robust speech recognition
URI https://ieeexplore.ieee.org/document/1325942
Volume 1