Deep Boltzmann Machines for i-Vector Based Audio-Visual Person Identification

Bibliographic Details
Published in Image and Video Technology Vol. 9431; pp. 631-641
Main Authors Alam, Mohammad Rafiqul, Bennamoun, Mohammed, Togneri, Roberto, Sohel, Ferdous
Format Book Chapter
Language English
Published Switzerland Springer International Publishing AG 2016
Springer International Publishing
Series Lecture Notes in Computer Science
ISBN 9783319294506
3319294504
ISSN 0302-9743
1611-3349
DOI 10.1007/978-3-319-29451-3_50

Abstract We propose an approach that uses DBM-DNNs for i-vector based audio-visual person identification. Two Deep Boltzmann Machines, DBM$_{\text{speech}}$ and DBM$_{\text{face}}$, are trained without supervision on unlabeled audio and visual data from a set of background subjects. The DBMs are then used to initialize two corresponding DNNs for classification, referred to in this paper as DBM-DNN$_{\text{speech}}$ and DBM-DNN$_{\text{face}}$. The DBM-DNNs are discriminatively fine-tuned with back-propagation on a set of training data and evaluated on a set of test data from the target subjects. We compared their performance with the cosine distance (cosDist) and the state-of-the-art DBN-DNN classifier. We also tested three different configurations of the DBM-DNNs. We show that DBM-DNNs with two hidden layers and 800 units in each hidden layer achieved the best identification performance with 400-dimensional i-vectors as input. Our experiments were carried out on the challenging MOBIO dataset.
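The cosine distance (cosDist) baseline named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each target subject is represented by a single model i-vector and that a probe is assigned to the subject whose model has the highest cosine similarity. The function names, the subject labels, and the random 400-dimensional vectors are hypothetical stand-ins for real i-vectors.

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def identify(probe, target_models):
    """Return the best-matching subject label and all per-subject scores.

    Assumption: target_models maps a subject label to one model i-vector.
    """
    scores = {name: cosine_similarity(probe, model)
              for name, model in target_models.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Synthetic data only: random vectors stand in for 400-dimensional i-vectors.
random.seed(0)
DIM = 400
target_models = {
    f"subject_{i}": [random.gauss(0.0, 1.0) for _ in range(DIM)]
    for i in range(5)
}
# A probe built as a lightly perturbed copy of one subject's model vector.
probe = [x + 0.1 * random.gauss(0.0, 1.0) for x in target_models["subject_3"]]

best, scores = identify(probe, target_models)
print(best)
```

In this sketch a higher score means a closer match, so the decision rule is an argmax over subjects; a distance formulation (one minus the similarity) with an argmin would give the same decision.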
Author Sohel, Ferdous
Alam, Mohammad Rafiqul
Togneri, Roberto
Bennamoun, Mohammed
Author_xml – sequence: 1
  givenname: Mohammad Rafiqul
  surname: Alam
  fullname: Alam, Mohammad Rafiqul
  email: mohammad.alam@research.uwa.edu.au
– sequence: 2
  givenname: Mohammed
  surname: Bennamoun
  fullname: Bennamoun, Mohammed
– sequence: 3
  givenname: Roberto
  surname: Togneri
  fullname: Togneri, Roberto
– sequence: 4
  givenname: Ferdous
  surname: Sohel
  fullname: Sohel, Ferdous
ContentType Book Chapter
Copyright Springer International Publishing Switzerland 2016
Copyright_xml – notice: Springer International Publishing Switzerland 2016
DBID FFUUA
DEWEY 006.6
DOI 10.1007/978-3-319-29451-3_50
DatabaseName ProQuest Ebook Central - Book Chapters - Demo use only
DatabaseTitleList
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
EISBN 9783319294513
3319294512
EISSN 1611-3349
Editor McCane, Brendan
Rivera, Mariano
Yu, Xinguo
Bräunl, Thomas
Editor_xml – sequence: 1
  fullname: McCane, Brendan
– sequence: 2
  fullname: Rivera, Mariano
– sequence: 3
  fullname: Bräunl, Thomas
– sequence: 4
  fullname: Yu, Xinguo
EndPage 641
ExternalDocumentID EBC6300757_655_638
EBC5586776_655_638
ISBN 9783319294506
3319294504
ISSN 0302-9743
IngestDate Wed Sep 17 03:01:51 EDT 2025
Thu May 29 16:32:06 EDT 2025
Thu May 29 16:00:42 EDT 2025
IsPeerReviewed true
IsScholarly true
LCCallNum QA76.575TA1637-1638T
Language English
LinkModel OpenURL
OCLC 939514697
PQID EBC5586776_655_638
PageCount 11
ParticipantIDs springer_books_10_1007_978_3_319_29451_3_50
proquest_ebookcentralchapters_6300757_655_638
proquest_ebookcentralchapters_5586776_655_638
PublicationCentury 2000
PublicationDate 2016
PublicationDateYYYYMMDD 2016-01-01
PublicationDate_xml – year: 2016
  text: 2016
PublicationDecade 2010
PublicationPlace Switzerland
PublicationPlace_xml – name: Switzerland
– name: Cham
PublicationSeriesSubtitle Image Processing, Computer Vision, Pattern Recognition, and Graphics
PublicationSeriesTitle Lecture Notes in Computer Science
PublicationSeriesTitleAlternate Lect.Notes Computer
PublicationSubtitle 7th Pacific-Rim Symposium, PSIVT 2015, Auckland, New Zealand, November 25-27, 2015, Revised Selected Papers
PublicationTitle Image and Video Technology
PublicationYear 2016
Publisher Springer International Publishing AG
Springer International Publishing
Publisher_xml – name: Springer International Publishing AG
– name: Springer International Publishing
RelatedPersons Kleinberg, Jon M.
Mattern, Friedemann
Naor, Moni
Mitchell, John C.
Terzopoulos, Demetri
Steffen, Bernhard
Pandu Rangan, C.
Kanade, Takeo
Kittler, Josef
Weikum, Gerhard
Hutchison, David
Tygar, Doug
RelatedPersons_xml – sequence: 1
  givenname: David
  surname: Hutchison
  fullname: Hutchison, David
– sequence: 2
  givenname: Takeo
  surname: Kanade
  fullname: Kanade, Takeo
– sequence: 3
  givenname: Josef
  surname: Kittler
  fullname: Kittler, Josef
– sequence: 4
  givenname: Jon M.
  surname: Kleinberg
  fullname: Kleinberg, Jon M.
– sequence: 5
  givenname: Friedemann
  surname: Mattern
  fullname: Mattern, Friedemann
– sequence: 6
  givenname: John C.
  surname: Mitchell
  fullname: Mitchell, John C.
– sequence: 7
  givenname: Moni
  surname: Naor
  fullname: Naor, Moni
– sequence: 8
  givenname: C.
  surname: Pandu Rangan
  fullname: Pandu Rangan, C.
– sequence: 9
  givenname: Bernhard
  surname: Steffen
  fullname: Steffen, Bernhard
– sequence: 10
  givenname: Demetri
  surname: Terzopoulos
  fullname: Terzopoulos, Demetri
– sequence: 11
  givenname: Doug
  surname: Tygar
  fullname: Tygar, Doug
– sequence: 12
  givenname: Gerhard
  surname: Weikum
  fullname: Weikum, Gerhard
SSID ssj0001658734
ssj0002792
Score 1.8061817
SourceID springer
proquest
SourceType Publisher
StartPage 631
SubjectTerms Boltzmann Machine
Deep Belief Network
Graphical & digital media applications
Graphics programming
Hide Layer
Image processing
Speaker Recognition
Universal Background Model
Title Deep Boltzmann Machines for i-Vector Based Audio-Visual Person Identification
URI http://ebookcentral.proquest.com/lib/SITE_ID/reader.action?docID=5586776&ppg=638
http://ebookcentral.proquest.com/lib/SITE_ID/reader.action?docID=6300757&ppg=638
http://link.springer.com/10.1007/978-3-319-29451-3_50
Volume 9431