Deep Boltzmann Machines for i-Vector Based Audio-Visual Person Identification

We propose an approach using DBM-DNNs for i-vector based audio-visual person identification. The unsupervised training of two Deep Boltzmann Machines, DBM$_{\text{speech}}$ and DBM$_{\text{face}}$, is performed using unlabeled audio and visual data from a set of background subjects. The DBMs are...

Bibliographic Details
Published in	Image and Video Technology Vol. 9431; pp. 631 - 641
Main Authors	Alam, Mohammad Rafiqul; Bennamoun, Mohammed; Togneri, Roberto; Sohel, Ferdous
Format	Book Chapter
Language	English
Published	Switzerland Springer International Publishing AG 2016 Springer International Publishing
Series	Lecture Notes in Computer Science
Subjects	Boltzmann Machine; Deep Belief Network; Graphical & digital media applications; Graphics programming; Hidden Layer; Image processing; Speaker Recognition; Universal Background Model
ISBN	9783319294506 3319294504
ISSN	0302-9743 1611-3349
DOI	10.1007/978-3-319-29451-3_50

Abstract	We propose an approach using DBM-DNNs for i-vector based audio-visual person identification. Two Deep Boltzmann Machines, DBM$_{\text{speech}}$ and DBM$_{\text{face}}$, are trained without supervision on unlabeled audio and visual data from a set of background subjects. The DBMs are then used to initialize two corresponding DNNs for classification, referred to in this paper as DBM-DNN$_{\text{speech}}$ and DBM-DNN$_{\text{face}}$. The DBM-DNNs are discriminatively fine-tuned using back-propagation on a set of training data and evaluated on a set of test data from the target subjects. We compared their performance with cosine distance (cosDist) scoring and the state-of-the-art DBN-DNN classifier. We also tested three different configurations of the DBM-DNNs. We show that DBM-DNNs with two hidden layers and 800 units in each hidden layer achieved the best identification performance with 400-dimensional i-vectors as input. Our experiments were carried out on the challenging MOBIO dataset.
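The abstract names two kinds of classifier: a cosine distance baseline over i-vectors and a DNN with two hidden layers of 800 units fed 400-dimensional i-vectors. The following is a minimal NumPy sketch of both shapes, not the authors' implementation. The function names, the random initialization (standing in for the DBM pre-training, which is not reproduced here), the sigmoid hidden units, the softmax output, and the number of target subjects are all assumptions made for illustration.

```python
import numpy as np

def cosine_scores(test_ivec, target_ivecs):
    """Cosine similarity between one test i-vector and each target i-vector (one per row)."""
    test = test_ivec / np.linalg.norm(test_ivec)
    targets = target_ivecs / np.linalg.norm(target_ivecs, axis=1, keepdims=True)
    return targets @ test

def identify_by_cosine(test_ivec, target_ivecs):
    """Return the index of the target subject with the highest cosine score."""
    return int(np.argmax(cosine_scores(test_ivec, target_ivecs)))

def init_dnn(n_in=400, n_hidden=800, n_layers=2, n_classes=10, seed=0):
    """Randomly initialized weights (assumption: stands in for DBM-based initialization)."""
    rng = np.random.default_rng(seed)
    sizes = [n_in] + [n_hidden] * n_layers + [n_classes]
    return [(rng.normal(0.0, 0.01, (a, b)), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    """Forward pass: sigmoid hidden layers, softmax over target subjects (assumed choices)."""
    h = x
    for W, b in params[:-1]:
        h = 1.0 / (1.0 + np.exp(-(h @ W + b)))
    W, b = params[-1]
    logits = h @ W + b
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=-1, keepdims=True)
```

In this sketch, identification with the network would take the arg-max of `forward(params, ivec)`, while the baseline takes the arg-max of the cosine scores. Fine-tuning by back-propagation is omitted.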
Author	Sohel, Ferdous Alam, Mohammad Rafiqul Togneri, Roberto Bennamoun, Mohammed
Author_xml	– sequence: 1 givenname: Mohammad Rafiqul surname: Alam fullname: Alam, Mohammad Rafiqul email: mohammad.alam@research.uwa.edu.au – sequence: 2 givenname: Mohammed surname: Bennamoun fullname: Bennamoun, Mohammed – sequence: 3 givenname: Roberto surname: Togneri fullname: Togneri, Roberto – sequence: 4 givenname: Ferdous surname: Sohel fullname: Sohel, Ferdous
ContentType	Book Chapter
Copyright	Springer International Publishing Switzerland 2016
Copyright_xml	– notice: Springer International Publishing Switzerland 2016
DBID	FFUUA
DEWEY	006.6
DOI	10.1007/978-3-319-29451-3_50
DatabaseName	ProQuest Ebook Central - Book Chapters - Demo use only
DatabaseTitleList
DeliveryMethod	fulltext_linktorsrc
Discipline	Computer Science
EISBN	9783319294513 3319294512
EISSN	1611-3349
Editor	McCane, Brendan Rivera, Mariano Yu, Xinguo Bräunl, Thomas
Editor_xml	– sequence: 1 fullname: McCane, Brendan – sequence: 2 fullname: Rivera, Mariano – sequence: 3 fullname: Bräunl, Thomas – sequence: 4 fullname: Yu, Xinguo
EndPage	641
ExternalDocumentID	EBC6300757_655_638 EBC5586776_655_638
ISBN	9783319294506 3319294504
ISSN	0302-9743
IngestDate	Wed Sep 17 03:01:51 EDT 2025 Thu May 29 16:32:06 EDT 2025 Thu May 29 16:00:42 EDT 2025
IsPeerReviewed	true
IsScholarly	true
LCCallNum	QA76.575TA1637-1638T
Language	English
LinkModel	OpenURL
OCLC	939514697
PQID	EBC5586776_655_638
PageCount	11
ParticipantIDs	springer_books_10_1007_978_3_319_29451_3_50 proquest_ebookcentralchapters_6300757_655_638 proquest_ebookcentralchapters_5586776_655_638
PublicationCentury	2000
PublicationDate	2016
PublicationDateYYYYMMDD	2016-01-01
PublicationDate_xml	– year: 2016 text: 2016
PublicationDecade	2010
PublicationPlace	Switzerland
PublicationPlace_xml	– name: Switzerland – name: Cham
PublicationSeriesSubtitle	Image Processing, Computer Vision, Pattern Recognition, and Graphics
PublicationSeriesTitle	Lecture Notes in Computer Science
PublicationSeriesTitleAlternate	Lect.Notes Computer
PublicationSubtitle	7th Pacific-Rim Symposium, PSIVT 2015, Auckland, New Zealand, November 25-27, 2015, Revised Selected Papers
PublicationTitle	Image and Video Technology
PublicationYear	2016
Publisher	Springer International Publishing AG Springer International Publishing
Publisher_xml	– name: Springer International Publishing AG – name: Springer International Publishing
RelatedPersons	Kleinberg, Jon M. Mattern, Friedemann Naor, Moni Mitchell, John C. Terzopoulos, Demetri Steffen, Bernhard Pandu Rangan, C. Kanade, Takeo Kittler, Josef Weikum, Gerhard Hutchison, David Tygar, Doug
RelatedPersons_xml	– sequence: 1 givenname: David surname: Hutchison fullname: Hutchison, David – sequence: 2 givenname: Takeo surname: Kanade fullname: Kanade, Takeo – sequence: 3 givenname: Josef surname: Kittler fullname: Kittler, Josef – sequence: 4 givenname: Jon M. surname: Kleinberg fullname: Kleinberg, Jon M. – sequence: 5 givenname: Friedemann surname: Mattern fullname: Mattern, Friedemann – sequence: 6 givenname: John C. surname: Mitchell fullname: Mitchell, John C. – sequence: 7 givenname: Moni surname: Naor fullname: Naor, Moni – sequence: 8 givenname: C. surname: Pandu Rangan fullname: Pandu Rangan, C. – sequence: 9 givenname: Bernhard surname: Steffen fullname: Steffen, Bernhard – sequence: 10 givenname: Demetri surname: Terzopoulos fullname: Terzopoulos, Demetri – sequence: 11 givenname: Doug surname: Tygar fullname: Tygar, Doug – sequence: 12 givenname: Gerhard surname: Weikum fullname: Weikum, Gerhard
SourceID	springer proquest
SourceType	Publisher
StartPage	631
SubjectTerms	Boltzmann Machine Deep Belief Network Graphical & digital media applications Graphics programming Hide Layer Image processing Speaker Recognition Universal Background Model
Title	Deep Boltzmann Machines for i-Vector Based Audio-Visual Person Identification
URI	http://ebookcentral.proquest.com/lib/SITE_ID/reader.action?docID=5586776&ppg=638 http://ebookcentral.proquest.com/lib/SITE_ID/reader.action?docID=6300757&ppg=638 http://link.springer.com/10.1007/978-3-319-29451-3_50
Volume	9431
hasFullText	1
inHoldings	1
isFullTextHit
isPrint
linkProvider	Library Specific Holdings