Deep Boltzmann Machines for i-Vector Based Audio-Visual Person Identification
We propose an approach using DBM-DNNs for i-vector based audio-visual person identification. The unsupervised training of two Deep Boltzmann Machines, DBM$_{\text{speech}}$ and DBM$_{\text{face}}$, is performed using unlabeled audio and visual data from a set of background subjects. The DBMs are...
| Published in | Image and Video Technology Vol. 9431; pp. 631 - 641 |
|---|---|
| Main Authors | Alam, Mohammad Rafiqul; Bennamoun, Mohammed; Togneri, Roberto; Sohel, Ferdous |
| Format | Book Chapter |
| Language | English |
| Published | Switzerland: Springer International Publishing AG, 2016 |
| Series | Lecture Notes in Computer Science |
| Subjects | |
| ISBN | 9783319294506; 3319294504 |
| ISSN | 0302-9743; 1611-3349 |
| DOI | 10.1007/978-3-319-29451-3_50 |
| Abstract | We propose an approach using DBM-DNNs for i-vector based audio-visual person identification. Two Deep Boltzmann Machines, DBM$_{\text{speech}}$ and DBM$_{\text{face}}$, are trained without supervision on unlabeled audio and visual data from a set of background subjects. The DBMs are then used to initialize two corresponding DNNs for classification, referred to as DBM-DNN$_{\text{speech}}$ and DBM-DNN$_{\text{face}}$ in this paper. The DBM-DNNs are discriminatively fine-tuned using back-propagation on a set of training data and evaluated on a set of test data from the target subjects. We compared their performance with the cosine distance (cosDist) and the state-of-the-art DBN-DNN classifier. We also tested three different configurations of the DBM-DNNs. We show that DBM-DNNs with two hidden layers and 800 units in each hidden layer achieved the best identification performance with 400-dimensional i-vectors as input. Our experiments were carried out on the challenging MOBIO dataset. |
| Author | Sohel, Ferdous; Alam, Mohammad Rafiqul; Togneri, Roberto; Bennamoun, Mohammed |
| Author_xml | – sequence: 1 givenname: Mohammad Rafiqul surname: Alam fullname: Alam, Mohammad Rafiqul email: mohammad.alam@research.uwa.edu.au – sequence: 2 givenname: Mohammed surname: Bennamoun fullname: Bennamoun, Mohammed – sequence: 3 givenname: Roberto surname: Togneri fullname: Togneri, Roberto – sequence: 4 givenname: Ferdous surname: Sohel fullname: Sohel, Ferdous |
| ContentType | Book Chapter |
| Copyright | Springer International Publishing Switzerland 2016 |
| Copyright_xml | – notice: Springer International Publishing Switzerland 2016 |
| DBID | FFUUA |
| DEWEY | 006.6 |
| DOI | 10.1007/978-3-319-29451-3_50 |
| DatabaseName | ProQuest Ebook Central - Book Chapters - Demo use only |
| DatabaseTitleList | |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Computer Science |
| EISBN | 9783319294513 3319294512 |
| EISSN | 1611-3349 |
| Editor | McCane, Brendan; Rivera, Mariano; Yu, Xinguo; Bräunl, Thomas |
| Editor_xml | – sequence: 1 fullname: McCane, Brendan – sequence: 2 fullname: Rivera, Mariano – sequence: 3 fullname: Bräunl, Thomas – sequence: 4 fullname: Yu, Xinguo |
| EndPage | 641 |
| ExternalDocumentID | EBC6300757_655_638 EBC5586776_655_638 |
| ID | FETCH-LOGICAL-p288t-549229b8e1742e8f3fe4a2540645302f29295ea579a98e20182fbf9e9f565b3a3 |
| ISBN | 9783319294506 3319294504 |
| ISSN | 0302-9743 |
| IngestDate | Wed Sep 17 03:01:51 EDT 2025 Thu May 29 16:32:06 EDT 2025 Thu May 29 16:00:42 EDT 2025 |
| IsPeerReviewed | true |
| IsScholarly | true |
| LCCallNum | QA76.575TA1637-1638T |
| Language | English |
| LinkModel | OpenURL |
| MergedId | FETCHMERGED-LOGICAL-p288t-549229b8e1742e8f3fe4a2540645302f29295ea579a98e20182fbf9e9f565b3a3 |
| OCLC | 939514697 |
| PQID | EBC5586776_655_638 |
| PageCount | 11 |
| ParticipantIDs | springer_books_10_1007_978_3_319_29451_3_50 proquest_ebookcentralchapters_6300757_655_638 proquest_ebookcentralchapters_5586776_655_638 |
| PublicationCentury | 2000 |
| PublicationDate | 2016 |
| PublicationDateYYYYMMDD | 2016-01-01 |
| PublicationDate_xml | – year: 2016 text: 2016 |
| PublicationDecade | 2010 |
| PublicationPlace | Switzerland |
| PublicationPlace_xml | – name: Switzerland – name: Cham |
| PublicationSeriesSubtitle | Image Processing, Computer Vision, Pattern Recognition, and Graphics |
| PublicationSeriesTitle | Lecture Notes in Computer Science |
| PublicationSeriesTitleAlternate | Lect.Notes Computer |
| PublicationSubtitle | 7th Pacific-Rim Symposium, PSIVT 2015, Auckland, New Zealand, November 25-27, 2015, Revised Selected Papers |
| PublicationTitle | Image and Video Technology |
| PublicationYear | 2016 |
| Publisher | Springer International Publishing AG Springer International Publishing |
| Publisher_xml | – name: Springer International Publishing AG – name: Springer International Publishing |
| RelatedPersons | Kleinberg, Jon M.; Mattern, Friedemann; Naor, Moni; Mitchell, John C.; Terzopoulos, Demetri; Steffen, Bernhard; Pandu Rangan, C.; Kanade, Takeo; Kittler, Josef; Weikum, Gerhard; Hutchison, David; Tygar, Doug |
| RelatedPersons_xml | – sequence: 1 givenname: David surname: Hutchison fullname: Hutchison, David – sequence: 2 givenname: Takeo surname: Kanade fullname: Kanade, Takeo – sequence: 3 givenname: Josef surname: Kittler fullname: Kittler, Josef – sequence: 4 givenname: Jon M. surname: Kleinberg fullname: Kleinberg, Jon M. – sequence: 5 givenname: Friedemann surname: Mattern fullname: Mattern, Friedemann – sequence: 6 givenname: John C. surname: Mitchell fullname: Mitchell, John C. – sequence: 7 givenname: Moni surname: Naor fullname: Naor, Moni – sequence: 8 givenname: C. surname: Pandu Rangan fullname: Pandu Rangan, C. – sequence: 9 givenname: Bernhard surname: Steffen fullname: Steffen, Bernhard – sequence: 10 givenname: Demetri surname: Terzopoulos fullname: Terzopoulos, Demetri – sequence: 11 givenname: Doug surname: Tygar fullname: Tygar, Doug – sequence: 12 givenname: Gerhard surname: Weikum fullname: Weikum, Gerhard |
| SSID | ssj0001658734 ssj0002792 |
| Score | 1.8061817 |
| SourceID | springer proquest |
| SourceType | Publisher |
| StartPage | 631 |
| SubjectTerms | Boltzmann Machine; Deep Belief Network; Graphical & digital media applications; Graphics programming; Hide Layer; Image processing; Speaker Recognition; Universal Background Model |
| Title | Deep Boltzmann Machines for i-Vector Based Audio-Visual Person Identification |
| URI | http://ebookcentral.proquest.com/lib/SITE_ID/reader.action?docID=5586776&ppg=638 http://ebookcentral.proquest.com/lib/SITE_ID/reader.action?docID=6300757&ppg=638 http://link.springer.com/10.1007/978-3-319-29451-3_50 |
| Volume | 9431 |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
| linkProvider | Library Specific Holdings |
| thumbnail_s | http://utb.summon.serialssolutions.com/2.0.0/image/custom?url=https%3A%2F%2Febookcentral.proquest.com%2Fcovers%2F5586776-l.jpg http://utb.summon.serialssolutions.com/2.0.0/image/custom?url=https%3A%2F%2Febookcentral.proquest.com%2Fcovers%2F6300757-l.jpg |
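
The abstract names cosine distance (cosDist) as one of the baselines against which the DBM-DNN classifiers are compared. As a rough illustration of how cosine scoring is commonly applied to i-vectors, the sketch below assigns a test i-vector to the enrolled subject whose i-vector has the highest cosine similarity. It is not the authors' implementation: the function names, the random 400-dimensional vectors standing in for real i-vectors, and the one-vector-per-subject enrollment are all assumptions made for the example.

```python
import numpy as np


def cosine_scores(test_ivec, enrolled):
    """Cosine similarity between one test i-vector and each enrolled i-vector (one per row)."""
    test = test_ivec / np.linalg.norm(test_ivec)
    refs = enrolled / np.linalg.norm(enrolled, axis=1, keepdims=True)
    return refs @ test


def identify(test_ivec, enrolled, subject_ids):
    """Return the id of the enrolled subject with the highest cosine similarity."""
    scores = cosine_scores(test_ivec, enrolled)
    return subject_ids[int(np.argmax(scores))]


# Synthetic stand-ins for real i-vectors (assumption: 400 dimensions, as in the abstract).
rng = np.random.default_rng(0)
enrolled = rng.standard_normal((5, 400))         # one hypothetical i-vector per subject
subject_ids = ["s0", "s1", "s2", "s3", "s4"]     # hypothetical subject labels
test = enrolled[3] + 0.1 * rng.standard_normal(400)  # noisy copy of subject s3

print(identify(test, enrolled, subject_ids))
```

Because the similarity is computed on length-normalized vectors, only the direction of each i-vector matters. Ranking by highest similarity is equivalent to ranking by lowest cosine distance.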