An EM Algorithm for Capsule Regression
| Published in | Neural Computation, Vol. 33, no. 1, pp. 194-226 |
|---|---|
| Main Author | Saul, Lawrence K. |
| Format | Journal Article |
| Language | English |
| Published | MIT Press, One Rogers Street, Cambridge, MA 02142-1209, USA, 01.01.2021 |
| Subjects | Algorithms; Classification; Matrices (mathematics); Multilayers; Nonlinearity; Object recognition; Regression |
| Online Access | https://direct.mit.edu/neco/article/doi/10.1162/neco_a_01336 |
| ISSN | 0899-7667 (print); 1530-888X (electronic) |
| DOI | 10.1162/neco_a_01336 |
| Abstract | We investigate a latent variable model for multinomial classification inspired by recent capsule architectures for visual object recognition (Sabour, Frosst, & Hinton, 2017). Capsule architectures use vectors of hidden unit activities to encode the pose of visual objects in an image, and they use the lengths of these vectors to encode the probabilities that objects are present. Probabilities from different capsules can also be propagated through deep multilayer networks to model the part-whole relationships of more complex objects. Notwithstanding the promise of these networks, there still remains much to understand about capsules as primitive computing elements in their own right. In this letter, we study the problem of capsule regression—a higher-dimensional analog of logistic, probit, and softmax regression in which class probabilities are derived from vectors of competing magnitude. To start, we propose a simple capsule architecture for multinomial classification: the architecture has one capsule per class, and each capsule uses a weight matrix to compute the vector of hidden unit activities for patterns it seeks to recognize. Next, we show how to model these hidden unit activities as latent variables, and we use a squashing nonlinearity to convert their magnitudes as vectors into normalized probabilities for multinomial classification. When different capsules compete to recognize the same pattern, the squashing nonlinearity induces nongaussian terms in the posterior distribution over their latent variables. Nevertheless, we show that exact inference remains tractable and use an expectation-maximization procedure to derive least-squares updates for each capsule's weight matrix. We also present experimental results to demonstrate how these ideas work in practice. |
| Author | Saul, Lawrence K. (saul@cs.ucsd.edu), Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA 92093-0404 |
| Copyright | Copyright MIT Press Journals, The, 2021 |
| PMID | 33080167 |
References

- [B1] Ahmed, K. (2019). In Advances in Neural Information Processing Systems, 32 (p. 9101).
- [B2] Bahadori, M. T. (2018). Spectral capsule networks.
- [B3] doi:10.1145/130385.130401.
- [B4] doi:10.1023/A:1010933404324.
- [B5] doi:10.1007/BF00994018.
- [B6] Crammer, K. (2006). Journal of Machine Learning Research, 7, 551.
- [B7] Dempster, A. P. (1977). Journal of the Royal Statistical Society B, 39(1), 1. doi:10.1111/j.2517-6161.1977.tb01600.x.
- [B8] Duarte, K. (2018). In Advances in Neural Information Processing Systems, 31 (p. 7610).
- [B9] Ghahramani, Z. (1996). The EM algorithm for mixtures of factor analyzers.
- [B10] Hahn, T. (2019). In Advances in Neural Information Processing Systems, 32 (p. 7658).
- [B11] Hill, C. (2016). Learning Scientific Programming with Python.
- [B12] doi:10.1007/978-3-642-21735-7_6.
- [B13] Hinton, G. E. (2018). In Proceedings of the International Conference on Learning Representations.
- [B14] doi:10.2307/2290716.
- [B15] doi:10.1111/1467-9868.00083.
- [B16] Jeong, T. (2019). In Proceedings of the 36th International Conference on Machine Learning (p. 3071).
- [B17] Jordan, M. I. (2018). Artificial intelligence—the revolution that hasn't happened yet.
- [B18] Kosiorek, A. (2019). In Advances in Neural Information Processing Systems, 32 (p. 15512).
- [B19] Lange, K. L. (1995). Statistica Sinica, 5, 1.
- [B20] doi:10.1109/5.726791.
- [B21] doi:10.1093/biomet/81.4.633.
- [B22] doi:10.1093/biomet/85.4.755.
- [B23] Lloyd, S. P. (1957). Least squares quantization in PCM.
- [B24] Loosli, G. (2007). In Large Scale Kernel Machines (p. 301). doi:10.7551/mitpress/7496.003.0015.
- [B25] Neal, R. M. (1993). Probabilistic inference using Markov chain Monte Carlo methods.
- [B26] doi:10.1007/978-94-011-5014-9_12.
- [B27] doi:10.1162/neco.1992.4.4.473.
- [B28] doi:10.1016/S0893-6080(98)00116-6.
- [B29] Qin, Y. (2020). In Proceedings of the International Conference on Learning Representations.
- [B30] doi:10.1007/BF02293851.
- [B31] Rumelhart, D. E. (1986). Parallel Distributed Processing. doi:10.7551/mitpress/5236.001.0001.
- [B32] Sabour, S. (2017). In Advances in Neural Information Processing Systems, 30 (p. 3856).
- [B33] Salakhutdinov, R. R. (2003). In Proceedings of the 20th International Conference on Machine Learning (p. 672).
- [B34] Srivastava, N. (2014). Journal of Machine Learning Research, 15, 1929.
- [B35] Tsai, Y.-H. (2020). In Proceedings of the International Conference on Learning Representations.
- [B36] doi:10.1111/j.1467-9469.2007.00585.x.
- [B37] Venkataraman, S. R. (2020). In Proceedings of the International Conference on Learning Representations.
- [B38] Wainwright, M. J. (2008). Foundations and Trends in Machine Learning, 1(1), 1.
- [B39] Wang, D. (2018). An optimization view on dynamic routing between capsules.
- [B40] Xiao, H. (2017). Fashion-MNIST: A novel image dataset for benchmarking machine learning algorithms.
- [B41] doi:10.1080/10618600.2012.672115.
- [B42] Zhang, L. (2018). In Advances in Neural Information Processing Systems, 31 (p. 5814).
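The architecture summarized in the Abstract above is simple enough to sketch. The snippet below is a minimal, hypothetical illustration in NumPy, not the paper's code: it assumes the squashing function of Sabour, Frosst, and Hinton (2017), whose squashed magnitude is ||s||^2 / (1 + ||s||^2), applied to each capsule's vector of hidden unit activities, and a plain sum-normalization of the squashed magnitudes across competing capsules. The paper's exact normalization, the latent variable treatment, and the EM least-squares updates are derived in the full text; the function names here (`squash_magnitude`, `capsule_class_probs`) are illustrative.

```python
import numpy as np

def squash_magnitude(s):
    # Squashed length of a capsule's activity vector s, following
    # Sabour et al. (2017): ||s||^2 / (1 + ||s||^2), a value in [0, 1).
    norm_sq = float(np.dot(s, s))
    return norm_sq / (1.0 + norm_sq)

def capsule_class_probs(x, weights):
    # One capsule per class: each weight matrix W maps the input pattern x
    # to that capsule's vector of hidden unit activities (the abstract's
    # latent variables, here taken at their noiseless means). Squashed
    # magnitudes are normalized across the competing capsules to give
    # class probabilities.
    magnitudes = np.array([squash_magnitude(W @ x) for W in weights])
    return magnitudes / magnitudes.sum()

# Toy usage: 3 classes, 8-dimensional inputs, 4 hidden units per capsule.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(4, 8)) for _ in range(3)]
x = rng.normal(size=8)
probs = capsule_class_probs(x, weights)
print(probs, probs.sum())  # class probabilities; they sum to 1
```

With one weight matrix per class, training reduces to fitting these matrices. The letter's central point is that, even though the squashing nonlinearity induces nongaussian terms in the posterior over the latent activities when capsules compete, exact inference remains tractable and each M-step of the EM procedure is a least-squares update.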