An EM Algorithm for Capsule Regression

Bibliographic Details
Published in: Neural Computation, Vol. 33, No. 1, pp. 194–226
Main Author: Saul, Lawrence K.
Format: Journal Article
Language: English
Published: MIT Press, One Rogers Street, Cambridge, MA 02142-1209, USA, 01.01.2021
Subjects: Algorithms; Classification; Matrices (mathematics); Multilayers; Nonlinearity; Object recognition; Regression
ISSN: 0899-7667
EISSN: 1530-888X
DOI: 10.1162/neco_a_01336
PMID: 33080167
Copyright: MIT Press Journals, The, 2021
Online Access:
https://direct.mit.edu/neco/article/doi/10.1162/neco_a_01336
https://www.ncbi.nlm.nih.gov/pubmed/33080167
https://www.proquest.com/docview/2895743104


Abstract
We investigate a latent variable model for multinomial classification inspired by recent capsule architectures for visual object recognition (Sabour, Frosst, & Hinton, 2017). Capsule architectures use vectors of hidden unit activities to encode the pose of visual objects in an image, and they use the lengths of these vectors to encode the probabilities that objects are present. Probabilities from different capsules can also be propagated through deep multilayer networks to model the part-whole relationships of more complex objects. Notwithstanding the promise of these networks, there still remains much to understand about capsules as primitive computing elements in their own right. In this letter, we study the problem of capsule regression—a higher-dimensional analog of logistic, probit, and softmax regression in which class probabilities are derived from vectors of competing magnitude. To start, we propose a simple capsule architecture for multinomial classification: the architecture has one capsule per class, and each capsule uses a weight matrix to compute the vector of hidden unit activities for patterns it seeks to recognize. Next, we show how to model these hidden unit activities as latent variables, and we use a squashing nonlinearity to convert their magnitudes as vectors into normalized probabilities for multinomial classification. When different capsules compete to recognize the same pattern, the squashing nonlinearity induces nongaussian terms in the posterior distribution over their latent variables. Nevertheless, we show that exact inference remains tractable and use an expectation-maximization procedure to derive least-squares updates for each capsule's weight matrix. We also present experimental results to demonstrate how these ideas work in practice.
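The abstract's forward pass is concrete enough to sketch in code: one weight matrix per class maps an input pattern to a vector of hidden unit activities, and a squashing nonlinearity converts each vector's magnitude into a class score. The Python/NumPy sketch below is a minimal illustration, not the letter's implementation: the squash function follows Sabour, Frosst, & Hinton (2017), and the final rescaling of squashed lengths into probabilities that sum to one is an assumption standing in for the letter's exact normalization rule.

import numpy as np

def squash(s):
    # Squashing nonlinearity of Sabour, Frosst, & Hinton (2017):
    # short vectors shrink toward zero, long vectors approach unit length.
    norm2 = s @ s
    return (norm2 / (1.0 + norm2)) * s / np.sqrt(norm2 + 1e-12)

def capsule_forward(x, weights):
    # x       : input pattern, shape (d,)
    # weights : per-class weight matrices W_c, each of shape (h, d)
    # returns : normalized class probabilities, shape (len(weights),)
    # Each capsule computes its own vector of hidden unit activities.
    activities = [W @ x for W in weights]
    # The squashed length of each activity vector lies in (0, 1)
    # and serves as that capsule's evidence for its class.
    lengths = np.array([np.linalg.norm(squash(s)) for s in activities])
    # Assumed normalization: competing lengths are rescaled to sum to 1.
    return lengths / lengths.sum()

# Toy usage: 3 classes, 4 hidden units per capsule, 8 input features.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((4, 8)) for _ in range(3)]
print(capsule_forward(rng.standard_normal(8), weights))

Because each squashed length lies strictly between 0 and 1, it can be read as a per-capsule presence probability even before the cross-capsule normalization.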
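The abstract also states that an expectation-maximization procedure yields least-squares updates for each capsule's weight matrix. Below is a hedged sketch of what such an M-step looks like, assuming the E-step supplies posterior mean activity vectors for each capsule; the synthetic expected_Z is a placeholder for that exact-inference output, which the letter derives but this record does not reproduce.

import numpy as np

def m_step(X, expected_Z):
    # Least-squares update for one capsule's weight matrix W (h x d):
    # solves min_W ||X W^T - E[Z]||^2 in closed form via lstsq.
    # X          : inputs, shape (n, d)
    # expected_Z : E-step posterior means of the capsule's latent
    #              activity vectors, shape (n, h)
    Wt, *_ = np.linalg.lstsq(X, expected_Z, rcond=None)
    return Wt.T

# Toy usage with synthetic E-step output: n=100 patterns, d=5 input
# features, h=3 hidden units per capsule.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
expected_Z = rng.standard_normal((100, 3))  # placeholder posterior means
W = m_step(X, expected_Z)                   # shape (3, 5)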
Author
Saul, Lawrence K.
Email: saul@cs.ucsd.edu
Organization: Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA 92093-0404
References
1. Ahmed, K. (2019). In Advances in Neural Information Processing Systems, 32, 9101.
2. Bahadori, M. T. (2018). Spectral capsule networks.
3. Boser, B. E., Guyon, I. M., & Vapnik, V. N. (1992). A training algorithm for optimal margin classifiers. doi:10.1145/130385.130401
4. Breiman, L. (2001). Random forests. doi:10.1023/A:1010933404324
5. Cortes, C., & Vapnik, V. (1995). Support-vector networks. doi:10.1007/BF00994018
6. Crammer, K. (2006). Journal of Machine Learning Research, 7, 551.
7. Dempster, A. P. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society B, 39(1), 1. doi:10.1111/j.2517-6161.1977.tb01600.x
8. Duarte, K. (2018). In Advances in Neural Information Processing Systems, 31, 7610.
9. Ghahramani, Z. (1996). The EM algorithm for mixtures of factor analyzers.
10. Hahn, T. (2019). In Advances in Neural Information Processing Systems, 32, 7658.
11. Hill, C. (2016). Learning Scientific Programming with Python.
12. Hinton, G. E., Krizhevsky, A., & Wang, S. D. (2011). Transforming auto-encoders. doi:10.1007/978-3-642-21735-7_6
13. Hinton, G. E. (2018). In Proceedings of the International Conference on Learning Representations.
14. doi:10.2307/2290716
15. doi:10.1111/1467-9868.00083
16. Jeong, T. (2019). In Proceedings of the 36th International Conference on Machine Learning, 3071.
17. Jordan, M. I. (2018). Artificial intelligence: The revolution that hasn't happened yet.
18. Kosiorek, A. (2019). In Advances in Neural Information Processing Systems, 32, 15512.
19. Lange, K. L. (1995). Statistica Sinica, 5, 1.
20. LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. doi:10.1109/5.726791
21. doi:10.1093/biomet/81.4.633
22. doi:10.1093/biomet/85.4.755
23. Lloyd, S. P. (1957). Least squares quantization in PCM.
24. Loosli, G. (2007). In Large Scale Kernel Machines, 301. doi:10.7551/mitpress/7496.003.0015
25. Neal, R. M. (1993). Probabilistic inference using Markov chain Monte Carlo methods.
26. doi:10.1007/978-94-011-5014-9_12
27. doi:10.1162/neco.1992.4.4.473
28. doi:10.1016/S0893-6080(98)00116-6
29. Qin, Y. (2020). In Proceedings of the International Conference on Learning Representations.
30. doi:10.1007/BF02293851
31. Rumelhart, D. E. (1986). Parallel Distributed Processing. doi:10.7551/mitpress/5236.001.0001
32. Sabour, S., Frosst, N., & Hinton, G. E. (2017). Dynamic routing between capsules. In Advances in Neural Information Processing Systems, 30, 3856.
33. Salakhutdinov, R. R. (2003). In Proceedings of the 20th International Conference on Machine Learning, 672.
34. Srivastava, N. (2014). Journal of Machine Learning Research, 15, 1929.
35. Tsai, Y.-H. (2020). In Proceedings of the International Conference on Learning Representations.
36. doi:10.1111/j.1467-9469.2007.00585.x
37. Venkataraman, S. R. (2020). In Proceedings of the International Conference on Learning Representations.
38. Wainwright, M. J. (2008). Foundations and Trends in Machine Learning, 1(1), 1.
39. Wang, D. (2018). An optimization view on dynamic routing between capsules.
40. Xiao, H. (2017). Fashion-MNIST: A novel image dataset for benchmarking machine learning algorithms.
41. doi:10.1080/10618600.2012.672115
42. Zhang, L. (2018). In Advances in Neural Information Processing Systems, 31, 5814.