An EM Algorithm for Capsule Regression

Bibliographic Details
Published in: Neural Computation, Vol. 33, No. 1, pp. 194–226
Main Author: Saul, Lawrence K.
Format: Journal Article
Language: English
Published: MIT Press, One Rogers Street, Cambridge, MA 02142-1209, USA, 01.01.2021
Subjects: Algorithms; Classification; Matrices (mathematics); Multilayers; Nonlinearity; Object recognition; Regression
ISSN: 0899-7667
EISSN: 1530-888X
DOI: 10.1162/neco_a_01336
PMID: 33080167
Copyright: MIT Press Journals, The, 2021
Online Access:
https://direct.mit.edu/neco/article/doi/10.1162/neco_a_01336
https://www.ncbi.nlm.nih.gov/pubmed/33080167
https://www.proquest.com/docview/2895743104


Abstract
We investigate a latent variable model for multinomial classification inspired by recent capsule architectures for visual object recognition (Sabour, Frosst, & Hinton, 2017). Capsule architectures use vectors of hidden unit activities to encode the pose of visual objects in an image, and they use the lengths of these vectors to encode the probabilities that objects are present. Probabilities from different capsules can also be propagated through deep multilayer networks to model the part-whole relationships of more complex objects. Notwithstanding the promise of these networks, there still remains much to understand about capsules as primitive computing elements in their own right. In this letter, we study the problem of capsule regression—a higher-dimensional analog of logistic, probit, and softmax regression in which class probabilities are derived from vectors of competing magnitude. To start, we propose a simple capsule architecture for multinomial classification: the architecture has one capsule per class, and each capsule uses a weight matrix to compute the vector of hidden unit activities for patterns it seeks to recognize. Next, we show how to model these hidden unit activities as latent variables, and we use a squashing nonlinearity to convert their magnitudes as vectors into normalized probabilities for multinomial classification. When different capsules compete to recognize the same pattern, the squashing nonlinearity induces nongaussian terms in the posterior distribution over their latent variables. Nevertheless, we show that exact inference remains tractable and use an expectation-maximization procedure to derive least-squares updates for each capsule's weight matrix. We also present experimental results to demonstrate how these ideas work in practice.
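The abstract's forward pass is concrete enough to sketch in code: one weight matrix per class maps an input pattern to a vector of hidden unit activities, and a squashing nonlinearity converts each vector's magnitude into a class score. The Python/NumPy sketch below is a minimal illustration, not the letter's implementation: the squash function follows Sabour, Frosst, & Hinton (2017), and the final rescaling of squashed lengths into probabilities that sum to one is an assumption standing in for the letter's exact normalization rule.

import numpy as np

def squash(s):
    # Squashing nonlinearity of Sabour, Frosst, & Hinton (2017):
    # short vectors shrink toward zero, long vectors approach unit length.
    norm2 = s @ s
    return (norm2 / (1.0 + norm2)) * s / np.sqrt(norm2 + 1e-12)

def capsule_forward(x, weights):
    # x       : input pattern, shape (d,)
    # weights : per-class weight matrices W_c, each of shape (h, d)
    # returns : normalized class probabilities, shape (len(weights),)
    # Each capsule computes its own vector of hidden unit activities.
    activities = [W @ x for W in weights]
    # The squashed length of each activity vector lies in (0, 1)
    # and serves as that capsule's evidence for its class.
    lengths = np.array([np.linalg.norm(squash(s)) for s in activities])
    # Assumed normalization: competing lengths are rescaled to sum to 1.
    return lengths / lengths.sum()

# Toy usage: 3 classes, 4 hidden units per capsule, 8 input features.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((4, 8)) for _ in range(3)]
print(capsule_forward(rng.standard_normal(8), weights))

Because each squashed length lies strictly between 0 and 1, it can be read as a per-capsule presence probability even before the cross-capsule normalization.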
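The abstract also states that an expectation-maximization procedure yields least-squares updates for each capsule's weight matrix. Below is a hedged sketch of what such an M-step looks like, assuming the E-step supplies posterior mean activity vectors for each capsule; the synthetic expected_Z is a placeholder for that exact-inference output, which the letter derives but this record does not reproduce.

import numpy as np

def m_step(X, expected_Z):
    # Least-squares update for one capsule's weight matrix W (h x d):
    # solves min_W ||X W^T - E[Z]||^2 in closed form via lstsq.
    # X          : inputs, shape (n, d)
    # expected_Z : E-step posterior means of the capsule's latent
    #              activity vectors, shape (n, h)
    Wt, *_ = np.linalg.lstsq(X, expected_Z, rcond=None)
    return Wt.T

# Toy usage with synthetic E-step output: n=100 patterns, d=5 input
# features, h=3 hidden units per capsule.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
expected_Z = rng.standard_normal((100, 3))  # placeholder posterior means
W = m_step(X, expected_Z)                   # shape (3, 5)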
Author
Saul, Lawrence K.
Email: saul@cs.ucsd.edu
Organization: Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA 92093-0404
References
1. Ahmed, K. (2019). In Advances in Neural Information Processing Systems, 32, 9101.
2. Bahadori, M. T. (2018). Spectral capsule networks.
3. Boser, B. E., Guyon, I. M., & Vapnik, V. N. (1992). A training algorithm for optimal margin classifiers. doi:10.1145/130385.130401
4. Breiman, L. (2001). Random forests. doi:10.1023/A:1010933404324
5. Cortes, C., & Vapnik, V. (1995). Support-vector networks. doi:10.1007/BF00994018
6. Crammer, K. (2006). Journal of Machine Learning Research, 7, 551.
7. Dempster, A. P. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society B, 39(1), 1. doi:10.1111/j.2517-6161.1977.tb01600.x
8. Duarte, K. (2018). In Advances in Neural Information Processing Systems, 31, 7610.
9. Ghahramani, Z. (1996). The EM algorithm for mixtures of factor analyzers.
10. Hahn, T. (2019). In Advances in Neural Information Processing Systems, 32, 7658.
11. Hill, C. (2016). Learning Scientific Programming with Python.
12. Hinton, G. E., Krizhevsky, A., & Wang, S. D. (2011). Transforming auto-encoders. doi:10.1007/978-3-642-21735-7_6
13. Hinton, G. E. (2018). In Proceedings of the International Conference on Learning Representations.
14. doi:10.2307/2290716
15. doi:10.1111/1467-9868.00083
16. Jeong, T. (2019). In Proceedings of the 36th International Conference on Machine Learning, 3071.
17. Jordan, M. I. (2018). Artificial intelligence: The revolution that hasn't happened yet.
18. Kosiorek, A. (2019). In Advances in Neural Information Processing Systems, 32, 15512.
19. Lange, K. L. (1995). Statistica Sinica, 5, 1.
20. LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. doi:10.1109/5.726791
21. doi:10.1093/biomet/81.4.633
22. doi:10.1093/biomet/85.4.755
23. Lloyd, S. P. (1957). Least squares quantization in PCM.
24. Loosli, G. (2007). In Large Scale Kernel Machines, 301. doi:10.7551/mitpress/7496.003.0015
25. Neal, R. M. (1993). Probabilistic inference using Markov chain Monte Carlo methods.
26. doi:10.1007/978-94-011-5014-9_12
27. doi:10.1162/neco.1992.4.4.473
28. doi:10.1016/S0893-6080(98)00116-6
29. Qin, Y. (2020). In Proceedings of the International Conference on Learning Representations.
30. doi:10.1007/BF02293851
31. Rumelhart, D. E. (1986). Parallel Distributed Processing. doi:10.7551/mitpress/5236.001.0001
32. Sabour, S., Frosst, N., & Hinton, G. E. (2017). Dynamic routing between capsules. In Advances in Neural Information Processing Systems, 30, 3856.
33. Salakhutdinov, R. R. (2003). In Proceedings of the 20th International Conference on Machine Learning, 672.
34. Srivastava, N. (2014). Journal of Machine Learning Research, 15, 1929.
35. Tsai, Y.-H. (2020). In Proceedings of the International Conference on Learning Representations.
36. doi:10.1111/j.1467-9469.2007.00585.x
37. Venkataraman, S. R. (2020). In Proceedings of the International Conference on Learning Representations.
38. Wainwright, M. J. (2008). Foundations and Trends in Machine Learning, 1(1), 1.
39. Wang, D. (2018). An optimization view on dynamic routing between capsules.
40. Xiao, H. (2017). Fashion-MNIST: A novel image dataset for benchmarking machine learning algorithms.
41. doi:10.1080/10618600.2012.672115
42. Zhang, L. (2018). In Advances in Neural Information Processing Systems, 31, 5814.