A generative vision model that trains with high data efficiency and breaks text-based CAPTCHAs

Bibliographic Details
Published in Science (American Association for the Advancement of Science) Vol. 358; no. 6368
Main Authors George, Dileep; Lehrach, Wolfgang; Kansky, Ken; Lázaro-Gredilla, Miguel; Laan, Christopher; Marthi, Bhaskara; Lou, Xinghua; Meng, Zhaoshi; Liu, Yi; Wang, Huayan; Lavin, Alex; Phoenix, D Scott
Format Journal Article
Language English
Published United States, 8 December 2017
ISSN 1095-9203
DOI 10.1126/science.aag2612

Abstract Learning from a few examples and generalizing to markedly different situations are capabilities of human visual intelligence that are yet to be matched by leading machine learning models. By drawing inspiration from systems neuroscience, we introduce a probabilistic generative model for vision in which message-passing-based inference handles recognition, segmentation, and reasoning in a unified way. The model demonstrates excellent generalization and occlusion-reasoning capabilities and outperforms deep neural networks on a challenging scene text recognition benchmark while being 300-fold more data efficient. In addition, the model fundamentally breaks the defense of modern text-based CAPTCHAs (Completely Automated Public Turing test to tell Computers and Humans Apart) by generatively segmenting characters without CAPTCHA-specific heuristics. Our model emphasizes aspects such as data efficiency and compositionality that may be important in the path toward general artificial intelligence.
Author_xml – sequence: 1
  givenname: Dileep
  orcidid: 0000-0002-4948-6297
  surname: George
  fullname: George, Dileep
  email: dileep@vicarious.com, miguel@vicarious.com
  organization: Vicarious AI, 2 Union Square, Union City, CA 94587, USA
– sequence: 2
  givenname: Wolfgang
  orcidid: 0000-0001-8258-2961
  surname: Lehrach
  fullname: Lehrach, Wolfgang
  organization: Vicarious AI, 2 Union Square, Union City, CA 94587, USA
– sequence: 3
  givenname: Ken
  orcidid: 0000-0003-1579-9882
  surname: Kansky
  fullname: Kansky, Ken
  organization: Vicarious AI, 2 Union Square, Union City, CA 94587, USA
– sequence: 4
  givenname: Miguel
  orcidid: 0000-0002-4528-5084
  surname: Lázaro-Gredilla
  fullname: Lázaro-Gredilla, Miguel
  email: dileep@vicarious.com, miguel@vicarious.com
  organization: Vicarious AI, 2 Union Square, Union City, CA 94587, USA
– sequence: 5
  givenname: Christopher
  surname: Laan
  fullname: Laan, Christopher
  organization: Vicarious AI, 2 Union Square, Union City, CA 94587, USA
– sequence: 6
  givenname: Bhaskara
  surname: Marthi
  fullname: Marthi, Bhaskara
  organization: Vicarious AI, 2 Union Square, Union City, CA 94587, USA
– sequence: 7
  givenname: Xinghua
  surname: Lou
  fullname: Lou, Xinghua
  organization: Vicarious AI, 2 Union Square, Union City, CA 94587, USA
– sequence: 8
  givenname: Zhaoshi
  surname: Meng
  fullname: Meng, Zhaoshi
  organization: Vicarious AI, 2 Union Square, Union City, CA 94587, USA
– sequence: 9
  givenname: Yi
  orcidid: 0000-0003-2745-6940
  surname: Liu
  fullname: Liu, Yi
  organization: Vicarious AI, 2 Union Square, Union City, CA 94587, USA
– sequence: 10
  givenname: Huayan
  surname: Wang
  fullname: Wang, Huayan
  organization: Vicarious AI, 2 Union Square, Union City, CA 94587, USA
– sequence: 11
  givenname: Alex
  orcidid: 0000-0003-3422-7820
  surname: Lavin
  fullname: Lavin, Alex
  organization: Vicarious AI, 2 Union Square, Union City, CA 94587, USA
– sequence: 12
  givenname: D Scott
  surname: Phoenix
  fullname: Phoenix, D Scott
  organization: Vicarious AI, 2 Union Square, Union City, CA 94587, USA
BackLink https://www.ncbi.nlm.nih.gov/pubmed/29074582 (View this record in MEDLINE/PubMed)
ContentType Journal Article
Copyright Copyright © 2017 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works.
Discipline Sciences (General)
Biology
EISSN 1095-9203
ExternalDocumentID 29074582
Genre Journal Article
IsPeerReviewed true
IsScholarly true
Issue 6368
Language English
ORCID 0000-0002-4948-6297
0000-0002-4528-5084
0000-0003-2745-6940
0000-0003-3422-7820
0000-0003-1579-9882
0000-0001-8258-2961
PMID 29074582
ParticipantIDs pubmed_primary_29074582
PublicationCentury 2000
PublicationDate 2017-Dec-08
PublicationDateYYYYMMDD 2017-12-08
PublicationDate_xml – month: 12
  year: 2017
  text: 2017-Dec-08
  day: 08
PublicationDecade 2010
PublicationPlace United States
PublicationPlace_xml – name: United States
PublicationTitle Science (American Association for the Advancement of Science)
PublicationTitleAlternate Science
PublicationYear 2017
Title A generative vision model that trains with high data efficiency and breaks text-based CAPTCHAs
URI https://www.ncbi.nlm.nih.gov/pubmed/29074582
Volume 358