Decoding Visual Neural Representations by Multimodal Learning of Brain-Visual-Linguistic Features
| Published in | IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 45, No. 9, pp. 10760-10777 | 
|---|---|
| Main Authors | Du, Changde; Fu, Kaicheng; Li, Jinpeng; He, Huiguang | 
| Format | Journal Article | 
| Language | English | 
| Published | United States: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.09.2023 | 
| Subjects | Algorithms; Brain; Brain - diagnostic imaging; Brain modeling; Brain-visual-linguistic embedding; Categories; Decoding; generic neural decoding; Humans; Internet; Learning; Linguistics; Model accuracy; multimodal learning; mutual information maximization; Online services; Regularization; Representations; Semantics; Training; Visual Perception; Visual stimuli; Visualization |
| ISSN | 0162-8828; 1939-3539 | 
| EISSN | 2160-9292; 1939-3539 | 
| DOI | 10.1109/TPAMI.2023.3263181 | 
| Abstract | Decoding human visual neural representations is a challenging task with great scientific significance in revealing vision-processing mechanisms and developing brain-like intelligent machines. Most existing methods are difficult to generalize to novel categories that have no corresponding neural data for training. The two main reasons are 1) the under-exploitation of the multimodal semantic knowledge underlying the neural data and 2) the small number of paired (stimuli-responses) training data. To overcome these limitations, this paper presents a generic neural decoding method called BraVL that uses multimodal learning of brain-visual-linguistic features. We focus on modeling the relationships between brain, visual and linguistic features via multimodal deep generative models. Specifically, we leverage the mixture-of-product-of-experts formulation to infer a latent code that enables a coherent joint generation of all three modalities. To learn a more consistent joint representation and improve the data efficiency in the case of limited brain activity data, we exploit both intra- and inter-modality mutual information maximization regularization terms. In particular, our BraVL model can be trained under various semi-supervised scenarios to incorporate the visual and textual features obtained from the extra categories. Finally, we construct three trimodal matching datasets, and the extensive experiments lead to some interesting conclusions and cognitive insights: 1) decoding novel visual categories from human brain activity is practically possible with good accuracy; 2) decoding models using the combination of visual and linguistic features perform much better than those using either of them alone; 3) visual perception may be accompanied by linguistic influences to represent the semantics of visual stimuli. | 
    
|---|---|
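The abstract's mixture-of-product-of-experts inference can be made concrete. Below is a minimal sketch, assuming each modality encoder (brain, visual, linguistic) outputs the mean and log-variance of a diagonal-Gaussian posterior; the function names are illustrative, and the uniform subset mixture follows the generalized multimodal ELBO the paper builds on (Sutter et al., 2021), not the authors' released code.

```python
import itertools
import torch

def product_of_experts(mus, logvars, eps=1e-8):
    """Fuse unimodal diagonal-Gaussian posteriors q(z|x_i) into one
    Gaussian by precision-weighted averaging (Product of Experts)."""
    precisions = [torch.exp(-lv) for lv in logvars]        # 1 / sigma_i^2
    total_precision = sum(precisions) + eps
    joint_var = 1.0 / total_precision
    joint_mu = joint_var * sum(p * m for p, m in zip(precisions, mus))
    return joint_mu, torch.log(joint_var)

def mixture_of_products_of_experts(mus, logvars):
    """MoPoE joint posterior: a uniform mixture over all non-empty
    modality subsets, each subset fused with a PoE. Returns one
    reparameterized sample z plus the per-subset Gaussian parameters
    (the latter are needed when assembling the multimodal ELBO)."""
    n = len(mus)
    subsets = [s for r in range(1, n + 1)
               for s in itertools.combinations(range(n), r)]
    subset_params = [
        product_of_experts([mus[i] for i in s], [logvars[i] for i in s])
        for s in subsets
    ]
    # Ancestral sampling from the mixture: pick a subset uniformly,
    # then reparameterize from its PoE posterior.
    mu, logvar = subset_params[torch.randint(len(subsets), (1,)).item()]
    z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
    return z, subset_params
```

With three modalities this yields seven subsets, so a brain-only posterior is available at test time even though training sees all three; that is what allows decoding when only brain features are observed.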
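The intra- and inter-modality mutual information maximization terms can likewise be sketched. The record does not specify the estimator, so the snippet below uses the common InfoNCE lower bound as a stand-in; maximizing it on paired latent features is one standard way to realize an inter-modality MI regularizer, and the paper may use a different estimator.

```python
import torch
import torch.nn.functional as F

def infonce_lower_bound(z_a, z_b, temperature=0.1):
    """InfoNCE objective on a batch of paired features z_a, z_b ([B, D]).
    Minimizing the cross-entropy below tightens a lower bound on
    I(z_a; z_b) (up to an additive log-batch-size constant), pulling
    matched pairs together and mismatched pairs apart."""
    z_a = F.normalize(z_a, dim=-1)
    z_b = F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.t() / temperature        # [B, B] similarities
    labels = torch.arange(z_a.size(0), device=z_a.device)
    return -F.cross_entropy(logits, labels)     # maximize this bound
```

An intra-modality term of the same form, computed between a feature and another view of itself, encourages the latent code to retain modality-specific information rather than keeping only what is easiest to align.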
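Finding 1 (decoding novel categories) implies an inference step that matches a brain-derived latent against class latents built from the visual and textual features of unseen categories. The exact decision rule is not stated in this record, so the following nearest-class-latent sketch is purely illustrative; `brain_mu` and `class_mus` are hypothetical names.

```python
import torch

@torch.no_grad()
def zero_shot_decode(brain_mu, class_mus):
    """Assign each brain-derived latent mean (brain_mu: [N, D]) to the
    closest class latent (class_mus: [C, D]) by cosine similarity.
    Class latents would come from visual/text features of categories
    with no paired neural data."""
    b = torch.nn.functional.normalize(brain_mu, dim=-1)
    c = torch.nn.functional.normalize(class_mus, dim=-1)
    return (b @ c.t()).argmax(dim=-1)   # predicted class index per sample
```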
    
| Author Details | 1. Du, Changde (changde.du@ia.ac.cn; ORCID 0000-0002-0084-433X), Research Center for Brain-Inspired Intelligence, State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China. 2. Fu, Kaicheng (fukaicheng2019@ia.ac.cn), same affiliation. 3. Li, Jinpeng (lijinpeng@ucas.ac.cn; ORCID 0000-0001-8701-2642), Ningbo HwaMei Hospital, UCAS, Zhejiang, China. 4. He, Huiguang (huiguang.he@ia.ac.cn; ORCID 0000-0002-0684-1711), Research Center for Brain-Inspired Intelligence, Institute of Automation, Chinese Academy of Sciences, Beijing, China. | 

| CODEN | ITPIDJ | 
    
| ContentType | Journal Article | 
    
| Copyright | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2023 | 
    
| Discipline | Engineering Computer Science  | 
    
| EISSN | 2160-9292 1939-3539  | 
    
| EndPage | 10777 | 
    
| Genre | orig-research Research Support, Non-U.S. Gov't Journal Article  | 
    
| Grant Information | National Natural Science Foundation of China (grants 62206284, 62020106015, 61976209; funder ID 10.13039/501100001809); CAAI-Huawei MindSpore Open Fund; National Key R&D Program of China (grant 2022ZD0116500); Strategic Priority Research Program of Chinese Academy of Sciences (grant XDB32040000). | 

| ISSN | 0162-8828 1939-3539  | 
    
| IsPeerReviewed | true | 
    
| IsScholarly | true | 
    
| Issue | 9 | 
    
| Language | English | 
    
| License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html https://doi.org/10.15223/policy-029 https://doi.org/10.15223/policy-037  | 
    
| ORCID | 0000-0002-0084-433X 0000-0001-8701-2642 0000-0002-0684-1711  | 
    
| PMID | 37030711 | 
    
| PageCount | 18 | 
    
| PublicationCentury | 2000 | 
    
| PublicationDate | 2023-09-01 | 
    
| PublicationDateYYYYMMDD | 2023-09-01 | 
    
| PublicationDecade | 2020 | 
    
| PublicationPlace | United States | 
    
| PublicationTitle | IEEE transactions on pattern analysis and machine intelligence | 
    
| PublicationTitleAbbrev | TPAMI | 
    
| PublicationTitleAlternate | IEEE Trans Pattern Anal Mach Intell | 
    
| PublicationYear | 2023 | 
    
| Publisher | IEEE The Institute of Electrical and Electronics Engineers, Inc. (IEEE)  | 
    
| StartPage | 10760 | 
    
| SubjectTerms | Algorithms Brain Brain - diagnostic imaging Brain modeling Brain-visual-linguistic embedding Categories Decoding generic neural decoding Humans Internet Learning Linguistics Model accuracy multimodal learning mutual information maximization Online services Regularization Representations Semantics Training Visual Perception Visual stimuli Visualization  | 
    
| Title | Decoding Visual Neural Representations by Multimodal Learning of Brain-Visual-Linguistic Features | 
    
| URI | https://ieeexplore.ieee.org/document/10089190 https://www.ncbi.nlm.nih.gov/pubmed/37030711 https://www.proquest.com/docview/2847957620 https://www.proquest.com/docview/2798714073  | 
    
| Volume | 45 | 
    