Joint Inference of Objects and Scenes With Efficient Learning of Text-Object-Scene Relations
The rapid growth of web images presents new challenges as well as opportunities to the task of image understanding. Conventional approaches rely heavily on fine-grained annotations, such as bounding boxes and semantic segmentations, which are not available for web-scale images. In general, images ov...
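The full abstract in this record describes two kinds of learned relations: a text-object relation posed as a constrained bipartite matching between nouns and object classes, and scene-object relations modeled as conditional co-occurrence distributions. The following is a minimal sketch of both ideas, not the authors' algorithm. The affinity scores, the exhaustive search, and every function and variable name are assumptions made for illustration only.

```python
from collections import Counter, defaultdict
from itertools import permutations


def best_matching(scores, nouns, classes):
    """Exhaustive one-to-one matching of nouns to object classes.

    scores[(noun, cls)] is a hypothetical affinity, e.g. a co-occurrence count.
    Assumes len(nouns) <= len(classes); brute force is only viable at toy sizes.
    """
    best, best_total = None, float("-inf")
    for perm in permutations(classes, len(nouns)):
        total = sum(scores.get((n, c), 0.0) for n, c in zip(nouns, perm))
        if total > best_total:
            best, best_total = dict(zip(nouns, perm)), total
    return best, best_total


def conditional_distribution(pairs):
    """Estimate P(object | scene) from (scene, object) image-level labels."""
    counts = defaultdict(Counter)
    for scene, obj in pairs:
        counts[scene][obj] += 1
    return {
        scene: {obj: n / sum(c.values()) for obj, n in c.items()}
        for scene, c in counts.items()
    }


# Toy data (invented for this sketch).
scores = {("puppy", "dog"): 5, ("puppy", "cat"): 1,
          ("kitten", "dog"): 2, ("kitten", "cat"): 4}
matching, total = best_matching(scores, ["puppy", "kitten"], ["dog", "cat"])

p = conditional_distribution([
    ("street", "car"), ("street", "car"),
    ("street", "bicycle"), ("kitchen", "cup"),
])
```

The one-to-one constraint is enforced here by enumerating permutations; for realistic vocabulary sizes a polynomial-time assignment solver would be the more practical choice.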
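| Published in | IEEE Transactions on Multimedia, Vol. 18, no. 3, pp. 507-520 |
|---|---|
| Main Authors | Botao Wang, Dahua Lin, Hongkai Xiong, Y. F. Zheng |
| Format | Journal Article |
| Language | English |
| Published | Piscataway: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 2016-03-01 |
| Subjects | Algorithms; Bicycles; conditional random field; Image segmentation; Internet; object classification; Object detection; object localization; Prediction algorithms; Scene classification; Semantics; Visualization |
| Online Access | Get full text |
| ISSN | 1520-9210 (print), 1941-0077 (electronic) |
| DOI | 10.1109/TMM.2016.2520087 |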
| Abstract | The rapid growth of web images presents new challenges as well as opportunities to the task of image understanding. Conventional approaches rely heavily on fine-grained annotations, such as bounding boxes and semantic segmentations, which are not available for web-scale images. In general, images over the Internet are accompanied with descriptive texts, which are relevant to their contents. To bridge the gap between textual and visual analysis for image understanding, this paper presents an algorithm to learn the relations between scenes, objects, and texts with the help of image-level annotations. In particular, the relation between the texts and objects is modeled as the matching probability between the nouns and the object classes, which can be solved via a constrained bipartite matching problem. On the other hand, the relations between the scenes and objects/texts are modeled as the conditional distributions of their co-occurrence. Built upon the learned cross-domain relations, an integrated model brings together scenes, objects, and texts for joint image understanding, including scene classification, object classification and localization, and the prediction of object cardinalities. The proposed cross-domain learning algorithm and the integrated model elevate the performance of image understanding for web images in the context of textual descriptions. Experimental results show that the proposed algorithm significantly outperforms conventional methods in various computer vision tasks. | 
    
|---|---|
| Author | Botao Wang; Dahua Lin; Hongkai Xiong; Y. F. Zheng |
    
| Author_xml | 1. Botao Wang (botaowang@sjtu.edu.cn), Dept. of Electron. Eng., Shanghai Jiao Tong Univ., Shanghai, China; 2. Dahua Lin (dhlin@ie.cuhk.edu.hk), Dept. of Inf. Eng., Chinese Univ. of Hong Kong, Hong Kong, China; 3. Hongkai Xiong (xionghongkai@sjtu.edu.cn), Dept. of Electron. Eng., Shanghai Jiao Tong Univ., Shanghai, China; 4. Y. F. Zheng (zheng@ece.osu.edu), Dept. of Electr. & Comput. Eng., Ohio State Univ., Columbus, OH, USA |
    
| CODEN | ITMUF8 | 
    
| CitedBy_id | crossref_primary_10_1049_iet_ipr_2018_5949 crossref_primary_10_1109_ACCESS_2018_2878899  | 
    
| Cites_doi | 10.1145/1101149.1101154 10.1109/CVPR.2014.309 10.1109/CVPR.2013.260 10.1109/CVPR.2010.5540000 10.1109/CVPR.2009.5206816 10.1109/TIP.2014.2310992 10.1109/TIP.2009.2017128 10.1109/ICCV.2013.344 10.1109/CVPR.2010.5540120 10.1109/ICCV.2013.371 10.1109/TMM.2013.2280895 10.1109/TPAMI.2012.79 10.1109/CVPR.2006.68 10.1109/TMM.2013.2267726 10.1109/CVPR.2006.95 10.1007/s11263-014-0733-5 10.1109/CVPR.2014.81 10.1109/CVPR.2010.5540112 10.1109/ICCV.2011.6126229 10.1109/CVPR.2014.539 10.1109/CVPR.2015.7298711 10.1109/TMM.2014.2306655 10.1109/TIP.2012.2202676 10.1109/TPAMI.2009.167 10.1109/CVPR.2010.5540018 10.1145/860458.860460  | 
    
| ContentType | Journal Article | 
    
| Copyright | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2016 | 
    
| Copyright_xml | – notice: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2016 | 
    
| DOI | 10.1109/TMM.2016.2520087 | 
    
| DatabaseName | IEEE All-Society Periodicals Package (ASPP) 2005–Present IEEE All-Society Periodicals Package (ASPP) 1998–Present IEEE Electronic Library (IEL) CrossRef Computer and Information Systems Abstracts Electronics & Communications Abstracts Technology Research Database ProQuest Computer Science Collection Advanced Technologies Database with Aerospace Computer and Information Systems Abstracts  Academic Computer and Information Systems Abstracts Professional  | 
    
| DatabaseTitle | CrossRef Technology Research Database Computer and Information Systems Abstracts – Academic Electronics & Communications Abstracts ProQuest Computer Science Collection Computer and Information Systems Abstracts Advanced Technologies Database with Aerospace Computer and Information Systems Abstracts Professional  | 
    
| DatabaseTitleList | Technology Research Database  | 
    
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/ sourceTypes: Publisher  | 
    
| DeliveryMethod | fulltext_linktorsrc | 
    
| Discipline | Engineering; Computer Science |
    
| EISSN | 1941-0077 | 
    
| EndPage | 520 | 
    
| ExternalDocumentID | 4047732591 10_1109_TMM_2016_2520087 7387771  | 
    
| Genre | orig-research | 
    
| GrantInformation_xml | – fundername: NSFC grantid: 61425011; U1201255; 61271218; 61529101; 61472234; 61271211 funderid: 10.13039/100000001 – fundername: Shu Guanga  | 
    
| IEDL.DBID | RIE | 
    
| ISSN | 1520-9210 | 
    
| IngestDate | Sun Jun 29 15:22:06 EDT 2025 Thu Apr 24 23:04:15 EDT 2025 Wed Oct 01 01:33:23 EDT 2025 Tue Aug 26 16:42:56 EDT 2025  | 
    
| IsPeerReviewed | true | 
    
| IsScholarly | true | 
    
| Issue | 3 | 
    
| Keywords | Conditional random field; scene classification; object classification; object localization |
    
| Language | English | 
    
| License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html | 
    
| LinkModel | DirectLink | 
    
| PQID | 1787248208 | 
    
| PQPubID | 75737 | 
    
| PageCount | 14 | 
    
| ParticipantIDs | ieee_primary_7387771 crossref_citationtrail_10_1109_TMM_2016_2520087 crossref_primary_10_1109_TMM_2016_2520087 proquest_journals_1787248208  | 
    
| ProviderPackageCode | CITATION AAYXX  | 
    
| PublicationCentury | 2000 | 
    
| PublicationDate | 2016-03-01 | 
    
| PublicationDateYYYYMMDD | 2016-03-01 | 
    
| PublicationDate_xml | – month: 03 year: 2016 text: 2016-03-01 day: 01  | 
    
| PublicationDecade | 2010 | 
    
| PublicationPlace | Piscataway | 
    
| PublicationPlace_xml | – name: Piscataway | 
    
| PublicationTitle | IEEE transactions on multimedia | 
    
| PublicationTitleAbbrev | TMM | 
    
| PublicationYear | 2016 | 
    
| Publisher | IEEE; The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
    
| Publisher_xml | – name: IEEE – name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)  | 
    
    
| SSID | ssj0014507 | 
    
| Score | 2.1975234 | 
    
| SourceID | proquest crossref ieee  | 
    
| SourceType | Aggregation Database Enrichment Source Index Database Publisher  | 
    
| StartPage | 507 | 
    
| SubjectTerms | Algorithms; Bicycles; conditional random field; Image segmentation; Internet; object classification; Object detection; object localization; Prediction algorithms; Scene classification; Semantics; Visualization |
    
| Title | Joint Inference of Objects and Scenes With Efficient Learning of Text-Object-Scene Relations | 
    
| URI | https://ieeexplore.ieee.org/document/7387771 https://www.proquest.com/docview/1787248208  | 
    
| Volume | 18 | 
    
| hasFullText | 1 | 
    
| inHoldings | 1 | 
    
| isFullTextHit | |
| isPrint | |
| journalDatabaseRights | – providerCode: PRVIEE databaseName: IEEE Electronic Library (IEL) customDbUrl: eissn: 1941-0077 dateEnd: 99991231 omitProxy: false ssIdentifier: ssj0014507 issn: 1520-9210 databaseCode: RIE dateStart: 19990101 isFulltext: true titleUrlDefault: https://ieeexplore.ieee.org/ providerName: IEEE  | 
    