Joint Inference of Objects and Scenes With Efficient Learning of Text-Object-Scene Relations

Bibliographic Details
Published in IEEE Transactions on Multimedia, Vol. 18, no. 3, pp. 507-520
Main Authors Botao Wang, Dahua Lin, Hongkai Xiong, Y. F. Zheng
Format Journal Article
Language English
Published Piscataway IEEE 01.03.2016
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
ISSN 1520-9210
EISSN 1941-0077
DOI 10.1109/TMM.2016.2520087

Abstract The rapid growth of web images presents new challenges as well as opportunities for the task of image understanding. Conventional approaches rely heavily on fine-grained annotations, such as bounding boxes and semantic segmentations, which are not available for web-scale images. In general, images on the Internet are accompanied by descriptive texts that are relevant to their contents. To bridge the gap between textual and visual analysis for image understanding, this paper presents an algorithm to learn the relations between scenes, objects, and texts with the help of image-level annotations. In particular, the relation between the texts and objects is modeled as the matching probability between the nouns and the object classes, which is obtained by solving a constrained bipartite matching problem. The relations between the scenes and objects/texts, on the other hand, are modeled as the conditional distributions of their co-occurrence. Built upon the learned cross-domain relations, an integrated model brings together scenes, objects, and texts for joint image understanding, including scene classification, object classification and localization, and the prediction of object cardinalities. The proposed cross-domain learning algorithm and the integrated model improve the performance of image understanding for web images in the context of textual descriptions. Experimental results show that the proposed algorithm significantly outperforms conventional methods in various computer vision tasks.
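
The two kinds of relation named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the record does not give the paper's constraints, solver, or estimator, so the brute-force matcher, the add-k smoothing, and every name below (best_matching, conditional_cooccurrence, the score matrix) are assumptions made purely for illustration.

```python
from collections import Counter, defaultdict
from itertools import permutations


def best_matching(score):
    """Brute-force maximum-weight bipartite matching (illustrative stand-in).

    score[i][j] is a hypothetical affinity between noun i and object class j.
    Every noun is matched to a distinct class, so this assumes
    len(nouns) <= len(classes). Only suitable for tiny inputs.
    """
    n_nouns, n_classes = len(score), len(score[0])
    best_total, best_assign = float("-inf"), None
    for perm in permutations(range(n_classes), n_nouns):
        total = sum(score[i][j] for i, j in enumerate(perm))
        if total > best_total:
            best_total, best_assign = total, perm
    return list(best_assign), best_total


def conditional_cooccurrence(pairs, smoothing=1.0):
    """Estimate P(object | scene) from (scene, object) co-occurrence pairs.

    Uses add-k smoothing (an assumption, not taken from the paper) so that
    object classes never seen with a scene still get non-zero probability.
    """
    counts = defaultdict(Counter)
    objects = set()
    for scene, obj in pairs:
        counts[scene][obj] += 1
        objects.add(obj)
    table = {}
    for scene, ctr in counts.items():
        denom = sum(ctr.values()) + smoothing * len(objects)
        table[scene] = {o: (ctr[o] + smoothing) / denom for o in sorted(objects)}
    return table


if __name__ == "__main__":
    # Rows: nouns "dog", "bike"; columns: classes "bicycle", "dog", "car".
    score = [[0.1, 0.9, 0.2],
             [0.8, 0.1, 0.3]]
    print(best_matching(score))
    pairs = [("street", "car"), ("street", "bicycle"),
             ("street", "car"), ("park", "dog")]
    print(conditional_cooccurrence(pairs))
```

For realistic vocabulary sizes the exhaustive search would be replaced by a polynomial-time assignment solver; the brute-force version is kept here only because it makes the objective explicit.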
Author_xml – sequence: 1
  surname: Botao Wang
  fullname: Botao Wang
  email: botaowang@sjtu.edu.cn
  organization: Dept. of Electron. Eng., Shanghai Jiao Tong Univ., Shanghai, China
– sequence: 2
  surname: Dahua Lin
  fullname: Dahua Lin
  email: dhlin@ie.cuhk.edu.hk
  organization: Dept. of Inf. Eng., Chinese Univ. of Hong Kong, Hong Kong, China
– sequence: 3
  surname: Hongkai Xiong
  fullname: Hongkai Xiong
  email: xionghongkai@sjtu.edu.cn
  organization: Dept. of Electron. Eng., Shanghai Jiao Tong Univ., Shanghai, China
– sequence: 4
  givenname: Y. F.
  surname: Zheng
  fullname: Zheng, Y. F.
  email: zheng@ece.osu.edu
  organization: Dept. of Electr. & Comput. Eng., Ohio State Univ., Columbus, OH, USA
CODEN ITMUF8
CitedBy_id crossref_primary_10_1049_iet_ipr_2018_5949
crossref_primary_10_1109_ACCESS_2018_2878899
ContentType Journal Article
Copyright Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2016
Discipline Engineering
Computer Science
EISSN 1941-0077
EndPage 520
ExternalDocumentID 4047732591
10_1109_TMM_2016_2520087
7387771
Genre orig-research
GrantInformation_xml – fundername: NSFC
  grantid: 61425011; U1201255; 61271218; 61529101; 61472234; 61271211
  funderid: 10.13039/100000001
– fundername: Shu Guanga
IsPeerReviewed true
IsScholarly true
Issue 3
Keywords Conditional random field
scene classification
object classification
object localization
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
PageCount 14
PublicationCentury 2000
PublicationDate 2016-03-01
PublicationDateYYYYMMDD 2016-03-01
PublicationDate_xml – month: 03
  year: 2016
  text: 2016-03-01
  day: 01
PublicationDecade 2010
PublicationPlace Piscataway
PublicationPlace_xml – name: Piscataway
PublicationTitle IEEE transactions on multimedia
PublicationTitleAbbrev TMM
PublicationYear 2016
Publisher IEEE
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Publisher_xml – name: IEEE
– name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
StartPage 507
SubjectTerms Algorithms
Bicycles
conditional random field
Image segmentation
Internet
object classification
Object detection
object localization
Prediction algorithms
Scene classification
Semantics
Visualization
Title Joint Inference of Objects and Scenes With Efficient Learning of Text-Object-Scene Relations
URI https://ieeexplore.ieee.org/document/7387771
https://www.proquest.com/docview/1787248208
Volume 18