Visual Translation Embedding Network for Visual Relation Detection

Bibliographic Details
Published in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3107-3115
Main Authors Hanwang Zhang, Zawlin Kyaw, Shih-Fu Chang, Tat-Seng Chua
Format Conference Proceeding
Language English
Published IEEE 01.07.2017
ISSN 1063-6919
DOI 10.1109/CVPR.2017.331

Abstract Visual relations, such as person ride bike and bike next to car, offer a comprehensive scene understanding of an image and have already shown great utility in connecting computer vision and natural language. However, because of the combinatorial complexity of modeling subject-predicate-object relation triplets, very little work has been done to localize and predict visual relations. Inspired by recent advances in relational representation learning of knowledge bases and in convolutional object detection networks, we propose a Visual Translation Embedding network (VTransE) for visual relation detection. VTransE places objects in a low-dimensional relation space where a relation can be modeled as a simple vector translation, i.e., subject + predicate ≈ object. We propose a novel feature extraction layer that enables object-relation knowledge transfer in a fully convolutional fashion and supports training and inference in a single forward/backward pass. To the best of our knowledge, VTransE is the first end-to-end relation detection network. We demonstrate the effectiveness of VTransE over other state-of-the-art methods on two large-scale datasets: Visual Relationship and Visual Genome. Note that even though VTransE is a purely visual model, it is still competitive with Lu's multi-modal model with language priors [27].
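The translation idea in the abstract (subject + predicate ≈ object) can be illustrated with a toy sketch. This is not the authors' implementation: the 2-D vectors, the dictionaries `OBJECTS` and `PREDICATES`, and the helper names below are hypothetical and hand-picked for illustration, whereas the network described in the abstract learns its relation space from image features.

```python
import math

# Hypothetical 2-D embeddings chosen by hand for illustration only;
# they are not taken from the paper.
OBJECTS = {
    "person": [0.0, 0.0],
    "bike": [1.0, 0.0],
    "car": [1.0, 1.0],
}
PREDICATES = {
    "ride": [1.0, 0.0],
    "next to": [0.0, 1.0],
}


def translation_distance(subject, predicate, obj):
    """Euclidean norm of (subject + predicate - object)."""
    return math.sqrt(
        sum((s + p - o) ** 2 for s, p, o in zip(subject, predicate, obj))
    )


def predict_predicate(subject_name, object_name):
    """Return the predicate whose translation best maps subject onto object."""
    s = OBJECTS[subject_name]
    o = OBJECTS[object_name]
    return min(
        PREDICATES,
        key=lambda name: translation_distance(s, PREDICATES[name], o),
    )
```

With these toy vectors, `predict_predicate("person", "bike")` selects "ride" and `predict_predicate("bike", "car")` selects "next to", because in each case the chosen predicate vector translates the subject exactly onto the object.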
Author Tat-Seng Chua
Kyaw, Zawlin
Shih-Fu Chang
Hanwang Zhang
Author_xml – sequence: 1
  surname: Hanwang Zhang
  fullname: Hanwang Zhang
  email: hanwangzhang@gmail.com
– sequence: 2
  givenname: Zawlin
  surname: Kyaw
  fullname: Kyaw, Zawlin
  email: kzl.zawlin@gmail.com
  organization: Nat. Univ. of Singapore, Singapore, Singapore
– sequence: 3
  surname: Shih-Fu Chang
  fullname: Shih-Fu Chang
  email: sfchang@ee.columbia.edu
– sequence: 4
  surname: Tat-Seng Chua
  fullname: Tat-Seng Chua
  email: dcscts@nus.edu.sg
  organization: Nat. Univ. of Singapore, Singapore, Singapore
CODEN IEEPAD
ContentType Conference Proceeding
DBID 6IE
6IH
CBEJK
RIE
RIO
DOI 10.1109/CVPR.2017.331
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan (POP) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Xplore Digital Library
IEEE Proceedings Order Plans (POP) 1998-present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Applied Sciences
Computer Science
EISBN 1538604574
9781538604571
EISSN 1063-6919
EndPage 3115
ExternalDocumentID 8099814
Genre orig-research
GroupedDBID 23M
29F
29O
6IE
6IH
6IK
ABDPE
ACGFS
ALMA_UNASSIGNED_HOLDINGS
CBEJK
IPLJI
M43
RIE
RIO
RNS
ID FETCH-LOGICAL-i241t-94bb36165626322da53cb33aeee0f6cfe0f040ff7802338a5bbf4331aa1aeeb83
IEDL.DBID RIE
ISSN 1063-6919
IngestDate Wed Aug 27 02:33:39 EDT 2025
IsPeerReviewed false
IsScholarly true
Language English
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-i241t-94bb36165626322da53cb33aeee0f6cfe0f040ff7802338a5bbf4331aa1aeeb83
PageCount 9
ParticipantIDs ieee_primary_8099814
PublicationCentury 2000
PublicationDate 2017-07
PublicationDateYYYYMMDD 2017-07-01
PublicationDate_xml – month: 07
  year: 2017
  text: 2017-07
PublicationDecade 2010
PublicationTitle 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
PublicationTitleAbbrev CVPR
PublicationYear 2017
Publisher IEEE
Publisher_xml – name: IEEE
SSID ssj0023720
ssj0003211698
Score 2.5582058
SourceID ieee
SourceType Publisher
StartPage 3107
SubjectTerms Computational modeling
Feature extraction
Knowledge based systems
Knowledge transfer
Object detection
Visualization
Title Visual Translation Embedding Network for Visual Relation Detection
URI https://ieeexplore.ieee.org/document/8099814
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
linkProvider IEEE