Visual Translation Embedding Network for Visual Relation Detection

Bibliographic Details
Published in	2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3107-3115
Main Authors	Hanwang Zhang, Zawlin Kyaw, Shih-Fu Chang, Tat-Seng Chua
Format	Conference Proceeding
Language	English
Published	IEEE, July 2017
Subjects	Computational modeling; Feature extraction; Knowledge based systems; Knowledge transfer; Object detection; Visualization
ISSN	1063-6919
DOI	10.1109/CVPR.2017.331

Abstract	Visual relations, such as person ride bike and bike next to car, offer a comprehensive scene understanding of an image, and have already shown their great utility in connecting computer vision and natural language. However, due to the challenging combinatorial complexity of modeling subject-predicate-object relation triplets, very little work has been done to localize and predict visual relations. Inspired by the recent advances in relational representation learning of knowledge bases and convolutional object detection networks, we propose a Visual Translation Embedding network (VTransE) for visual relation detection. VTransE places objects in a low-dimensional relation space where a relation can be modeled as a simple vector translation, i.e., subject + predicate ≈ object. We propose a novel feature extraction layer that enables object-relation knowledge transfer in a fully-convolutional fashion that supports training and inference in a single forward/backward pass. To the best of our knowledge, VTransE is the first end-to-end relation detection network. We demonstrate the effectiveness of VTransE over other state-of-the-art methods on two large-scale datasets: Visual Relationship and Visual Genome. Note that even though VTransE is a purely visual model, it is still competitive with Lu's multi-modal model with language priors [27].
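The translation idea in the abstract (subject + predicate ≈ object) can be illustrated with a toy sketch. The 2-D vectors, the function names, and the distance-based ranking below are all assumptions made for illustration; they are not taken from the paper, and the sketch does not reproduce VTransE's feature extraction layer or its training procedure.

```python
# Toy illustration of "subject + predicate ≈ object" in a relation space.
# All vectors and names here are hypothetical, chosen only to show the idea.

def translate(subject, predicate):
    """Return subject + predicate, element-wise."""
    return [s + p for s, p in zip(subject, predicate)]

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def rank_predicates(subject, obj, predicates):
    """Sort predicate names by how closely subject + predicate lands on obj."""
    scored = [
        (squared_distance(translate(subject, vec), obj), name)
        for name, vec in predicates.items()
    ]
    return [name for _, name in sorted(scored)]

# Hypothetical 2-D relation-space embeddings.
person = [0.0, 1.0]
bike = [0.0, 0.0]
predicates = {
    "ride": [0.0, -1.0],    # translates "person" exactly onto "bike"
    "next to": [1.0, 0.0],  # translates sideways instead
}

ranking = rank_predicates(person, bike, predicates)
print(ranking)
```

In this toy setup the predicate whose translation vector carries the subject closest to the object ranks first. A real model would learn the embeddings from image features rather than fixing them by hand.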
Author details	1. Hanwang Zhang (hanwangzhang@gmail.com); 2. Zawlin Kyaw (kzl.zawlin@gmail.com), Nat. Univ. of Singapore, Singapore, Singapore; 3. Shih-Fu Chang (sfchang@ee.columbia.edu); 4. Tat-Seng Chua (dcscts@nus.edu.sg), Nat. Univ. of Singapore, Singapore, Singapore
EISBN	1538604574 9781538604571
URI	https://ieeexplore.ieee.org/document/8099814