Visual Translation Embedding Network for Visual Relation Detection
| Published in | 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3107-3115 |
|---|---|
| Main Authors | Hanwang Zhang, Zawlin Kyaw, Shih-Fu Chang, Tat-Seng Chua |
| Format | Conference Proceeding | 
| Language | English | 
| Published | IEEE, 01.07.2017 |
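| Subjects | Computational modeling; Feature extraction; Knowledge based systems; Knowledge transfer; Object detection; Visualization |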
| Online Access | https://ieeexplore.ieee.org/document/8099814 |
| ISSN | 1063-6919 |
| DOI | 10.1109/CVPR.2017.331 | 
| Abstract | Visual relations, such as person ride bike and bike next to car, offer a comprehensive scene understanding of an image and have already shown great utility in connecting computer vision and natural language. However, due to the challenging combinatorial complexity of modeling subject-predicate-object relation triplets, very little work has been done to localize and predict visual relations. Inspired by recent advances in relational representation learning of knowledge bases and in convolutional object detection networks, we propose a Visual Translation Embedding network (VTransE) for visual relation detection. VTransE places objects in a low-dimensional relation space where a relation can be modeled as a simple vector translation, i.e., subject + predicate ≈ object. We propose a novel feature extraction layer that enables object-relation knowledge transfer in a fully convolutional fashion and supports training and inference in a single forward/backward pass. To the best of our knowledge, VTransE is the first end-to-end relation detection network. We demonstrate the effectiveness of VTransE over other state-of-the-art methods on two large-scale datasets: Visual Relationship and Visual Genome. Note that even though VTransE is a purely visual model, it is still competitive with Lu's multi-modal model with language priors [27]. |
    
|---|---|
| Author | Hanwang Zhang; Zawlin Kyaw; Shih-Fu Chang; Tat-Seng Chua |
    
| Author_xml | 1. Hanwang Zhang (hanwangzhang@gmail.com); 2. Zawlin Kyaw (kzl.zawlin@gmail.com), Nat. Univ. of Singapore, Singapore, Singapore; 3. Shih-Fu Chang (sfchang@ee.columbia.edu); 4. Tat-Seng Chua (dcscts@nus.edu.sg), Nat. Univ. of Singapore, Singapore, Singapore |
    
| CODEN | IEEPAD | 
    
| ContentType | Conference Proceeding | 
    
| DOI | 10.1109/CVPR.2017.331 | 
    
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Proceedings Order Plan (POP) 1998-present by volume; IEEE Xplore All Conference Proceedings; IEEE Xplore Digital Library; IEEE Proceedings Order Plans (POP) 1998-present |
    
| DeliveryMethod | fulltext_linktorsrc | 
    
| Discipline | Applied Sciences; Computer Science |
    
| EISBN | 1538604574; 9781538604571 |
    
| EISSN | 1063-6919 | 
    
| EndPage | 3115 | 
    
| ExternalDocumentID | 8099814 | 
    
| Genre | orig-research | 
    
| IEDL.DBID | RIE | 
    
| ISSN | 1063-6919 | 
    
| IngestDate | Wed Aug 27 02:33:39 EDT 2025 | 
    
| IsPeerReviewed | false | 
    
| IsScholarly | true | 
    
| Language | English | 
    
| LinkModel | DirectLink | 
    
| PageCount | 9 | 
    
| ParticipantIDs | ieee_primary_8099814 | 
    
| PublicationCentury | 2000 | 
    
| PublicationDate | 2017-07 | 
    
| PublicationDateYYYYMMDD | 2017-07-01 | 
    
| PublicationDate_xml | – month: 07 year: 2017 text: 2017-07  | 
    
| PublicationDecade | 2010 | 
    
| PublicationTitle | 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) | 
    
| PublicationTitleAbbrev | CVPR | 
    
| PublicationYear | 2017 | 
    
| Publisher | IEEE | 
    
| Publisher_xml | – name: IEEE | 
    
| SSID | ssj0023720 ssj0003211698  | 
    
| SourceID | ieee | 
    
| SourceType | Publisher | 
    
| StartPage | 3107 | 
    
| SubjectTerms | Computational modeling; Feature extraction; Knowledge based systems; Knowledge transfer; Object detection; Visualization |
    
| Title | Visual Translation Embedding Network for Visual Relation Detection | 
    
| URI | https://ieeexplore.ieee.org/document/8099814 | 
    
| hasFullText | 1 | 
    
| inHoldings | 1 | 
    
| thumbnail_s | http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/sc.gif&issn=1063-6919&client=summon |