Image Captioning using Google's Inception-resnet-v2 and Recurrent Neural Network

Bibliographic Details
Published in: International Conference on Contemporary Computing, pp. 1-6
Main Authors: Bhatia, Yajurv; Bajpayee, Aman; Raghuvanshi, Deepanshu; Mittal, Himanshu
Format: Conference Proceeding
Language: English
Published: IEEE, 01.08.2019
ISSN: 2572-6129
DOI: 10.1109/IC3.2019.8844921


Summary: Given a photograph as input, this paper addresses the problem of generating a plausible caption for it. The model learns the correlations between language and images from a provided dataset of labeled images. It proposes a fully automatic approach combining a Convolutional Neural Network with a Recurrent Neural Network. The encoder is responsible for extracting the features of the input image that are useful for eventually producing a description. The model attempts to produce captions for both the objects and the regions present in the image. Treating language as a large label space, it generates predictions for the various regions of the image and then stitches them together.
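
Below is a minimal sketch of the encoder-decoder pipeline the summary describes, written in Keras: Inception-ResNet-v2 without its classifier head encodes the image into a feature vector, and an LSTM decoder conditioned on that vector predicts the next caption word. The hyperparameters (vocabulary size, embedding and LSTM dimensions, maximum caption length) and the "merge"-style wiring of the image and text branches are illustrative assumptions, not details taken from the paper.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

VOCAB_SIZE = 10000  # assumed vocabulary size
EMBED_DIM = 256     # assumed word-embedding width
LSTM_UNITS = 512    # assumed decoder width
MAX_LEN = 30        # assumed maximum caption length

# Encoder: Inception-ResNet-v2 with global average pooling yields a single
# 1536-d feature vector per image; the classifier head is dropped.
cnn = tf.keras.applications.InceptionResNetV2(
    include_top=False, weights="imagenet", pooling="avg")
cnn.trainable = False  # in this sketch, features are extracted, not fine-tuned

image_in = layers.Input(shape=(299, 299, 3))
img_feat = layers.Dense(LSTM_UNITS, activation="relu")(cnn(image_in))

# Decoder: an LSTM reads the partial caption; its state is merged with the
# image features to predict the next word (standard "merge" formulation).
caption_in = layers.Input(shape=(MAX_LEN,))
embedded = layers.Embedding(VOCAB_SIZE, EMBED_DIM, mask_zero=True)(caption_in)
txt_feat = layers.LSTM(LSTM_UNITS)(embedded)

merged = layers.add([img_feat, txt_feat])
hidden = layers.Dense(LSTM_UNITS, activation="relu")(merged)
next_word = layers.Dense(VOCAB_SIZE, activation="softmax")(hidden)

model = Model(inputs=[image_in, caption_in], outputs=next_word)
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
model.summary()
```

At inference time, such a model is run autoregressively: the caption starts from a start token, the predicted word is appended, and the loop repeats until an end token or MAX_LEN is reached.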