Image Captioning using Google's Inception-resnet-v2 and Recurrent Neural Network

Bibliographic Details
Published in: International Conference on Contemporary Computing, pp. 1-6
Main Authors: Bhatia, Yajurv; Bajpayee, Aman; Raghuvanshi, Deepanshu; Mittal, Himanshu
Format: Conference Proceeding
Language: English
Published: IEEE, 01.08.2019
ISSN: 2572-6129
DOI: 10.1109/IC3.2019.8844921


Summary: Given a photograph as input, this paper addresses the problem of generating a plausible caption for it. The model learns the correlations between language and images from a provided dataset of labeled images. It proposes a fully automatic approach combining a Convolutional Neural Network with a Recurrent Neural Network. The encoder is responsible for extracting the features of the input image that are useful for eventually producing a description. The model attempts to produce captions for both the objects and the regions present in the image. Treating language as a large label space, it generates predictions for the various regions of the image and then stitches them together.
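
Below is a minimal sketch of the encoder-decoder pipeline the summary describes, written in Keras: Inception-ResNet-v2 without its classifier head encodes the image into a feature vector, and an LSTM decoder conditioned on that vector predicts the next caption word. The hyperparameters (vocabulary size, embedding and LSTM dimensions, maximum caption length) and the "merge"-style wiring of the image and text branches are illustrative assumptions, not details taken from the paper.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

VOCAB_SIZE = 10000  # assumed vocabulary size
EMBED_DIM = 256     # assumed word-embedding width
LSTM_UNITS = 512    # assumed decoder width
MAX_LEN = 30        # assumed maximum caption length

# Encoder: Inception-ResNet-v2 with global average pooling yields a single
# 1536-d feature vector per image; the classifier head is dropped.
cnn = tf.keras.applications.InceptionResNetV2(
    include_top=False, weights="imagenet", pooling="avg")
cnn.trainable = False  # in this sketch, features are extracted, not fine-tuned

image_in = layers.Input(shape=(299, 299, 3))
img_feat = layers.Dense(LSTM_UNITS, activation="relu")(cnn(image_in))

# Decoder: an LSTM reads the partial caption; its state is merged with the
# image features to predict the next word (standard "merge" formulation).
caption_in = layers.Input(shape=(MAX_LEN,))
embedded = layers.Embedding(VOCAB_SIZE, EMBED_DIM, mask_zero=True)(caption_in)
txt_feat = layers.LSTM(LSTM_UNITS)(embedded)

merged = layers.add([img_feat, txt_feat])
hidden = layers.Dense(LSTM_UNITS, activation="relu")(merged)
next_word = layers.Dense(VOCAB_SIZE, activation="softmax")(hidden)

model = Model(inputs=[image_in, caption_in], outputs=next_word)
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
model.summary()
```

At inference time, such a model is run autoregressively: the caption starts from a start token, the predicted word is appended, and the loop repeats until an end token or MAX_LEN is reached.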