擬似教師ありキャプション生成における部分的不一致の除去 (Removing Partial Mismatch in Pseudo-Supervised Caption Generation)

Bibliographic Details
Published in: 人工知能学会論文誌 (Transactions of the Japanese Society for Artificial Intelligence), Vol. 37, No. 2, pp. H-L82_1–12
Main Authors: 本多, 右京; 渡辺, 太郎; 松本, 裕治; 橋本, 敦史
Format: Journal Article
Language: Japanese
Published: 一般社団法人 人工知能学会 (The Japanese Society for Artificial Intelligence), March 1, 2022
Copyright: 人工知能学会, 2022
Discipline: Computer Science
Subjects: image captioning; pseudo-label; unsupervised learning; vision and language
Online Access: Get full text (open access) at https://www.jstage.jst.go.jp/article/tjsai/37/2/37_37-2_H-L82/_article/-char/ja
ISSN: 1346-0714
EISSN: 1346-8030
DOI: 10.1527/tjsai.37-2_H-L82

Authors and Affiliations:
– 本多, 右京 (理化学研究所AIP / RIKEN AIP)
– 渡辺, 太郎 (奈良先端科学技術大学院大学 / Nara Institute of Science and Technology)
– 松本, 裕治 (理化学研究所AIP / RIKEN AIP)
– 橋本, 敦史 (オムロンサイニックエックス株式会社 / OMRON SINIC X Corporation)