Context-aware CLIP for Enhanced Food Recognition

Bibliographic Details
Published in Advances in artificial intelligence research : (Online) Vol. 5; no. 1; pp. 7–13
Main Author Öztürk Ergün, Övgü
Format Journal Article
Language English
Published 16.06.2025
ISSN 2757-7422
DOI 10.54569/aair.1707867

Abstract Generalization of food image recognition frameworks is difficult due to the wide variety of food categories in cuisines across cultures. The performance of deep neural network models depends heavily on the training dataset. To overcome this problem, we propose to extract context information from images in order to increase the discrimination capacity of networks. In this work, we utilize the CLIP architecture with ingredient context derived automatically from food images. A list of ingredients is associated with each food category, which is then modeled as text after a voting process and fed to the CLIP architecture together with the input image. Experimental results on the Food101 dataset show that this approach significantly improves the model's performance, achieving a 2% overall increase in accuracy. The improvement varies across food classes, with increases ranging from 0.5% to as much as 22%. The proposed framework, CLIP fed with ingredient text, reaches 81.80% top-1 overall accuracy over 101 classes, outperforming YOLOv8 (81.46%).
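The pipeline the abstract describes (per-category ingredient lists, a voting step, and the voted ingredients rendered as text for CLIP alongside the image) can be sketched roughly as below. This is a minimal illustration, not the paper's code: the sample ingredient lists, the `top_k` cutoff, and the prompt template are all assumptions made here for demonstration.

```python
from collections import Counter

# Hypothetical candidate ingredient lists for one food category.
# In the paper, ingredient context is derived automatically from
# food images; here we just hard-code a few example lists.
CANDIDATE_INGREDIENTS = {
    "pizza": [
        ["dough", "tomato", "cheese"],
        ["dough", "cheese", "basil"],
        ["dough", "tomato", "cheese", "olive oil"],
    ],
}

def vote_ingredients(candidates, top_k=3):
    """Majority-vote over candidate ingredient lists, keeping the
    top_k most frequent ingredients for the category."""
    counts = Counter(ing for lst in candidates for ing in lst)
    return [ing for ing, _ in counts.most_common(top_k)]

def build_prompt(category, ingredients):
    """Render the voted ingredients as a text prompt (assumed template)."""
    return f"a photo of {category}, a dish containing {', '.join(ingredients)}"

voted = vote_ingredients(CANDIDATE_INGREDIENTS["pizza"])
prompt = build_prompt("pizza", voted)
print(prompt)
```

In a full system, each category's prompt and the input image would then be encoded by CLIP's text and image towers, with classification done by cosine similarity over the 101 category prompts (e.g. via the `open_clip` or Hugging Face `transformers` CLIP implementations).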
Author Öztürk Ergün, Övgü
Author_xml – sequence: 1
  givenname: Övgü
  orcidid: 0009-0007-6273-4877
  surname: Öztürk Ergün
  fullname: Öztürk Ergün, Övgü
ContentType Journal Article
DBID AAYXX
CITATION
ADTOC
UNPAY
DOI 10.54569/aair.1707867
DatabaseName CrossRef
Unpaywall for CDI: Periodical Content
Unpaywall
DatabaseTitle CrossRef
DatabaseTitleList CrossRef
Database_xml – sequence: 1
  dbid: UNPAY
  name: Unpaywall
  url: https://proxy.k.utb.cz/login?url=https://unpaywall.org/
  sourceTypes: Open Access Repository
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
EISSN 2757-7422
EndPage 13
ExternalDocumentID 10.54569/aair.1707867
10_54569_aair_1707867
GroupedDBID AAYXX
ALMA_UNASSIGNED_HOLDINGS
CITATION
GROUPED_DOAJ
M~E
ADTOC
UNPAY
ID FETCH-LOGICAL-c777-c5e74a82d8868e707e42134cfcda894aede6311dbd637550304f2edc13bc24433
IEDL.DBID UNPAY
ISSN 2757-7422
IngestDate Tue Aug 19 23:34:12 EDT 2025
Wed Oct 01 05:59:14 EDT 2025
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 1
Language English
License cc-by-nc
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-c777-c5e74a82d8868e707e42134cfcda894aede6311dbd637550304f2edc13bc24433
ORCID 0009-0007-6273-4877
OpenAccessLink https://proxy.k.utb.cz/login?url=https://dergipark.org.tr/en/download/article-file/4908717
PageCount 7
ParticipantIDs unpaywall_primary_10_54569_aair_1707867
crossref_primary_10_54569_aair_1707867
ProviderPackageCode CITATION
AAYXX
PublicationCentury 2000
PublicationDate 2025-06-16
PublicationDateYYYYMMDD 2025-06-16
PublicationDate_xml – month: 06
  year: 2025
  text: 2025-06-16
  day: 16
PublicationDecade 2020
PublicationTitle Advances in artificial intelligence research : (Online)
PublicationYear 2025
References_xml – ident: ref19
  doi: 10.2139/ssrn.4984843
– ident: ref8
  doi: 10.1109/SIU.2018.8404617
– ident: ref9
  doi: 10.3390/s24072034
– ident: ref5
  doi: 10.1007/978-3-319-39601-9_4
– ident: ref27
  doi: 10.1109/CVPR52729.2023.00271
– ident: ref25
– ident: ref15
  doi: 10.1016/j.neucom.2020.07.018
– ident: ref16
  doi: 10.1016/j.compbiomed.2022.105645
– ident: ref12
  doi: 10.1109/ICPRS58416.2023.10179037
– ident: ref3
  doi: 10.1016/j.inffus.2023.101859
– ident: ref11
  doi: 10.1007/978-3-030-68821-9_47
– ident: ref7
  doi: 10.1109/CVPRW63382.2024.00373
– ident: ref1
  doi: 10.3390/s23136137
– ident: ref13
– ident: ref23
  doi: 10.1109/CVPRW63382.2024.00439
– ident: ref21
  doi: 10.1145/3343031.3350948
– ident: ref14
  doi: 10.1109/CVPRW63382.2024.00379
– ident: ref18
  doi: 10.1007/978-3-319-10599-4_29
– ident: ref22
  doi: 10.1145/2964284.2964315
– ident: ref20
  doi: 10.1145/3627377.3627442
– ident: ref2
  doi: 10.5220/0012388200003660
– ident: ref4
  doi: 10.1109/WACV48630.2021.00175
– ident: ref28
– ident: ref29
  doi: 10.1145/3552485.3554939
– ident: ref26
– ident: ref24
  doi: 10.1109/CVPR52688.2022.01593
– ident: ref6
  doi: 10.1145/3391624
– ident: ref10
– ident: ref17
  doi: 10.1109/CVPRW63382.2024.00375
SSID ssj0002920930
Score 2.2975097
SourceID unpaywall
crossref
SourceType Open Access Repository
Index Database
StartPage 7
Title Context-aware CLIP for Enhanced Food Recognition
URI https://dergipark.org.tr/en/download/article-file/4908717
UnpaywallVersion publishedVersion
Volume 5
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
journalDatabaseRights – providerCode: PRVHPJ
  databaseName: ROAD: Directory of Open Access Scholarly Resources
  customDbUrl:
  eissn: 2757-7422
  dateEnd: 99991231
  omitProxy: true
  ssIdentifier: ssj0002920930
  issn: 2757-7422
  databaseCode: M~E
  dateStart: 20210101
  isFulltext: true
  titleUrlDefault: https://road.issn.org
  providerName: ISSN International Centre