Context-aware CLIP for Enhanced Food Recognition

Bibliographic Details
Published in Advances in artificial intelligence research : (Online) Vol. 5; no. 1; pp. 7–13
Main Author Öztürk Ergün, Övgü
Format Journal Article
Language English
Published 16.06.2025
ISSN 2757-7422
DOI 10.54569/aair.1707867

Abstract Generalization of food image recognition frameworks is difficult due to the wide variety of food categories in cuisines across cultures. The performance of deep neural network models depends heavily on the training dataset. To overcome this problem, we propose to extract context information from images in order to increase the discrimination capacity of networks. In this work, we utilize the CLIP architecture with ingredient context derived automatically from food images. A list of ingredients is associated with each food category, which is then modeled as text after a voting process and fed to the CLIP architecture together with the input image. Experimental results on the Food101 dataset show that this approach significantly improves the model's performance, achieving a 2% overall increase in accuracy. The improvement varies across food classes, with increases ranging from 0.5% to as much as 22%. The proposed framework, CLIP fed with ingredient text, reaches 81.80% top-1 overall accuracy over 101 classes, outperforming YOLOv8 (81.46%).
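The pipeline the abstract describes (per-category ingredient lists, a voting step, and the voted ingredients rendered as text for CLIP alongside the image) can be sketched roughly as below. This is a minimal illustration, not the paper's code: the sample ingredient lists, the `top_k` cutoff, and the prompt template are all assumptions made here for demonstration.

```python
from collections import Counter

# Hypothetical candidate ingredient lists for one food category.
# In the paper, ingredient context is derived automatically from
# food images; here we just hard-code a few example lists.
CANDIDATE_INGREDIENTS = {
    "pizza": [
        ["dough", "tomato", "cheese"],
        ["dough", "cheese", "basil"],
        ["dough", "tomato", "cheese", "olive oil"],
    ],
}

def vote_ingredients(candidates, top_k=3):
    """Majority-vote over candidate ingredient lists, keeping the
    top_k most frequent ingredients for the category."""
    counts = Counter(ing for lst in candidates for ing in lst)
    return [ing for ing, _ in counts.most_common(top_k)]

def build_prompt(category, ingredients):
    """Render the voted ingredients as a text prompt (assumed template)."""
    return f"a photo of {category}, a dish containing {', '.join(ingredients)}"

voted = vote_ingredients(CANDIDATE_INGREDIENTS["pizza"])
prompt = build_prompt("pizza", voted)
print(prompt)
```

In a full system, each category's prompt and the input image would then be encoded by CLIP's text and image towers, with classification done by cosine similarity over the 101 category prompts (e.g. via the `open_clip` or Hugging Face `transformers` CLIP implementations).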
Author Öztürk Ergün, Övgü
Author_xml – sequence: 1
  givenname: Övgü
  orcidid: 0009-0007-6273-4877
  surname: Öztürk Ergün
  fullname: Öztürk Ergün, Övgü
ContentType Journal Article
DBID AAYXX
CITATION
ADTOC
UNPAY
DOI 10.54569/aair.1707867
DatabaseName CrossRef
Unpaywall for CDI: Periodical Content
Unpaywall
DatabaseTitle CrossRef
DatabaseTitleList CrossRef
Database_xml – sequence: 1
  dbid: UNPAY
  name: Unpaywall
  url: https://proxy.k.utb.cz/login?url=https://unpaywall.org/
  sourceTypes: Open Access Repository
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
EISSN 2757-7422
EndPage 13
ExternalDocumentID 10.54569/aair.1707867
10_54569_aair_1707867
GroupedDBID AAYXX
ALMA_UNASSIGNED_HOLDINGS
CITATION
GROUPED_DOAJ
M~E
ADTOC
UNPAY
ID FETCH-LOGICAL-c777-c5e74a82d8868e707e42134cfcda894aede6311dbd637550304f2edc13bc24433
IEDL.DBID UNPAY
ISSN 2757-7422
IngestDate Tue Aug 19 23:34:12 EDT 2025
Wed Oct 01 05:59:14 EDT 2025
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 1
Language English
License cc-by-nc
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-c777-c5e74a82d8868e707e42134cfcda894aede6311dbd637550304f2edc13bc24433
ORCID 0009-0007-6273-4877
OpenAccessLink https://proxy.k.utb.cz/login?url=https://dergipark.org.tr/en/download/article-file/4908717
PageCount 7
ParticipantIDs unpaywall_primary_10_54569_aair_1707867
crossref_primary_10_54569_aair_1707867
ProviderPackageCode CITATION
AAYXX
PublicationCentury 2000
PublicationDate 2025-06-16
PublicationDateYYYYMMDD 2025-06-16
PublicationDate_xml – month: 06
  year: 2025
  text: 2025-06-16
  day: 16
PublicationDecade 2020
PublicationTitle Advances in artificial intelligence research : (Online)
PublicationYear 2025
References_xml – ident: ref19
  doi: 10.2139/ssrn.4984843
– ident: ref8
  doi: 10.1109/SIU.2018.8404617
– ident: ref9
  doi: 10.3390/s24072034
– ident: ref5
  doi: 10.1007/978-3-319-39601-9_4
– ident: ref27
  doi: 10.1109/CVPR52729.2023.00271
– ident: ref25
– ident: ref15
  doi: 10.1016/j.neucom.2020.07.018
– ident: ref16
  doi: 10.1016/j.compbiomed.2022.105645
– ident: ref12
  doi: 10.1109/ICPRS58416.2023.10179037
– ident: ref3
  doi: 10.1016/j.inffus.2023.101859
– ident: ref11
  doi: 10.1007/978-3-030-68821-9_47
– ident: ref7
  doi: 10.1109/CVPRW63382.2024.00373
– ident: ref1
  doi: 10.3390/s23136137
– ident: ref13
– ident: ref23
  doi: 10.1109/CVPRW63382.2024.00439
– ident: ref21
  doi: 10.1145/3343031.3350948
– ident: ref14
  doi: 10.1109/CVPRW63382.2024.00379
– ident: ref18
  doi: 10.1007/978-3-319-10599-4_29
– ident: ref22
  doi: 10.1145/2964284.2964315
– ident: ref20
  doi: 10.1145/3627377.3627442
– ident: ref2
  doi: 10.5220/0012388200003660
– ident: ref4
  doi: 10.1109/WACV48630.2021.00175
– ident: ref28
– ident: ref29
  doi: 10.1145/3552485.3554939
– ident: ref26
– ident: ref24
  doi: 10.1109/CVPR52688.2022.01593
– ident: ref6
  doi: 10.1145/3391624
– ident: ref10
– ident: ref17
  doi: 10.1109/CVPRW63382.2024.00375
SSID ssj0002920930
Score 2.2975097
SourceID unpaywall
crossref
SourceType Open Access Repository
Index Database
StartPage 7
Title Context-aware CLIP for Enhanced Food Recognition
URI https://dergipark.org.tr/en/download/article-file/4908717
UnpaywallVersion publishedVersion
Volume 5
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
journalDatabaseRights – providerCode: PRVHPJ
  databaseName: ROAD: Directory of Open Access Scholarly Resources
  customDbUrl:
  eissn: 2757-7422
  dateEnd: 99991231
  omitProxy: true
  ssIdentifier: ssj0002920930
  issn: 2757-7422
  databaseCode: M~E
  dateStart: 20210101
  isFulltext: true
  titleUrlDefault: https://road.issn.org
  providerName: ISSN International Centre