Context-aware CLIP for Enhanced Food Recognition
| Published in | Advances in Artificial Intelligence Research (Online), Vol. 5, No. 1, pp. 7-13 |
|---|---|
| Main Author | Öztürk Ergün, Övgü (ORCID: 0009-0007-6273-4877) |
| Format | Journal Article |
| Language | English |
| Published | 16 June 2025 |
| Discipline | Computer Science |
| ISSN / EISSN | 2757-7422 |
| DOI | 10.54569/aair.1707867 |
| License | CC BY-NC |
| Peer Reviewed | Yes |
| Online Access | https://dergipark.org.tr/en/download/article-file/4908717 (open access, published version) |
| Abstract | Generalization of food image recognition frameworks is difficult due to the wide variety of food categories in cuisines across cultures. The performance of deep neural network models depends heavily on the training dataset. To overcome this problem, we propose extracting context information from images in order to increase the discrimination capacity of networks. In this work, we utilize the CLIP architecture with ingredient context automatically derived from food images. A list of ingredients is associated with each food category, which is later modeled as text after a voting process and fed to the CLIP architecture together with the input image. Experimental results on the Food101 dataset show that this approach significantly improves the model's performance, achieving a 2% overall increase in accuracy. This improvement varies across food classes, with increases ranging from 0.5% to as much as 22%. The proposed framework, CLIP fed with ingredient text, achieves 81.80% top-1 overall accuracy over 101 classes, outperforming YOLOv8 (81.46%). |
|---|---|
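The abstract describes conditioning CLIP on ingredient text derived per food category. As an illustration only, the sketch below shows zero-shot CLIP classification with ingredient-augmented text prompts, using the Hugging Face `transformers` library rather than the authors' implementation. The class names, ingredient lists, prompt template, and image file name are hypothetical, and the paper's voting-based ingredient derivation is not reproduced here.

```python
# Hypothetical sketch: zero-shot CLIP with ingredient-augmented prompts.
# Assumes: pip install transformers torch pillow
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Placeholder ingredient lists; the paper derives these automatically via voting.
class_ingredients = {
    "pizza": ["dough", "tomato sauce", "mozzarella"],
    "caesar salad": ["romaine lettuce", "croutons", "parmesan"],
    "ramen": ["noodles", "broth", "soft-boiled egg"],
}

# One text prompt per class, combining the class name with its ingredient context.
prompts = [
    f"a photo of {name}, a dish made with {', '.join(ings)}"
    for name, ings in class_ingredients.items()
]

image = Image.open("food.jpg")  # hypothetical input image
inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)

with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds the image-text similarity score for each class prompt.
probs = outputs.logits_per_image.softmax(dim=-1)
predicted = list(class_ingredients)[probs.argmax().item()]
print(f"predicted class: {predicted} (p={probs.max().item():.3f})")
```

The intuition is that appending an ingredient list to each class prompt gives the text encoder more discriminative context than the class name alone, which is consistent with the per-class accuracy gains reported in the abstract.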