Dynamic cache-updating contrastive language-image pre-training model for few-shot remote sensing image classification
| Published in | Journal of applied remote sensing Vol. 19; no. 2; p. 026502 |
|---|---|
| Main Authors | Xu, Cheng; Ou, Zhengyu; Han, Zandong |
| Format | Journal Article |
| Language | English |
| Published | Society of Photo-Optical Instrumentation Engineers, 01.04.2025 |
| ISSN | 1931-3195 |
| DOI | 10.1117/1.JRS.19.026502 |
| Abstract | Remote sensing images often contain sensitive information related to national security and military secrets, making it difficult to acquire sufficient data samples. Although contrastive language-image pre-training (CLIP) supports zero-shot recognition, its pre-training on general images limits its effectiveness for remote sensing classification. The Tip-Adapter method improves adaptation by introducing a knowledge cache, but its performance is limited when the cache size is small. To address these challenges, we propose a dynamic cache-updating CLIP model that enhances classification accuracy by iteratively selecting Top-K pseudo-labels from Tip-Adapter’s predictions and updating the knowledge cache. An adaptive weight adjustment module is also introduced to balance performance across categories, ensuring that classes with lower accuracy receive more focus, thus mitigating accuracy drops due to high inter-class similarity. We investigate the impact of different Top-K values and compare the use of soft versus hard labels for pseudo-labeling. Results show that hard labels provide clearer assignments and lead to better performance. Although increasing the Top-K value improves accuracy by expanding the knowledge cache, excessive Top-K values reduce label credibility, necessitating a balance between cache size and label quality. On the EuroSAT dataset, with Tip-Adapter as the base model, updating the top-2 hard labels for 15 iterations increased one-shot accuracy from 54.89% to 73.25% and two-shot accuracy from 59.41% to 74.78%. Similarly, with Tip-Adapter-F as the base model, one-shot accuracy improved from 65.06% to 79.12%, and two-shot accuracy from 65.11% to 80.78%. Similar improvements were observed on the UCMerced (UCM) and NWPU-RESISC45 datasets, further demonstrating the effectiveness and generalizability of the proposed method. Our code will be released at https://github.com/xu-c22/DCU-CLIP. |
| Author | Xu, Cheng; Ou, Zhengyu; Han, Zandong |
| Author_xml | 1. Xu, Cheng (ORCID: 0009-0007-9450-3790; email: 472299060@qq.com), Training Base of Army Engineering University, Xuzhou, China; 2. Ou, Zhengyu (email: ozy20@mails.tsinghua.edu.cn), Tsinghua University, Department of Mechanical Engineering, State Key Laboratory of Clean and Efficient Turbomachinery Power Equipment, Beijing, China; 3. Han, Zandong (ORCID: 0000-0002-6927-868X; email: hanzd@tsinghua.edu.cn), Ministry of Education, Key Laboratory for Advanced Materials Processing Technology, Beijing, China |
| ContentType | Journal Article |
| Copyright | 2025 Society of Photo-Optical Instrumentation Engineers (SPIE) |
| Discipline | Geography |
| EISSN | 1931-3195 |
| EndPage | 026502 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 2 |
| Keywords | few-shot image classification; contrastive language-image pre-training; semi-supervised; pseudo label; dynamic cache-updating; iterate |
| PageCount | 1 |
| PublicationDate | 2025-04-01 |
| PublicationTitle | Journal of applied remote sensing |
| PublicationTitleAlternate | J. Appl. Remote Sens |
| PublicationYear | 2025 |
| Publisher | Society of Photo-Optical Instrumentation Engineers |
| StartPage | 026502 |
| Title | Dynamic cache-updating contrastive language-image pre-training model for few-shot remote sensing image classification |
| URI | http://www.dx.doi.org/10.1117/1.JRS.19.026502 |
| Volume | 19 |
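
The cache-updating loop summarized in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' released code. The Tip-Adapter logit formula follows the published training-free Tip-Adapter formulation; everything else is an assumption made for illustration: the function names (`tip_adapter_logits`, `update_cache`), the `alpha` and `beta` values, the choice to pick the Top-K most confident samples per predicted class, and the use of random vectors in place of CLIP features. The adaptive weight adjustment module is not modeled here.

```python
import numpy as np


def l2_normalize(x):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def softmax(z):
    """Row-wise softmax with the usual max-subtraction for stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def tip_adapter_logits(feats, cache_keys, cache_values, text_weights,
                       alpha=1.0, beta=5.5):
    """Training-free Tip-Adapter logits: zero-shot term plus cache term.

    alpha and beta are illustrative values, not the paper's settings.
    """
    clip_logits = 100.0 * feats @ text_weights.T                     # (N, C)
    affinity = feats @ cache_keys.T                                  # (N, M)
    cache_logits = np.exp(-beta * (1.0 - affinity)) @ cache_values   # (N, C)
    return clip_logits + alpha * cache_logits


def update_cache(unlabeled, cache_keys, cache_values, text_weights, top_k=2):
    """One iteration: move the top_k most confident samples per predicted
    class into the cache with hard (one-hot) pseudo-labels.

    Per-class selection is an assumption; the abstract only says "Top-K".
    """
    probs = softmax(tip_adapter_logits(unlabeled, cache_keys,
                                       cache_values, text_weights))
    pred = probs.argmax(axis=1)
    conf = probs.max(axis=1)
    num_classes = cache_values.shape[1]

    picked = []
    for c in range(num_classes):
        idx = np.flatnonzero(pred == c)
        best = idx[np.argsort(-conf[idx])[:top_k]]
        picked.extend(best.tolist())
    picked = np.array(picked, dtype=int)

    new_keys = unlabeled[picked]
    new_values = np.eye(num_classes)[pred[picked]]   # hard pseudo-labels
    keep = np.ones(len(unlabeled), dtype=bool)
    keep[picked] = False
    return (np.vstack([cache_keys, new_keys]),
            np.vstack([cache_values, new_values]),
            unlabeled[keep])


if __name__ == "__main__":
    # Synthetic stand-ins for CLIP features: 3 classes, 8-dim embeddings.
    rng = np.random.default_rng(0)
    C, D = 3, 8
    text_w = l2_normalize(rng.normal(size=(C, D)))
    keys = l2_normalize(text_w + 0.1 * rng.normal(size=(C, D)))   # one shot per class
    values = np.eye(C)
    pool = l2_normalize(np.repeat(text_w, 10, axis=0)
                        + 0.1 * rng.normal(size=(30, D)))
    for _ in range(15):            # 15 iterations, as in the EuroSAT experiment
        if len(pool) == 0:
            break
        keys, values, pool = update_cache(pool, keys, values, text_w, top_k=2)
    print("cache size:", keys.shape[0], "remaining unlabeled:", pool.shape[0])
```

Each call enlarges the cache by at most `top_k` entries per class, which mirrors the trade-off the abstract describes: a larger Top-K grows the cache faster but admits lower-confidence pseudo-labels.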