Dynamic cache-updating contrastive language-image pre-training model for few-shot remote sensing image classification

Bibliographic Details
Published in	Journal of applied remote sensing Vol. 19; no. 2; p. 026502
Main Authors	Xu, Cheng; Ou, Zhengyu; Han, Zandong
Format	Journal Article
Language	English
Published	Society of Photo-Optical Instrumentation Engineers, 2025-04-01
Subjects	few-shot image classification; contrastive language-image pre-training; semi-supervised; pseudo label; dynamic cache-updating; iterate
ISSN	1931-3195
DOI	10.1117/1.JRS.19.026502

Abstract	Remote sensing images often contain sensitive information related to national security and military secrets, making it difficult to acquire sufficient data samples. Although contrastive language-image pre-training (CLIP) supports zero-shot recognition, its pre-training on general images limits its effectiveness for remote sensing classification. The Tip-Adapter method improves adaptation by introducing a knowledge cache, but its performance is limited when the cache size is small. To address these challenges, we propose a dynamic cache-updating CLIP model that enhances classification accuracy by iteratively selecting Top-K pseudo-labels from Tip-Adapter’s predictions and updating the knowledge cache. An adaptive weight adjustment module is also introduced to balance performance across categories, ensuring that classes with lower accuracy receive more focus, thus mitigating accuracy drops due to high inter-class similarity. We investigate the impact of different Top-K values and compare the use of soft versus hard labels for pseudo-labeling. Results show that hard labels provide clearer assignments and lead to better performance. Although increasing the Top-K value improves accuracy by expanding the knowledge cache, excessive Top-K values reduce label credibility, necessitating a balance between cache size and label quality. On the EuroSAT dataset, with Tip-Adapter as the base model, updating the top-2 hard labels for 15 iterations increased one-shot accuracy from 54.89% to 73.25% and two-shot accuracy from 59.41% to 74.78%. Similarly, with Tip-Adapter-F as the base model, one-shot accuracy improved from 65.06% to 79.12%, and two-shot accuracy from 65.11% to 80.78%. Similar improvements were observed on the UCMerced (UCM) and NWPU-RESISC45 datasets, further demonstrating the effectiveness and generalizability of the proposed method. Our code will be released at https://github.com/xu-c22/DCU-CLIP
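
The cache-update loop described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' released code: the function names, the `alpha` and `beta` defaults, the per-class reading of "Top-K", and the use of softmax probability as the confidence score are all assumptions made here. The adaptive weight adjustment module is omitted.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def tip_adapter_logits(feat, text_weights, keys, labels, alpha=1.0, beta=5.5):
    """Zero-shot logits plus a cache term in the style of Tip-Adapter.

    alpha and beta are illustrative defaults, not values from the paper.
    Features, text weights, and cache keys are assumed to be L2-normalized.
    """
    logits = [100.0 * dot(feat, w) for w in text_weights]
    for key, label in zip(keys, labels):
        # Hard (one-hot) cache label: the affinity goes to a single class.
        logits[label] += alpha * math.exp(-beta * (1.0 - dot(feat, key)))
    return logits


def update_cache(unlabeled, text_weights, keys, labels, top_k=2, iterations=15):
    """Iteratively add the top_k most confident pseudo-labels per class.

    Selecting per class is an assumption; the abstract only says "Top-K".
    Returns new lists and leaves the input cache untouched.
    """
    keys, labels = list(keys), list(labels)
    remaining = list(unlabeled)
    classes = range(len(text_weights))
    for _ in range(iterations):
        if not remaining:
            break
        candidates = {c: [] for c in classes}
        for idx, feat in enumerate(remaining):
            probs = softmax(tip_adapter_logits(feat, text_weights, keys, labels))
            pred = max(classes, key=lambda c: probs[c])
            candidates[pred].append((probs[pred], idx))
        chosen = set()
        for c in classes:
            for _conf, idx in sorted(candidates[c], reverse=True)[:top_k]:
                keys.append(remaining[idx])   # image feature becomes a cache key
                labels.append(c)              # predicted class becomes a hard label
                chosen.add(idx)
        remaining = [f for i, f in enumerate(remaining) if i not in chosen]
    return keys, labels
```

Each pass enlarges the cache, so later predictions draw on more keys. A larger `top_k` grows the cache faster but admits less confident pseudo-labels, which is the trade-off between cache size and label quality that the abstract reports.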
Author Affiliations	Xu, Cheng (ORCID 0009-0007-9450-3790; 472299060@qq.com): Training Base of Army Engineering University, Xuzhou, China. Ou, Zhengyu (ozy20@mails.tsinghua.edu.cn): Tsinghua University, Department of Mechanical Engineering, State Key Laboratory of Clean and Efficient Turbomachinery Power Equipment, Beijing, China. Han, Zandong (ORCID 0000-0002-6927-868X; hanzd@tsinghua.edu.cn): Ministry of Education, Key Laboratory for Advanced Materials Processing Technology, Beijing, China.
Copyright	2025 Society of Photo-Optical Instrumentation Engineers (SPIE)
Discipline	Geography
EISSN	1931-3195
IsPeerReviewed	true
IsScholarly	true
PublicationDate	2025-04-01
PublicationTitleAlternate	J. Appl. Remote Sens
URI	http://www.dx.doi.org/10.1117/1.JRS.19.026502