Dynamic cache-updating contrastive language-image pre-training model for few-shot remote sensing image classification

Bibliographic Details
Published in Journal of Applied Remote Sensing, Vol. 19, no. 2, p. 026502
Main Authors Xu, Cheng; Ou, Zhengyu; Han, Zandong
Format Journal Article
Language English
Published Society of Photo-Optical Instrumentation Engineers, 1 April 2025
ISSN 1931-3195
DOI 10.1117/1.JRS.19.026502

Abstract
Remote sensing images often contain sensitive information related to national security and military secrets, making it difficult to acquire sufficient data samples. Although contrastive language-image pre-training (CLIP) supports zero-shot recognition, its pre-training on general images limits its effectiveness for remote sensing classification. The Tip-Adapter method improves adaptation by introducing a knowledge cache, but its performance is limited when the cache size is small. To address these challenges, we propose a dynamic cache-updating CLIP model that enhances classification accuracy by iteratively selecting Top-K pseudo-labels from Tip-Adapter’s predictions and updating the knowledge cache. An adaptive weight adjustment module is also introduced to balance performance across categories, ensuring that classes with lower accuracy receive more focus, thus mitigating accuracy drops due to high inter-class similarity. We investigate the impact of different Top-K values and compare the use of soft versus hard labels for pseudo-labeling. Results show that hard labels provide clearer assignments and lead to better performance. Although increasing the Top-K value improves accuracy by expanding the knowledge cache, excessive Top-K values reduce label credibility, necessitating a balance between cache size and label quality. On the EuroSAT dataset, with Tip-Adapter as the base model, updating the top-2 hard labels for 15 iterations increased one-shot accuracy from 54.89% to 73.25% and two-shot accuracy from 59.41% to 74.78%. Similarly, with Tip-Adapter-F as the base model, one-shot accuracy improved from 65.06% to 79.12%, and two-shot accuracy from 65.11% to 80.78%. Similar improvements were observed on the UCMerced (UCM) and NWPU-RESISC45 datasets, further demonstrating the effectiveness and generalizability of the proposed method. Our code will be released at https://github.com/xu-c22/DCU-CLIP.
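
The cache-updating loop described in the abstract can be illustrated with a small sketch. This is not the authors' released code: the function names (`tip_adapter_probs`, `update_cache`), the choice to pick the Top-K most confident samples per predicted class, the removal of selected samples from the unlabeled pool, and the values of `alpha`, `beta`, and `scale` are all assumptions made for illustration. Features are assumed to be L2-normalised vectors held in plain Python lists, and a hard label is stored as a class index (equivalent to a one-hot cache value). The adaptive weight adjustment module is not modelled here.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def tip_adapter_probs(feat, keys, labels, text_weights,
                      alpha=1.0, beta=5.5, scale=100.0):
    """Tip-Adapter-style prediction: zero-shot CLIP logits plus cache logits.

    alpha, beta, and scale are illustrative values, not taken from the paper.
    """
    # Zero-shot term: similarity between the image feature and each class text feature.
    logits = [scale * dot(feat, w) for w in text_weights]
    # Cache term: each cached key votes for its (hard) label with weight
    # exp(-beta * (1 - cosine similarity)).
    for key, label in zip(keys, labels):
        logits[label] += alpha * math.exp(-beta * (1.0 - dot(feat, key)))
    # Numerically stable softmax.
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def update_cache(keys, labels, pool, text_weights, top_k=2, iterations=15):
    """Grow the cache with Top-K hard pseudo-labels per class (assumed scheme)."""
    keys, labels, pool = list(keys), list(labels), list(pool)
    for _ in range(iterations):
        if not pool:
            break
        # Score every unlabeled sample with the current cache.
        scored = []
        for i, feat in enumerate(pool):
            probs = tip_adapter_probs(feat, keys, labels, text_weights)
            conf = max(probs)
            scored.append((probs.index(conf), conf, i))
        # Keep the top_k most confident samples for each predicted class.
        picked = []
        for c in range(len(text_weights)):
            ranked = sorted((s for s in scored if s[0] == c), key=lambda s: -s[1])
            picked.extend(ranked[:top_k])
        # Append them to the cache as hard pseudo-labels and shrink the pool.
        for cls, _conf, i in picked:
            keys.append(pool[i])
            labels.append(cls)
        chosen = {i for _cls, _conf, i in picked}
        pool = [f for i, f in enumerate(pool) if i not in chosen]
    return keys, labels
```

In this sketch a larger `top_k` grows the cache faster but admits less confident pseudo-labels, which mirrors the trade-off between cache size and label quality noted in the abstract.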
Author details
Xu, Cheng (ORCID 0009-0007-9450-3790; 472299060@qq.com), Training Base of Army Engineering University, Xuzhou, China
Ou, Zhengyu (ozy20@mails.tsinghua.edu.cn), Tsinghua University, Department of Mechanical Engineering, State Key Laboratory of Clean and Efficient Turbomachinery Power Equipment, Beijing, China
Han, Zandong (ORCID 0000-0002-6927-868X; hanzd@tsinghua.edu.cn), Ministry of Education, Key Laboratory for Advanced Materials Processing Technology, Beijing, China
Copyright 2025 Society of Photo-Optical Instrumentation Engineers (SPIE)
Discipline Geography
EISSN 1931-3195
IsPeerReviewed true
IsScholarly true
Issue 2
Keywords few-shot image classification; contrastive language-image pre-training; semi-supervised; pseudo label; dynamic cache-updating; iterate
PublicationTitleAlternate J. Appl. Remote Sens