Multi-level fine-tuning, data augmentation, and few-shot learning for specialized cyber threat intelligence

Bibliographic Details
Published in: Computers & Security, Vol. 134, p. 103430
Main Authors: Bayer, Markus; Frey, Tobias; Reuter, Christian
Format: Journal Article
Language: English
Published: Elsevier Ltd, 01.11.2023
ISSN: 0167-4048 (print), 1872-6208 (electronic)
DOI: 10.1016/j.cose.2023.103430

Abstract
Gathering cyber threat intelligence from open sources is becoming increasingly important for achieving and maintaining a high level of security as systems become larger and more complex. However, these open sources are often subject to information overload. It is therefore useful to apply machine learning models that condense the information to what is necessary. Yet previous studies and applications have shown that existing classifiers cannot process information about emerging cybersecurity events, such as new malware names or novel attack contexts, because of their low generalisation capability. We therefore propose a system that overcomes this problem by training a new classifier for each new incident. Since standard training methods would require a large amount of labelled data for this, we combine three low-data regime techniques (transfer learning, data augmentation, and few-shot learning) to train a high-quality classifier from very few labelled instances. We evaluated our approach on a novel dataset derived from the Microsoft Exchange Server data breach of 2021, which was labelled by three experts. Our findings reveal an increase in F1 score of more than 21 points compared to standard training methods and of more than 18 points compared to a state-of-the-art few-shot learning method. Furthermore, the classifier trained with this method on 32 instances scores less than 5 F1 points below a classifier trained with 1800 instances.
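
The abstract reports its results as differences in F1 score "points". As a minimal sketch of how such a comparison can be computed, the following assumes binary F1 on the positive (relevant) class and a 0 to 100 point scale; the record does not state which F1 averaging the authors used. The labels and predictions are hypothetical values invented for illustration and are not the paper's data.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def f1_point_gap(f1_a, f1_b):
    """Difference between two F1 scores, in points on a 0-100 scale."""
    return (f1_a - f1_b) * 100


# Hypothetical gold labels and predictions (illustration only).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
baseline_pred = [1, 0, 0, 0, 1, 0, 0, 0]   # stand-in for standard training
few_shot_pred = [1, 1, 1, 0, 1, 0, 0, 0]   # stand-in for a low-data method

baseline_f1 = f1_score(y_true, baseline_pred)
few_shot_f1 = f1_score(y_true, few_shot_pred)
gap = f1_point_gap(few_shot_f1, baseline_f1)
```

Read this way, a statement such as "more than 21 points" refers to a gap of this kind between two classifiers evaluated on the same test set.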
Author details
1. Bayer, Markus (ORCID 0000-0002-2040-5609; bayer@peasec.tu-darmstadt.de)
2. Frey, Tobias (tobiasjonathan.frey@stud.tu-darmstadt.de)
3. Reuter, Christian (ORCID 0000-0003-1920-038X; reuter@peasec.tu-darmstadt.de)
Copyright: 2023 Elsevier Ltd
Keywords: Data augmentation; Information overload; Cyber threat intelligence; Few-shot learning; Transfer learning