Multi-level fine-tuning, data augmentation, and few-shot learning for specialized cyber threat intelligence
| Published in | Computers & security Vol. 134; p. 103430 |
|---|---|
| Main Authors | Bayer, Markus; Frey, Tobias; Reuter, Christian |
| Format | Journal Article |
| Language | English |
| Published | Elsevier Ltd, 01.11.2023 |
| Subjects | Cyber threat intelligence; Data augmentation; Few-shot learning; Information overload; Transfer learning |
| ISSN | 0167-4048 (print), 1872-6208 (electronic) |
| DOI | 10.1016/j.cose.2023.103430 |
| Abstract | Gathering cyber threat intelligence from open sources is becoming increasingly important for maintaining and achieving a high level of security as systems become larger and more complex. However, these open sources are often subject to information overload. It is therefore useful to apply machine learning models that condense the amount of information to what is necessary. Yet, previous studies and applications have shown that existing classifiers are not able to process information about emerging cybersecurity events, such as new malware names or novel attack contexts, due to their low generalisation capability. Therefore, we propose a system to overcome this problem by training a new classifier for each new incident. Since this requires a lot of labelled data using standard training methods, we combine three different low-data regime techniques – transfer learning, data augmentation, and few-shot learning – to train a high-quality classifier from very few labelled instances. We evaluated our approach using a novel dataset derived from the Microsoft Exchange Server data breach of 2021 which was labelled by three experts. Our findings reveal an increase in F1 score of more than 21 points compared to standard training methods and more than 18 points compared to a state-of-the-art method in few-shot learning. Furthermore, the classifier trained with this method and 32 instances is only less than 5 F1 score points worse than a classifier trained with 1800 instances. |
| ArticleNumber | 103430 | 
    
| Author | Frey, Tobias; Bayer, Markus; Reuter, Christian |
    
| Author details | 1. Bayer, Markus (ORCID 0000-0002-2040-5609; bayer@peasec.tu-darmstadt.de); 2. Frey, Tobias (tobiasjonathan.frey@stud.tu-darmstadt.de); 3. Reuter, Christian (ORCID 0000-0003-1920-038X; reuter@peasec.tu-darmstadt.de) |
    
| ContentType | Journal Article | 
    
| Copyright | 2023 Elsevier Ltd | 
    
| Discipline | Computer Science | 
    
| EISSN | 1872-6208 | 
    
| ISSN | 0167-4048 | 
    
| IsPeerReviewed | true | 
    
| IsScholarly | true | 
    
| Keywords | Data augmentation; Information overload; Cyber threat intelligence; Few-shot learning; Transfer learning |
    
| Language | English | 
    
| ORCID | 0000-0002-2040-5609 0000-0003-1920-038X  | 
    
| PublicationCentury | 2000 | 
    
| PublicationDate | November 2023 2023-11-00  | 
    
| PublicationDateYYYYMMDD | 2023-11-01 | 
    
| PublicationDecade | 2020 | 
    
| PublicationTitle | Computers & security | 
    
| PublicationYear | 2023 | 
    
| Publisher | Elsevier Ltd | 
    
    
| StartPage | 103430 | 
    
| URI | https://dx.doi.org/10.1016/j.cose.2023.103430 | 
    
| Volume | 134 | 
    