Building a business email compromise research dataset with large language models
Published in | Journal of Computer Virology and Hacking Techniques Vol. 21; no. 1; p. 3 |
---|---|
Main Author | |
Format | Journal Article |
Language | English |
Published | Paris: Springer Paris, 02.01.2025; Springer Nature B.V. |
Subjects | |
ISSN | 2263-8733 |
DOI | 10.1007/s11416-024-00544-y |
Summary: | Email-based attacks, such as Business Email Compromise, seriously threaten many organizations. In recent years, Large Language Models have improved the potency of email-based attacks by giving attackers an easy-to-use tool to overcome the language barrier and craft believable emails. At the same time, Business Email Compromise research remains hamstrung by the lack of a publicly available dataset. This paper proposes a novel system composed of Large Language Models to create Business Email Compromise datasets. Two datasets are generated. The first one (BEC-1) is a small 20-email proof-of-concept dataset that demonstrates that the system produces a dataset that a human analyst finds credible. The second (BEC-2) is a larger 279-email dataset generated using the same system. BEC-2 is the first public Business Email Compromise dataset available to the email security research community. The paper also proposes an accuracy-like metric called “agreement score” to measure the quality of datasets produced. Both BEC-1 and BEC-2 have high agreement scores – 90 and 93, respectively – validating the effectiveness of the Large Language Model system. |
---|---|