Building a business email compromise research dataset with large language models

Bibliographic Details
Published in	Journal of Computer Virology and Hacking Techniques Vol. 21; no. 1; p. 3
Main Author	Dube, Rohit
Format	Journal Article
Language	English
Published	Paris: Springer Paris, 02.01.2025; Springer Nature B.V.
Subjects	Artificial intelligence; Computer Science; Cybercrime; Datasets; Electronic mail; Electronic mail systems; Language; Large language models; Malware; Natural language processing; Original Paper; Proprietary; Security systems; Sentiment analysis; URLs; United States > US; Business Email Compromise; Email Dataset; Phishing; Large Language Model; Natural Language Processing
ISSN	2263-8733
DOI	10.1007/s11416-024-00544-y

Summary:	Email-based attacks, such as Business Email Compromise, seriously threaten many organizations. In recent years, Large Language Models have improved the potency of email-based attacks by giving attackers an easy-to-use tool to overcome the language barrier and craft believable emails. At the same time, Business Email Compromise research remains hamstrung by the lack of a publicly available dataset. This paper proposes a novel system composed of Large Language Models to create Business Email Compromise datasets. Two datasets are generated. The first one (BEC-1) is a small 20-email proof-of-concept dataset that demonstrates that the system produces a dataset that a human analyst finds credible. The second (BEC-2) is a larger 279-email dataset generated using the same system. BEC-2 is the first public Business Email Compromise dataset available to the email security research community. The paper also proposes an accuracy-like metric called “agreement score” to measure the quality of datasets produced. Both BEC-1 and BEC-2 have high agreement scores – 90 and 93, respectively – validating the effectiveness of the Large Language Model system.
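The summary calls the agreement score "accuracy-like" but does not define it. As a minimal sketch of one plausible reading, the Python below computes the percentage of emails for which a reviewer's label matches the label the generation system intended. The function name, the label strings, and the 20-email example are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def agreement_score(intended: Sequence[str], judged: Sequence[str]) -> float:
    """Percentage of emails whose judged label matches the intended label.

    Assumption: this accuracy-style definition is illustrative only; the
    catalog record does not state how the paper computes its metric.
    """
    if len(intended) != len(judged):
        raise ValueError("label sequences must have the same length")
    if not intended:
        raise ValueError("at least one email is required")
    matches = sum(a == b for a, b in zip(intended, judged))
    return 100.0 * matches / len(intended)


# Hypothetical 20-email example (not data from the paper): 18 labels match.
intended = ["bec"] * 20
judged = ["bec"] * 18 + ["not_bec"] * 2
print(agreement_score(intended, judged))  # prints 90.0
```

Under this reading, a score of 90 on a 20-email dataset would correspond to 18 matching labels, though the paper itself should be consulted for the actual definition.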