ChatGPT vs. Human Annotators: A Comprehensive Analysis of ChatGPT for Text Annotation

Bibliographic Details
Published in	Proceedings (IEEE International Conference on Emerging Technologies and Factory Automation) pp. 602 - 609
Main Authors	Aldeen, Mohammed; Luo, Joshua; Lian, Ashley; Zheng, Venus; Hong, Allen; Yetukuri, Preethika; Cheng, Long
Format	Conference Proceeding
Language	English
Published	IEEE 15.12.2023
Subjects	Annotations; Banking; Benchmark testing; ChatGPT; Chatbots; Data Annotation; Data models; Large Language Models; Machine learning; Reliability
ISSN	1946-0759
DOI	10.1109/ICMLA58977.2023.00089

Abstract	In recent years, the field of Natural Language Processing (NLP) has been transformed by the emergence of large language models (LLMs). ChatGPT stands out among these models, attracting considerable public interest due to its impressive language generation capabilities. Researchers have been exploring the potential of using ChatGPT for data annotation tasks, aiming to discover more time-saving and cost-effective approaches. In this paper, we present a comprehensive evaluation of ChatGPT's data annotation capabilities across ten diverse datasets covering various subject areas and varying numbers of classes. To ensure the quality of our evaluation, we used datasets that were previously annotated by human experts, providing a reliable benchmark for comparison. Through rigorous experimentation, we assessed the impact of different prompt strategies and model configurations on annotation performance. Our findings show that ChatGPT can handle most data annotation tasks, achieving an average accuracy of 78.2% across the tasks. The banking queries dataset stands out with 95.9% accuracy, while emotion classification proves challenging, yielding an accuracy of 57.5%. Our evaluation also highlights the impact of prompt strategies on annotation performance and reveals significant performance differences between GPT models, with "gpt-4" achieving a higher average accuracy (79.2%) than "gpt-3.5" (74.6%). Our research provides valuable insights into the capabilities and limitations of ChatGPT in automating data annotation tasks.
Author Affiliations
1	Aldeen, Mohammed (mshujaa@clemson.edu), School of Computing, Clemson University
2	Luo, Joshua (joshualuo@westminster.net), The Westminster Schools, Atlanta, Georgia
3	Lian, Ashley (lianashley912@gmail.com), SC Governor's School for Science and Mathematics
4	Zheng, Venus (vzheng0814@gmail.com), SC Governor's School for Science and Mathematics
5	Hong, Allen (25allenhong@pickens.k12.sc.us), D.W. Daniel High School, Central, South Carolina
6	Yetukuri, Preethika (pyetuku@clemson.edu), School of Mathematical and Statistical Sciences, Clemson University
7	Cheng, Long (lcheng2@clemson.edu), School of Computing, Clemson University
CODEN	IEEPAD
EISBN	9798350345346
EISSN	1946-0759
Genre	orig-research
IsPeerReviewed	false
IsScholarly	true
PageCount	8
PublicationDate	2023-12-15
PublicationTitleAbbrev	ICMLA
URI	https://ieeexplore.ieee.org/document/10460013