ChatGPT vs. Human Annotators: A Comprehensive Analysis of ChatGPT for Text Annotation

Bibliographic Details
Published in Proceedings (IEEE International Conference on Emerging Technologies and Factory Automation) pp. 602 - 609
Main Authors Aldeen, Mohammed, Luo, Joshua, Lian, Ashley, Zheng, Venus, Hong, Allen, Yetukuri, Preethika, Cheng, Long
Format Conference Proceeding
Language English
Published IEEE 15.12.2023
ISSN 1946-0759
DOI 10.1109/ICMLA58977.2023.00089

Abstract In recent years, the field of Natural Language Processing (NLP) has witnessed a groundbreaking transformation with the emergence of large language models (LLMs). Among these models, ChatGPT stands out, having captured considerable public interest through its impressive language generation capabilities. Researchers have been exploring the potential of using ChatGPT for data annotation tasks, aiming to find more time-saving and cost-effective approaches. In this paper, we present a comprehensive evaluation of ChatGPT's data annotation capabilities across ten diverse datasets covering various subject areas and varying numbers of classes. To ensure the quality of our evaluation, we used datasets previously annotated by human experts, which provide a reliable benchmark for comparison. Through rigorous experimentation, we assessed the impact of different prompt strategies and model configurations on annotation performance. Our findings show that ChatGPT can handle most data annotation tasks, achieving an average accuracy of 78.2% across tasks. The banking queries dataset stands out with 95.9% accuracy, while emotion classification proves challenging, yielding an accuracy of 57.5%. Our evaluation also highlights the impact of prompt strategies on annotation performance and reveals significant performance differences between GPT models, with "gpt-4" achieving a higher average accuracy (79.2%) than "gpt-3.5" (74.6%). Our research provides valuable insights into the capabilities and limitations of ChatGPT in automating data annotation tasks.
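The evaluation summarized in the abstract compares ChatGPT's labels against human expert annotations and reports accuracy per dataset and on average. The sketch below illustrates that arithmetic under two assumptions not stated in the record: accuracy is the share of items whose model label exactly matches the human label, and the reported average is an unweighted mean over datasets. The function names (`accuracy`, `macro_average`) and the toy labels are hypothetical and are not taken from the paper.

```python
from statistics import mean


def accuracy(predicted: list[str], gold: list[str]) -> float:
    """Fraction of items where the model label equals the human (gold) label."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty dataset")
    matches = sum(p == g for p, g in zip(predicted, gold))
    return matches / len(gold)


def macro_average(per_dataset: dict[str, float]) -> float:
    """Unweighted mean of per-dataset accuracies (assumption: each dataset counts equally)."""
    return mean(per_dataset.values())


# Hypothetical toy labels for illustration only; these are not data from the paper.
results = {
    "toy_banking": accuracy(["card", "loan", "card", "fee"], ["card", "loan", "card", "card"]),
    "toy_emotion": accuracy(["joy", "anger", "fear", "joy"], ["joy", "sadness", "fear", "surprise"]),
}
print(results)                 # {'toy_banking': 0.75, 'toy_emotion': 0.5}
print(macro_average(results))  # 0.625
```

If the paper instead pooled all items before averaging, the result would be weighted by dataset size; an unweighted mean is used here only because it keeps every dataset on equal footing.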
Author Aldeen, Mohammed
Luo, Joshua
Zheng, Venus
Lian, Ashley
Hong, Allen
Cheng, Long
Yetukuri, Preethika
Author_xml – sequence: 1
  givenname: Mohammed
  surname: Aldeen
  fullname: Aldeen, Mohammed
  email: mshujaa@clemson.edu
  organization: School of Computing, Clemson University
– sequence: 2
  givenname: Joshua
  surname: Luo
  fullname: Luo, Joshua
  email: joshualuo@westminster.net
  organization: The Westminster Schools, Atlanta, Georgia
– sequence: 3
  givenname: Ashley
  surname: Lian
  fullname: Lian, Ashley
  email: lianashley912@gmail.com
  organization: SC Governor's School for Science and Mathematics
– sequence: 4
  givenname: Venus
  surname: Zheng
  fullname: Zheng, Venus
  email: vzheng0814@gmail.com
  organization: SC Governor's School for Science and Mathematics
– sequence: 5
  givenname: Allen
  surname: Hong
  fullname: Hong, Allen
  email: 25allenhong@pickens.k12.sc.us
  organization: D.W. Daniel High School, Central, South Carolina
– sequence: 6
  givenname: Preethika
  surname: Yetukuri
  fullname: Yetukuri, Preethika
  email: pyetuku@clemson.edu
  organization: School of Mathematical and Statistical Sciences, Clemson University
– sequence: 7
  givenname: Long
  surname: Cheng
  fullname: Cheng, Long
  email: lcheng2@clemson.edu
  organization: School of Computing, Clemson University
CODEN IEEPAD
ContentType Conference Proceeding
DBID 6IE
6IL
CBEJK
RIE
RIL
DOI 10.1109/ICMLA58977.2023.00089
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Xplore POP ALL
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP All) 1998-Present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
EISBN 9798350345346
EISSN 1946-0759
EndPage 609
ExternalDocumentID 10460013
Genre orig-research
GroupedDBID 6IE
6IF
6IK
6IL
6IN
AAJGR
AAWTH
ABLEC
ADZIZ
ALMA_UNASSIGNED_HOLDINGS
BEFXN
BFFAM
BGNUA
BKEBE
BPEOZ
CBEJK
CHZPO
IEGSK
IPLJI
M43
OCL
RIE
RIL
RNS
ID FETCH-LOGICAL-i119t-9f39d393f040e5bfcfbe436277efffd4b83c6000cccf3f8070b9c809db85431c3
IEDL.DBID RIE
IngestDate Wed Aug 27 02:17:08 EDT 2025
IsPeerReviewed false
IsScholarly true
Language English
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-i119t-9f39d393f040e5bfcfbe436277efffd4b83c6000cccf3f8070b9c809db85431c3
PageCount 8
ParticipantIDs ieee_primary_10460013
PublicationCentury 2000
PublicationDate 2023-Dec.-15
PublicationDateYYYYMMDD 2023-12-15
PublicationDate_xml – month: 12
  year: 2023
  text: 2023-Dec.-15
  day: 15
PublicationDecade 2020
PublicationTitle Proceedings (IEEE International Conference on Emerging Technologies and Factory Automation)
PublicationTitleAbbrev ICMLA
PublicationYear 2023
Publisher IEEE
Publisher_xml – name: IEEE
SSID ssj0001096939
Score 2.243143
SourceID ieee
SourceType Publisher
StartPage 602
SubjectTerms Annotations
Banking
Benchmark testing
ChatGPT
Chatbots
Data Annotation
Data models
Large Language Models
Machine learning
Reliability
Title ChatGPT vs. Human Annotators: A Comprehensive Analysis of ChatGPT for Text Annotation
URI https://ieeexplore.ieee.org/document/10460013
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
link http://utb.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwjV3PS8MwFA5uJ08qTvxNDl7brUvSJt7GcE5xY4cNdhtN8sJEaGVrPfjXm9euTgTBW1MIDXn0vS8v7_seIXeJSHnC-iwAJnWAEuhB2k-sB3LcSRYLBQ7zHZNpPF7w56VY7sjqFRcGAKriMwjxsbrLt7kpMVXWxftIxCwt0kpkXJO19gkVD8YVUzuWjh91n4aTl4GQHuGE2CQcpQqxm_uPLipVEBkdkWnz-bp25C0sCx2az1_KjP9e3zHp7Pl6dPYdiU7IAWSnZDFcp8XjbE4_tiGtcvV0kGV5gafs7T0dUHQFG1jXFey0USehuaPNTA9o6dx772ait2GHLEYP8-E42DVRCF6jSBWBckxZppjzfysI7YzTwH3UShJwzlmuJTN-0T1jjGNOeg-glZE9ZbVEmrxhZ6Sd5RmcE2qkj-2RjZUFziECLfopquUwqxT3sOaCdHBPVu-1Tsaq2Y7LP95fkUO0CxaHROKatItNCTc-xBf6tjLtF6XupEc
linkProvider IEEE
linkToHtml http://utb.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwjV3PS8MwFA46D3pSceJvc_Dabl2SNfE2hnPTbezQwW5jSV5QhFa2zoN_vXnt6kQQvLWFwCOPvPf15X3fI-QuFgsesxYLgEkdoAR6sGjF1gM57iRrCwUO6x2jcbs_5U8zMduQ1QsuDAAUzWcQ4mNxl28zs8ZSWQPvIxGz7JI9wTkXJV1rW1LxcFwxteHp-LfGoDsadoT0GCfEMeEoVojz3H_MUSnSSO-QjCsDyu6Rt3Cd69B8_tJm_LeFR6S-ZezRyXcuOiY7kJ6QafdlkT9OEvqxCmlRraedNM1y_M9e3dMOxWCwhJeyh51W-iQ0c7Ra6SEtTXz8rhZ6L9bJtPeQdPvBZoxC8BpFKg-UY8oyxZw_ryC0M04D93krjsE5Z7mWzHijm8YYx5z0MUArI5vKaolEecNOSS3NUjgj1Eif3SPbVhY4hwi0aC1QL4dZpbgHNuekjnsyfy-VMubVdlz88f2W7PeT0XA-HIyfL8kB-ghbRSJxRWr5cg3XPuHn-qZw8xdx7aeU
openUrl ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=proceeding&rft.title=Proceedings+%28IEEE+International+Conference+on+Emerging+Technologies+and+Factory+Automation%29&rft.atitle=ChatGPT+vs.+Human+Annotators%3A+A+Comprehensive+Analysis+of+ChatGPT+for+Text+Annotation&rft.au=Aldeen%2C+Mohammed&rft.au=Luo%2C+Joshua&rft.au=Lian%2C+Ashley&rft.au=Zheng%2C+Venus&rft.date=2023-12-15&rft.pub=IEEE&rft.eissn=1946-0759&rft.spage=602&rft.epage=609&rft_id=info:doi/10.1109%2FICMLA58977.2023.00089&rft.externalDocID=10460013