ChatGPT vs. Human Annotators: A Comprehensive Analysis of ChatGPT for Text Annotation
In recent years, the field of Natural Language Processing (NLP) has witnessed a groundbreaking transformation with the emergence of large language models (LLMs). ChatGPT stands out among these LLMs, having attracted considerable public interest for its impressive language generation...
| Published in | Proceedings (IEEE International Conference on Emerging Technologies and Factory Automation) pp. 602 - 609 |
|---|---|
| Main Authors | Aldeen, Mohammed; Luo, Joshua; Lian, Ashley; Zheng, Venus; Hong, Allen; Yetukuri, Preethika; Cheng, Long |
| Format | Conference Proceeding |
| Language | English |
| Published | IEEE, 15.12.2023 |
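| Subjects | Annotations; Banking; Benchmark testing; ChatGPT; Chatbots; Data Annotation; Data models; Large Language Models; Machine learning; Reliability |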
| ISSN | 1946-0759 |
| DOI | 10.1109/ICMLA58977.2023.00089 |
| Abstract | In recent years, the field of Natural Language Processing (NLP) has witnessed a groundbreaking transformation with the emergence of large language models (LLMs). ChatGPT stands out among these LLMs, having attracted considerable public interest for its impressive language generation capabilities. Researchers have been exploring the potential of using ChatGPT for data annotation tasks, aiming to find more time-saving and cost-effective approaches. In this paper, we present a comprehensive evaluation of ChatGPT's data annotation capabilities across ten diverse datasets covering various subject areas and varying numbers of classes. To ensure the quality of our evaluation, we used datasets that were previously annotated by human experts, providing a reliable benchmark for comparison. Through rigorous experimentation, we assessed the impact of different prompt strategies and model configurations on annotation performance. Our findings show that ChatGPT can handle most data annotation tasks, achieving an average accuracy of 78.2% across the tasks. The banking queries dataset stands out with 95.9% accuracy, while emotion classification is more challenging, yielding an accuracy of 57.5%. Our evaluation also highlights the impact of prompt strategies on annotation performance and reveals significant performance differences between GPT models, with "gpt-4" achieving a higher average accuracy (79.2%) than "gpt-3.5" (74.6%). Our research provides valuable insights into the capabilities and limitations of ChatGPT in automating data annotation tasks. |
|---|---|
| Author | Aldeen, Mohammed; Luo, Joshua; Zheng, Venus; Lian, Ashley; Hong, Allen; Cheng, Long; Yetukuri, Preethika |
| Author_xml | – sequence: 1 givenname: Mohammed surname: Aldeen fullname: Aldeen, Mohammed email: mshujaa@clemson.edu organization: School of Computing, Clemson University – sequence: 2 givenname: Joshua surname: Luo fullname: Luo, Joshua email: joshualuo@westminster.net organization: The Westminster Schools, Atlanta, Georgia – sequence: 3 givenname: Ashley surname: Lian fullname: Lian, Ashley email: lianashley912@gmail.com organization: SC Governor's School for Science and Mathematics – sequence: 4 givenname: Venus surname: Zheng fullname: Zheng, Venus email: vzheng0814@gmail.com organization: SC Governor's School for Science and Mathematics – sequence: 5 givenname: Allen surname: Hong fullname: Hong, Allen email: 25allenhong@pickens.k12.sc.us organization: D.W. Daniel High School, Central, South Carolina – sequence: 6 givenname: Preethika surname: Yetukuri fullname: Yetukuri, Preethika email: pyetuku@clemson.edu organization: School of Mathematical and Statistical Sciences, Clemson University – sequence: 7 givenname: Long surname: Cheng fullname: Cheng, Long email: lcheng2@clemson.edu organization: School of Computing, Clemson University |
| CODEN | IEEPAD |
| ContentType | Conference Proceeding |
| DBID | 6IE 6IL CBEJK RIE RIL |
| DOI | 10.1109/ICMLA58977.2023.00089 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings IEEE Xplore POP ALL IEEE Xplore All Conference Proceedings IEEE Electronic Library (IEL) IEEE Proceedings Order Plans (POP All) 1998-Present |
| DatabaseTitleList | |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| EISBN | 9798350345346 |
| EISSN | 1946-0759 |
| EndPage | 609 |
| ExternalDocumentID | 10460013 |
| Genre | orig-research |
| GroupedDBID | 6IE 6IF 6IK 6IL 6IN AAJGR AAWTH ABLEC ADZIZ ALMA_UNASSIGNED_HOLDINGS BEFXN BFFAM BGNUA BKEBE BPEOZ CBEJK CHZPO IEGSK IPLJI M43 OCL RIE RIL RNS |
| ID | FETCH-LOGICAL-i119t-9f39d393f040e5bfcfbe436277efffd4b83c6000cccf3f8070b9c809db85431c3 |
| IEDL.DBID | RIE |
| IngestDate | Wed Aug 27 02:17:08 EDT 2025 |
| IsPeerReviewed | false |
| IsScholarly | true |
| Language | English |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-i119t-9f39d393f040e5bfcfbe436277efffd4b83c6000cccf3f8070b9c809db85431c3 |
| PageCount | 8 |
| ParticipantIDs | ieee_primary_10460013 |
| PublicationCentury | 2000 |
| PublicationDate | 2023-Dec.-15 |
| PublicationDateYYYYMMDD | 2023-12-15 |
| PublicationDate_xml | – month: 12 year: 2023 text: 2023-Dec.-15 day: 15 |
| PublicationDecade | 2020 |
| PublicationTitle | Proceedings (IEEE International Conference on Emerging Technologies and Factory Automation) |
| PublicationTitleAbbrev | ICMLA |
| PublicationYear | 2023 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| SSID | ssj0001096939 |
| Score | 2.243143 |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 602 |
| SubjectTerms | Annotations; Banking; Benchmark testing; ChatGPT; Chatbots; Data Annotation; Data models; Large Language Models; Machine learning; Reliability |
| Title | ChatGPT vs. Human Annotators: A Comprehensive Analysis of ChatGPT for Text Annotation |
| URI | https://ieeexplore.ieee.org/document/10460013 |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |