Performance of ChatGPT-3.5 and ChatGPT-4 in the field of specialist medical knowledge on National Specialization Exam in neurosurgery

Bibliographic Details
Published in	Annales Academiae Medicae Silesiensis Vol. 78; pp. 253 - 258
Main Authors	Laskowski, Maciej; Ciekalski, Marcin; Laskowski, Marcin; Błaszczyk, Bartłomiej; Setlak, Marcin; Paździora, Piotr; Rudnik, Adam
Format	Journal Article
Language	English
Published	Śląski Uniwersytet Medyczny w Katowicach 15.10.2024
Subjects	artificial intelligence (AI); ChatGPT; neurosurgery
ISSN	1734-025X 0208-5607
DOI	10.18794/aams/186827

Abstract	Introduction: The number of publications on artificial intelligence (AI) in medicine, and in neurosurgery in particular, has grown in recent years. Studies integrating AI into neurosurgical practice suggest an ongoing shift towards greater reliance on AI-assisted tools for diagnostics, image analysis, and decision-making. Material and methods: The study evaluated the performance of ChatGPT-3.5 and ChatGPT-4 on the Autumn 2017 neurosurgery exam, the most recent exam with officially published answers on the website of the Medical Examinations Center (Centrum Egzaminów Medycznych, CEM) in Łódź, Poland. The passing score for the National Specialization Exam (Państwowy Egzamin Specjalizacyjny, PES), administered by CEM, is 56% of the valid questions. After four outdated questions were excluded, the exam comprised 116 single-choice questions, which were grouped into ten thematic categories by subject. Both ChatGPT versions were briefed on the exam rules and asked to rate their confidence in each answer on a scale from 1 (definitely not sure) to 5 (definitely sure). All interactions were conducted in Polish and recorded. Results: ChatGPT-4 significantly outperformed ChatGPT-3.5, by a margin of 29.4% (p < 0.001). Unlike ChatGPT-3.5, ChatGPT-4 reached the passing threshold for the PES. The two models gave the same answer to 61 questions (52.58%): both were correct on 28 questions (24.14%) and both were incorrect on 33 questions (28.45%). Conclusions: ChatGPT-4 is more accurate than ChatGPT-3.5, likely owing to more advanced algorithms and a broader training dataset, which suggests a better grasp of complex neurosurgical concepts.
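The percentages in the abstract can be rechecked from the reported counts. The following is a minimal sketch, not the authors' analysis: the names `percent`, `TOTAL_QUESTIONS`, and `PASS_PERCENT` are illustrative, and rounding the 56% threshold up to a whole number of questions is an assumption, since the abstract does not say how CEM rounds it. Rounded to two decimals, 61 of 116 comes out as 52.59%, whereas the abstract prints 52.58%, which matches truncation rather than rounding.

```python
import math

TOTAL_QUESTIONS = 116  # valid questions after four were excluded (from the abstract)
PASS_PERCENT = 56      # PES passing score in percent (from the abstract)


def percent(count: int, total: int = TOTAL_QUESTIONS) -> float:
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100 * count / total, 2)


# Counts reported in the abstract.
same_answer = 61
both_correct = 28
both_incorrect = 33

# Assumption: the threshold is rounded up to a whole number of questions.
min_correct_to_pass = math.ceil(PASS_PERCENT * TOTAL_QUESTIONS / 100)

print(percent(same_answer), percent(both_correct), percent(both_incorrect))
print(min_correct_to_pass)
```

The two sub-counts add up to the agreement count (28 + 33 = 61), so the reported figures are internally consistent. Under the rounding-up assumption, a model would need at least 65 correct answers out of 116 to pass.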
Author Affiliations	1. Laskowski, Maciej (ORCID: 0009-0005-5809-0875). Students’ Scientific Club, Department of Neurosurgery, Faculty of Medical Sciences in Katowice, Medical University of Silesia, Katowice, Poland
2. Ciekalski, Marcin (ORCID: 0000-0003-1392-2007). Students’ Scientific Club, Department of Neurosurgery, Faculty of Medical Sciences in Katowice, Medical University of Silesia, Katowice, Poland
3. Laskowski, Marcin. Unhyped, AI Growth Partner, Kraków, Poland
4. Błaszczyk, Bartłomiej. Department of Neurosurgery, Faculty of Medical Sciences in Katowice, Medical University of Silesia, Katowice, Poland
5. Setlak, Marcin. Department of Neurosurgery, Faculty of Medical Sciences in Katowice, Medical University of Silesia, Katowice, Poland
6. Paździora, Piotr. Department of Neurosurgery, Faculty of Medical Sciences in Katowice, Medical University of Silesia, Katowice, Poland
7. Rudnik, Adam. Department of Neurosurgery, Faculty of Medical Sciences in Katowice, Medical University of Silesia, Katowice, Poland
Discipline	Medicine
EISSN	1734-025X
IsDoiOpenAccess	true
IsOpenAccess	true
IsPeerReviewed	true
IsScholarly	true
Language	English
License	CC BY-SA 4.0 (https://creativecommons.org/licenses/by-sa/4.0)
PageCount	6
PublicationDate	2024-10-15
URI	https://doi.org/10.18794/aams/186827 https://doaj.org/article/360bd79b2f28480ab8f8d5563783e8ef
Volume	78