Performance of ChatGPT-3.5 and ChatGPT-4 in the field of specialist medical knowledge on National Specialization Exam in neurosurgery

Bibliographic Details
Published in Annales Academiae Medicae Silesiensis, Vol. 78, pp. 253-258
Main Authors Laskowski, Maciej, Ciekalski, Marcin, Laskowski, Marcin, Błaszczyk, Bartłomiej, Setlak, Marcin, Paździora, Piotr, Rudnik, Adam
Format Journal Article
Language English
Published Śląski Uniwersytet Medyczny w Katowicach 15.10.2024
Subjects artificial intelligence (AI); ChatGPT; neurosurgery
ISSN 0208-5607 (print); 1734-025X (online)
DOI 10.18794/aams/186827

Abstract Introduction: Recent years have seen a growing number of publications on artificial intelligence (AI) in medicine in general and in neurosurgery in particular. Studies integrating AI into neurosurgical practice suggest an ongoing shift towards greater reliance on AI-assisted tools for diagnostics, image analysis, and decision-making. Material and methods: The study evaluated the performance of ChatGPT-3.5 and ChatGPT-4 on the Autumn 2017 neurosurgery exam, the most recent exam with official answers published on the website of the Medical Examinations Center in Łódź, Poland (Centrum Egzaminów Medycznych – CEM). The passing score for the National Specialization Exam (Państwowy Egzamin Specjalizacyjny – PES) in Poland, as administered by CEM, is 56% of the valid questions. After four outdated questions were excluded, the exam comprised 116 single-choice questions, which were categorized into ten thematic groups by subject. For data collection, both ChatGPT versions were briefed on the exam rules and asked to rate their confidence in each answer on a scale from 1 (definitely not sure) to 5 (definitely sure). All interactions were conducted in Polish and recorded. Results: ChatGPT-4 significantly outperformed ChatGPT-3.5, with a margin of 29.4% (p < 0.001). Unlike ChatGPT-3.5, ChatGPT-4 reached the passing threshold for the PES. ChatGPT-3.5 and ChatGPT-4 gave the same answer to 61 questions (52.58%); of these, both were correct on 28 questions (24.14% of all questions) and both were incorrect on 33 questions (28.45%). Conclusions: ChatGPT-4 shows improved accuracy over ChatGPT-3.5, likely due to more advanced algorithms and a broader training dataset, highlighting its better grasp of complex neurosurgical concepts.
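The pass threshold and the answer-agreement shares in the abstract follow from simple arithmetic on the 116 valid questions. The following is a minimal Python sketch, assuming that the 56% passing score is inclusive (a share of exactly 56% passes) and that all percentages are taken over the 116 valid questions; the helper names `min_correct_to_pass` and `share_percent` are hypothetical and not part of the study.

```python
import math
from fractions import Fraction

# Figures taken from the abstract: 116 valid questions (120 minus 4 excluded as outdated).
VALID_QUESTIONS = 116
# Assumption: the 56% passing score is inclusive (a share >= 56% passes).
PASS_FRACTION = Fraction(56, 100)


def min_correct_to_pass(n_valid: int, pass_fraction: Fraction) -> int:
    """Hypothetical helper: smallest number of correct answers whose share reaches pass_fraction."""
    # Exact rational arithmetic avoids floating-point error near the threshold.
    return math.ceil(n_valid * pass_fraction)


def share_percent(count: int, total: int) -> float:
    """Hypothetical helper: share of `total` as a percentage, rounded to two decimals."""
    return round(100 * count / total, 2)


if __name__ == "__main__":
    both_correct, both_incorrect = 28, 33
    # Identical answers are either correct for both models or incorrect for both.
    same_answer = both_correct + both_incorrect

    print(min_correct_to_pass(VALID_QUESTIONS, PASS_FRACTION))  # 65
    print(same_answer)  # 61
    print(share_percent(both_correct, VALID_QUESTIONS))  # 24.14
    print(share_percent(both_incorrect, VALID_QUESTIONS))  # 28.45
```

Under the inclusive-threshold assumption, 56% of 116 questions (64.96) means at least 65 correct answers are needed to pass, and the 28 jointly correct plus 33 jointly incorrect questions account for all 61 questions on which the two models agreed.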
Author_xml – sequence: 1
  givenname: Maciej
  orcidid: 0009-0005-5809-0875
  surname: Laskowski
  fullname: Laskowski, Maciej
  organization: Students’ Scientific Club, Department of Neurosurgery, Faculty of Medical Sciences in Katowice, Medical University of Silesia, Katowice, Poland
– sequence: 2
  givenname: Marcin
  orcidid: 0000-0003-1392-2007
  surname: Ciekalski
  fullname: Ciekalski, Marcin
  organization: Students’ Scientific Club, Department of Neurosurgery, Faculty of Medical Sciences in Katowice, Medical University of Silesia, Katowice, Poland
– sequence: 3
  givenname: Marcin
  surname: Laskowski
  fullname: Laskowski, Marcin
  organization: Unhyped, AI Growth Partner, Kraków, Poland
– sequence: 4
  givenname: Bartłomiej
  surname: Błaszczyk
  fullname: Błaszczyk, Bartłomiej
  organization: Department of Neurosurgery, Faculty of Medical Sciences in Katowice, Medical University of Silesia, Katowice, Poland
– sequence: 5
  givenname: Marcin
  surname: Setlak
  fullname: Setlak, Marcin
  organization: Department of Neurosurgery, Faculty of Medical Sciences in Katowice, Medical University of Silesia, Katowice, Poland
– sequence: 6
  givenname: Piotr
  surname: Paździora
  fullname: Paździora, Piotr
  organization: Department of Neurosurgery, Faculty of Medical Sciences in Katowice, Medical University of Silesia, Katowice, Poland
– sequence: 7
  givenname: Adam
  surname: Rudnik
  fullname: Rudnik, Adam
  organization: Department of Neurosurgery, Faculty of Medical Sciences in Katowice, Medical University of Silesia, Katowice, Poland
Cites_doi 10.17691/stm2020.12.5.12
10.1227/neu.0000000000002632
10.1016/j.wneu.2022.12.087
10.3171/2023.2.JNS23419
10.1038/d41586-023-00680-3
ContentType Journal Article
DOI 10.18794/aams/186827
DatabaseName CrossRef
Unpaywall for CDI: Periodical Content
Unpaywall
[Open Access] DOAJ Directory of Open Access Journals
DatabaseTitle CrossRef
DatabaseTitleList
Database_xml – sequence: 1
  dbid: DOA
  name: DOAJ Directory of Open Access Journals
  url: https://www.doaj.org/
  sourceTypes: Open Website
– sequence: 2
  dbid: UNPAY
  name: Unpaywall
  url: https://unpaywall.org/
  sourceTypes: Open Access Repository
DeliveryMethod fulltext_linktorsrc
Discipline Medicine
EISSN 1734-025X
EndPage 258
ISSN 1734-025X
0208-5607
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Language English
License https://creativecommons.org/licenses/by-sa/4.0
cc-by-sa
ORCID 0000-0003-1392-2007
0009-0005-5809-0875
OpenAccessLink https://doi.org/10.18794/aams/186827
PageCount 6
PublicationCentury 2000
PublicationDate 2024-10-15
PublicationDateYYYYMMDD 2024-10-15
PublicationDate_xml – month: 10
  year: 2024
  text: 2024-10-15
  day: 15
PublicationDecade 2020
PublicationTitle Annales Academiae Medicae Silesiensis
PublicationYear 2024
Publisher Śląski Uniwersytet Medyczny w Katowicach
Publisher_xml – name: Śląski Uniwersytet Medyczny w Katowicach
StartPage 253
SubjectTerms artificial intelligence (ai)
chatgpt
neurosurgery
Title Performance of ChatGPT-3.5 and ChatGPT-4 in the field of specialist medical knowledge on National Specialization Exam in neurosurgery
URI https://doi.org/10.18794/aams/186827
https://doaj.org/article/360bd79b2f28480ab8f8d5563783e8ef
UnpaywallVersion publishedVersion
Volume 78