Large language models in radiology reporting - A systematic review of performance, limitations, and clinical implications

Large language models (LLMs) and vision-language models (VLMs) have emerged as potential tools for automated radiology reporting. However, concerns regarding their fidelity, reliability, and clinical applicability remain. This systematic review examines the current literature on LLM-generated radio...

Bibliographic Details
Published in Intelligence-based medicine Vol. 12; p. 100287
Main Authors Artsi, Yaara, Klang, Eyal, Collins, Jeremy D., Glicksberg, Benjamin S., Nadkarni, Girish N., Korfiatis, Panagiotis, Sorin, Vera
Format Journal Article
Language English
Published Elsevier B.V 2025
ISSN 2666-5212
DOI 10.1016/j.ibmed.2025.100287

Abstract Large language models (LLMs) and vision-language models (VLMs) have emerged as potential tools for automated radiology reporting. However, concerns regarding their fidelity, reliability, and clinical applicability remain. This systematic review examines the current literature on LLM-generated radiology reports, assessing their fidelity, clinical reliability, and effectiveness. The review aims to identify benefits, limitations, and key factors influencing AI-generated report quality. We conducted a systematic search of MEDLINE, Google Scholar, Scopus, and Web of Science to identify studies published between January 2015 and July 2025. Studies evaluating radiology reports generated by VLMs or LLMs (Transformer-based generative large language models) were included. The review follows PRISMA guidelines. Risk of bias was assessed using the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) tool. Fifteen studies met the inclusion criteria. Four assessed VLMs that generate full radiology reports directly from images, whereas eleven examined LLMs that summarize textual findings into radiology impressions. Six studies evaluated out-of-the-box (base) models, and nine analyzed fine-tuned models. Twelve studies paired automated natural-language metrics with radiologist review, while three relied on automated metrics alone. Fine-tuned models aligned better with expert evaluations and scored higher on natural language processing metrics than base models. All LLMs showed hallucinations, misdiagnoses, and inconsistencies. LLMs show promise in radiology reporting. However, limitations in diagnostic accuracy and hallucinations necessitate human oversight. Future research should focus on improving evaluation frameworks, incorporating diverse datasets, and prospectively validating AI-generated reports in clinical workflows.
• Fine-tuned LLMs outperformed base models in NLP metrics and expert alignment.
• AI-generated reports exhibited hallucinations, misdiagnoses, and missing clinical details.
• Automated metrics overemphasized stylistic similarity over clinical accuracy.
• Human expert evaluation remains essential for validating AI-generated radiology reports.
• Future research should improve evaluation frameworks and real-world validation.
Abstract
Rationale and Objectives: Large language models (LLMs) and vision-language models (VLMs) have emerged as potential tools for automated radiology reporting. However, concerns regarding their fidelity, reliability, and clinical applicability remain. This systematic review examines the current literature on LLM-generated radiology reports, assessing their fidelity, clinical reliability, and effectiveness. The review aims to identify benefits, limitations, and key factors influencing AI-generated report quality.
Materials and Methods: We conducted a systematic search of MEDLINE, Google Scholar, Scopus, and Web of Science to identify studies published between January 2015 and July 2025. Studies evaluating radiology reports generated by VLMs or LLMs (Transformer-based generative large language models) were included. The review follows PRISMA guidelines. Risk of bias was assessed using the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) tool.
Results: Fifteen studies met the inclusion criteria. Four assessed VLMs that generate full radiology reports directly from images, whereas eleven examined LLMs that summarize textual findings into radiology impressions. Six studies evaluated out-of-the-box (base) models, and nine analyzed fine-tuned models. Twelve studies paired automated natural-language metrics with radiologist review, while three relied on automated metrics alone. Fine-tuned models aligned better with expert evaluations and scored higher on natural language processing metrics than base models. All LLMs showed hallucinations, misdiagnoses, and inconsistencies.
Conclusion: LLMs show promise in radiology reporting. However, limitations in diagnostic accuracy and hallucinations necessitate human oversight. Future research should focus on improving evaluation frameworks, incorporating diverse datasets, and prospectively validating AI-generated reports in clinical workflows.
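The abstract notes that most included studies paired automated natural-language metrics with radiologist review, and the highlights observe that such metrics can reward stylistic similarity over clinical accuracy. The following is a minimal sketch of why that can happen, not a method taken from the review: `unigram_f1` is a hypothetical helper that computes a simple ROUGE-1-style word-overlap score, and the three impression sentences are invented for illustration.

```python
from collections import Counter


def unigram_f1(candidate: str, reference: str) -> float:
    """Clipped unigram-overlap F1 between two texts (a ROUGE-1-style score)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # shared tokens, clipped per word
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical impressions, invented for illustration only.
reference = "no evidence of pneumothorax or pleural effusion"
contradictory = "evidence of pneumothorax and pleural effusion"  # opposite meaning
paraphrase = "lungs are clear without pneumothorax"  # compatible meaning

score_contradictory = unigram_f1(contradictory, reference)
score_paraphrase = unigram_f1(paraphrase, reference)

print(f"contradictory: {score_contradictory:.2f}")
print(f"paraphrase:    {score_paraphrase:.2f}")
```

In this toy case the clinically contradictory sentence scores higher than the compatible paraphrase because it shares more surface words with the reference, which is one plausible reason expert review is still needed alongside overlap-based metrics.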
ArticleNumber 100287
Author Nadkarni, Girish N.
Klang, Eyal
Sorin, Vera
Collins, Jeremy D.
Korfiatis, Panagiotis
Artsi, Yaara
Glicksberg, Benjamin S.
Author_xml – sequence: 1
  givenname: Yaara
  orcidid: 0009-0008-0766-5191
  surname: Artsi
  fullname: Artsi, Yaara
  email: yaara.artsi77@gmail.com
  organization: Azrieli Faculty of Medicine, Bar-Ilan University, Zefat, Israel
– sequence: 2
  givenname: Eyal
  surname: Klang
  fullname: Klang, Eyal
  organization: The Charles Bronfman Institute of Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
– sequence: 3
  givenname: Jeremy D.
  surname: Collins
  fullname: Collins, Jeremy D.
  organization: Department of Radiology, Mayo Clinic, Rochester, MN, USA
– sequence: 4
  givenname: Benjamin S.
  surname: Glicksberg
  fullname: Glicksberg, Benjamin S.
  organization: The Charles Bronfman Institute of Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
– sequence: 5
  givenname: Girish N.
  surname: Nadkarni
  fullname: Nadkarni, Girish N.
  organization: The Charles Bronfman Institute of Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
– sequence: 6
  givenname: Panagiotis
  surname: Korfiatis
  fullname: Korfiatis, Panagiotis
  organization: Department of Radiology, Mayo Clinic, Rochester, MN, USA
– sequence: 7
  givenname: Vera
  surname: Sorin
  fullname: Sorin, Vera
  organization: Department of Radiology, Mayo Clinic, Rochester, MN, USA
ContentType Journal Article
Copyright 2025 The Authors
Copyright_xml – notice: 2025 The Authors
DOI 10.1016/j.ibmed.2025.100287
Discipline Medicine
EISSN 2666-5212
EndPage 100287
ExternalDocumentID 10_1016_j_ibmed_2025_100287
S2666521225000912
1_s2_0_S2666521225000912
ISSN 2666-5212
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Keywords Radiology reports
Clinical evaluation
Generative AI
Large language models
AI alignment
Artificial Intelligence
Automated reporting
Natural language processing
Language English
License This is an open access article under the CC BY license.
ORCID 0009-0008-0766-5191
OpenAccessLink http://dx.doi.org/10.1016/j.ibmed.2025.100287
PageCount 1
PublicationCentury 2000
PublicationDate 2025
2025-00-00
PublicationDateYYYYMMDD 2025-01-01
PublicationDate_xml – year: 2025
  text: 2025
PublicationDecade 2020
PublicationTitle Intelligence-based medicine
PublicationYear 2025
Publisher Elsevier B.V
Publisher_xml – name: Elsevier B.V
  doi: 10.2214/AJR.24.31493
– volume: 34
  start-page: 825
  issue: 6
  year: 2019
  ident: 10.1016/j.ibmed.2025.100287_bib15
  article-title: Correlation of altmetric attention score and citations for high-impact general medicine journals: a cross-sectional study
  publication-title: J Gen Intern Med
  doi: 10.1007/s11606-019-04838-6
– volume: 4
  issue: 9
  year: 2023
  ident: 10.1016/j.ibmed.2025.100287_bib34
  article-title: Evaluating progress in automatic chest X-ray radiology report generation
  publication-title: Patterns (N Y)
– volume: 131
  start-page: S47
  issue: S1
  year: 2017
  ident: 10.1016/j.ibmed.2025.100287_bib2
  article-title: Inter-observer variability between radiologists reporting on cerebellopontine angle tumours on magnetic resonance imaging
  publication-title: J Laryngol Otol
  doi: 10.1017/S002221511600935X
– volume: 2
  start-page: 6
  year: 2025
  ident: 10.1016/j.ibmed.2025.100287_bib39
  article-title: Current and future state of evaluation of large language models for medical summarization tasks
  publication-title: npj Health Syst
  doi: 10.1038/s44401-024-00011-2
– start-page: 402
  year: 2024
  ident: 10.1016/j.ibmed.2025.100287_bib32
  article-title: Leveraging professional radiologists' expertise to enhance LLMs' evaluation for radiology reports
  publication-title: arXiv
– volume: 46
  year: 2024
  ident: 10.1016/j.ibmed.2025.100287_bib25
  article-title: Fully automatic summarization of radiology reports using natural language processing with large language models
  publication-title: Inform Med Unlocked
  doi: 10.1016/j.imu.2024.101465
– year: 2022
  ident: 10.1016/j.ibmed.2025.100287_bib45
  article-title: Training compute-optimal large language models
  publication-title: arXiv
– start-page: 185
  year: 2024
  ident: 10.1016/j.ibmed.2025.100287_bib33
  article-title: ReXamine-Global: a framework for uncovering inconsistencies in radiology report generation metrics
  publication-title: Biocomputing
– volume: 40
  start-page: 1658
  issue: 6
  year: 2020
  ident: 10.1016/j.ibmed.2025.100287_bib36
  article-title: How to create a great radiology report
  publication-title: Radiographics
  doi: 10.1148/rg.2020200020
SubjectTerms AI alignment
Artificial Intelligence
Automated reporting
Clinical evaluation
Generative AI
Informatics
Large language models
Natural language processing
Radiology reports
Title Large language models in radiology reporting - A systematic review of performance, limitations, and clinical implications
URI https://www.clinicalkey.com/#!/content/1-s2.0-S2666521225000912
https://www.clinicalkey.es/playcontent/1-s2.0-S2666521225000912
https://dx.doi.org/10.1016/j.ibmed.2025.100287
Volume 12