Large language models in radiology reporting - A systematic review of performance, limitations, and clinical implications

Bibliographic Details
Published in	Intelligence-based medicine Vol. 12; p. 100287
Main Authors	Artsi, Yaara; Klang, Eyal; Collins, Jeremy D.; Glicksberg, Benjamin S.; Nadkarni, Girish N.; Korfiatis, Panagiotis; Sorin, Vera
Format	Journal Article
Language	English
Published	Elsevier B.V. 2025
Subjects	AI alignment; Artificial Intelligence; Automated reporting; Clinical evaluation; Generative AI; Informatics; Large language models; Natural language processing; Radiology reports
ISSN	2666-5212
DOI	10.1016/j.ibmed.2025.100287

Abstract	Rationale and Objectives: Large language models (LLMs) and vision-language models (VLMs) have emerged as potential tools for automated radiology reporting. However, concerns regarding their fidelity, reliability, and clinical applicability remain. This systematic review examines the current literature on LLM-generated radiology reports, assessing their fidelity, clinical reliability, and effectiveness. The review aims to identify benefits, limitations, and key factors influencing AI-generated report quality.
Materials and Methods: We conducted a systematic search of MEDLINE, Google Scholar, Scopus, and Web of Science to identify studies published between January 2015 and July 2025. Studies evaluating radiology reports generated by VLMs or LLMs (Transformer-based generative large language models) were included. The review follows PRISMA guidelines. Risk of bias was assessed using the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) tool.
Results: Fifteen studies met the inclusion criteria. Four assessed VLMs that generate full radiology reports directly from images, whereas eleven examined LLMs that summarize textual findings into radiology impressions. Six studies evaluated out-of-the-box (base) models, and nine analyzed models that had been fine-tuned. Twelve investigations paired automated natural-language metrics with radiologist review, while three relied on automated metrics alone. Fine-tuned models demonstrated better alignment with expert evaluations and achieved higher performance on natural language processing metrics compared to base models. All LLMs showed hallucinations, misdiagnoses, and inconsistencies.
Conclusion: LLMs show promise in radiology reporting. However, limitations in diagnostic accuracy and hallucinations necessitate human oversight. Future research should focus on improving evaluation frameworks, incorporating diverse datasets, and prospectively validating AI-generated reports in clinical workflows.
• Fine-tuned LLMs outperformed base models in NLP metrics and expert alignment.
• AI-generated reports exhibited hallucinations, misdiagnoses, and missing clinical details.
• Automated metrics overemphasized stylistic similarity over clinical accuracy.
• Human expert evaluation remains essential for validating AI-generated radiology reports.
• Future research should improve evaluation frameworks and real-world validation.
ArticleNumber	100287
Author	Nadkarni, Girish N.; Klang, Eyal; Sorin, Vera; Collins, Jeremy D.; Korfiatis, Panagiotis; Artsi, Yaara; Glicksberg, Benjamin S.
Author_xml
– sequence: 1
  givenname: Yaara
  surname: Artsi
  fullname: Artsi, Yaara
  orcidid: 0009-0008-0766-5191
  email: yaara.artsi77@gmail.com
  organization: Azrieli Faculty of Medicine, Bar-Ilan University, Zefat, Israel
– sequence: 2
  givenname: Eyal
  surname: Klang
  fullname: Klang, Eyal
  organization: The Charles Bronfman Institute of Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
– sequence: 3
  givenname: Jeremy D.
  surname: Collins
  fullname: Collins, Jeremy D.
  organization: Department of Radiology, Mayo Clinic, Rochester, MN, USA
– sequence: 4
  givenname: Benjamin S.
  surname: Glicksberg
  fullname: Glicksberg, Benjamin S.
  organization: The Charles Bronfman Institute of Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
– sequence: 5
  givenname: Girish N.
  surname: Nadkarni
  fullname: Nadkarni, Girish N.
  organization: The Charles Bronfman Institute of Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
– sequence: 6
  givenname: Panagiotis
  surname: Korfiatis
  fullname: Korfiatis, Panagiotis
  organization: Department of Radiology, Mayo Clinic, Rochester, MN, USA
– sequence: 7
  givenname: Vera
  surname: Sorin
  fullname: Sorin, Vera
  organization: Department of Radiology, Mayo Clinic, Rochester, MN, USA
ContentType	Journal Article
Copyright	2025 The Authors
Discipline	Medicine
EISSN	2666-5212
EndPage	100287
IsDoiOpenAccess	true
IsOpenAccess	true
IsPeerReviewed	true
IsScholarly	true
Keywords	Radiology reports; Clinical evaluation; Generative AI; Large language models; AI alignment; Artificial Intelligence; Automated reporting; Natural language processing
Language	English
License	This is an open access article under the CC BY license.
ORCID	0009-0008-0766-5191
OpenAccessLink	http://dx.doi.org/10.1016/j.ibmed.2025.100287
PageCount	1
PublicationDate	2025
PublicationTitle	Intelligence-based medicine
Publisher	Elsevier B.V.
SecondaryResourceType	review_article
StartPage	100287
SubjectTerms	AI alignment Artificial Intelligence Automated reporting Clinical evaluation Generative AI Informatics Large language models Natural language processing Radiology reports
Title	Large language models in radiology reporting - A systematic review of performance, limitations, and clinical implications
URI	https://www.clinicalkey.com/#!/content/1-s2.0-S2666521225000912 https://www.clinicalkey.es/playcontent/1-s2.0-S2666521225000912 https://dx.doi.org/10.1016/j.ibmed.2025.100287
Volume	12