CoAtNet for Chest X-Ray Report Generation with Bi-LSTM and Multi-Head Attention

Bibliographic Details
Published in Indonesian Journal of Electronics, Electromedical Engineering, and Medical Informatics Vol. 7; no. 4; pp. 654 - 672
Main Authors Akbar, Rafy Aulia; Putra, Ricky Eka; Yustanti, Wiyli
Format Journal Article
Language English
Published 20.10.2025
ISSN 2656-8624
DOI 10.35882/ijeeemi.v7i4.271

Abstract In clinical settings, the chest X-ray (CXR) is the most widely used diagnostic instrument, and its findings are communicated through medical reports. However, manual report preparation is time-consuming, depends heavily on radiologist expertise, and carries a risk of errors due to high workloads and a shortage of expert staff. An automated system based on artificial intelligence is therefore needed to ease radiologists' workload while increasing consistency. This study aims to develop an automated medical report generation system with a balanced data distribution, a reliable encoder, and bidirectional contextual understanding. The main contributions are: an undersampling strategy applied to majority captions, followed by oversampling of minority labels while maintaining a proportion of labels with higher frequencies; the use of Bi-LSTM with Multi-Head Attention (MHA) to strengthen understanding of text context; and the use of CoAtNet as a visual encoder that combines the strengths of CNNs and Transformers. The methodology comprises image preprocessing via gamma correction for contrast improvement, data selection, balancing through combined undersampling and oversampling, and CoAtNet as the encoder paired with Bi-LSTM and MHA as the decoder. Experiments used the IU X-ray dataset, with evaluation by BLEU and ROUGE-L metrics. The CoAtNet configuration with Bi-LSTM and MHA, coupled with the undersampling-oversampling strategy, performed best, with a cumulative score of 1.642 and BLEU-1 to BLEU-4 and ROUGE-L scores of 0.480, 0.329, 0.245, 0.183, and 0.405, respectively. These findings indicate that combining data balancing strategies with CoAtNet and Bi-LSTM can produce more accurate automated medical reports and reduce bias toward the majority label.
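As a rough illustration of two points in the abstract, the sketch below shows a generic power-law gamma correction of the kind used for contrast adjustment, and checks that the five reported metric values add up to the stated cumulative score of 1.642. The function names and the gamma value are assumptions made for this example; the abstract does not give the gamma setting, and treating the cumulative score as a plain sum is an inference from the reported numbers rather than a stated definition.

```python
# Illustrative sketch only. The gamma value and helper names are assumptions,
# not settings or code taken from the paper.

def gamma_correct(pixels, gamma=0.8):
    """Apply power-law (gamma) correction to 8-bit grayscale pixel values."""
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    # Normalize to [0, 1], raise to the power gamma, rescale to [0, 255].
    return [round(255 * (p / 255) ** gamma) for p in pixels]


def cumulative_score(scores):
    """Sum the individual metric scores, rounded to three decimals."""
    return round(sum(scores.values()), 3)


# Scores reported in the abstract for the best configuration.
reported = {
    "BLEU-1": 0.480,
    "BLEU-2": 0.329,
    "BLEU-3": 0.245,
    "BLEU-4": 0.183,
    "ROUGE-L": 0.405,
}

if __name__ == "__main__":
    print(gamma_correct([0, 64, 128, 255]))
    print(cumulative_score(reported))
```

With a gamma below 1, mid-range intensities are brightened while pure black and pure white are left unchanged; a gamma above 1 would darken them instead.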
ORCID Akbar, Rafy Aulia: 0009-0003-6991-0694
Putra, Ricky Eka: 0000-0002-5515-7967
Yustanti, Wiyli: 0000-0002-9574-7072
License CC BY-SA 4.0 (https://creativecommons.org/licenses/by-sa/4.0)