Ascle—A Python Natural Language Processing Toolkit for Medical Text Generation: Development and Evaluation Study

Bibliographic Details
Published in Journal of Medical Internet Research, Vol. 26, no. 5, p. e60601
Main Authors Yang, Rui, Zeng, Qingcheng, You, Keen, Qiao, Yujie, Huang, Lucas, Hsieh, Chia-Chun, Rosand, Benjamin, Goldwasser, Jeremy, Dave, Amisha, Keenan, Tiarnan, Ke, Yuhe, Hong, Chuan, Liu, Nan, Chew, Emily, Radev, Dragomir, Lu, Zhiyong, Xu, Hua, Chen, Qingyu, Li, Irene
Format Journal Article
Language English
Published Canada Journal of Medical Internet Research 03.10.2024
JMIR Publications
ISSN 1438-8871 (electronic); 1439-4456 (print)
DOI 10.2196/60601

Abstract Background: Medical texts present significant domain-specific challenges, and manually curating them is a time-consuming and labor-intensive process. To address this, natural language processing (NLP) algorithms have been developed to automate text processing. In the biomedical field, various text-processing toolkits exist and have greatly improved the efficiency of handling unstructured text. However, these toolkits tend to emphasize different perspectives, and none of them offers generation capabilities, leaving a significant gap in current offerings.
Objective: This study aims to describe the development and preliminary evaluation of Ascle, an easy-to-use, all-in-one toolkit for biomedical researchers and clinical staff that requires minimal programming expertise. For the first time, Ascle provides 4 advanced and challenging generative functions: question-answering, text summarization, text simplification, and machine translation. It also integrates 12 essential NLP functions, along with query and search capabilities for clinical databases.
Methods: We fine-tuned 32 domain-specific language models and evaluated them thoroughly on 27 established benchmarks. For the question-answering task, we also developed a retrieval-augmented generation (RAG) framework for large language models that incorporates a medical knowledge graph and ranking techniques to improve the reliability of generated answers. In addition, we conducted a physician validation to assess the quality of the generated content beyond automated metrics.
Results: The fine-tuned models and the RAG framework consistently improved performance on text generation tasks. For example, the fine-tuned models improved the BLEU score on the machine translation task by 20.27. In the question-answering task, the RAG framework raised the ROUGE-L score by 18% over the vanilla models. Physician validation of the generated answers yielded high scores for readability (4.95/5) and relevancy (4.43/5) but lower scores for accuracy (3.90/5) and completeness (3.31/5).
Conclusions: This study introduces the development and evaluation of Ascle, a user-friendly NLP toolkit designed for medical text generation. All code is publicly available through the Ascle GitHub repository, and all fine-tuned language models can be accessed through Hugging Face.
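The abstract describes a RAG framework that combines a medical knowledge graph with ranking before a large language model answers a question. The Python sketch below illustrates only that general idea and is not Ascle's implementation: the `Triple` class, the `rank_triples` and `build_prompt` helpers, and the toy facts are all hypothetical, and plain Jaccard token overlap stands in for whatever ranking technique the authors actually used.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Triple:
    """Hypothetical (head, relation, tail) fact from a toy medical knowledge graph."""
    head: str
    relation: str
    tail: str

    def as_text(self) -> str:
        return f"{self.head} {self.relation} {self.tail}"


def tokenize(text: str) -> Set[str]:
    # Lowercase, replace punctuation with spaces, split on whitespace
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text.lower())
    return set(cleaned.split())


def rank_triples(question: str, graph: List[Triple], top_k: int = 3) -> List[Tuple[Triple, float]]:
    """Score each triple by Jaccard token overlap with the question; keep the top_k non-zero hits."""
    q_tokens = tokenize(question)
    scored = []
    for triple in graph:
        t_tokens = tokenize(triple.as_text())
        union = q_tokens | t_tokens
        score = len(q_tokens & t_tokens) / len(union) if union else 0.0
        if score > 0:
            scored.append((triple, score))
    # Highest score first; ties keep graph order because list.sort is stable
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def build_prompt(question: str, graph: List[Triple], top_k: int = 3) -> str:
    """Prepend the ranked facts to the question so a language model can ground its answer."""
    facts = [triple.as_text() for triple, _ in rank_triples(question, graph, top_k)]
    context = "\n".join(f"- {fact}" for fact in facts) or "- (no relevant facts found)"
    return f"Known facts:\n{context}\n\nQuestion: {question}\nAnswer:"


if __name__ == "__main__":
    toy_graph = [
        Triple("metformin", "treats", "type 2 diabetes"),
        Triple("metformin", "has side effect", "lactic acidosis"),
        Triple("aspirin", "inhibits", "platelet aggregation"),
    ]
    print(build_prompt("What are the side effects of metformin?", toy_graph))
```

For the sample question, the side-effect triple ranks first (2 shared tokens out of 11), the treatment triple second, and the unrelated aspirin fact is dropped. A production system would more likely use entity linking, embedding similarity, or a learned reranker, and the resulting prompt would then be passed to the language model.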
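Because the question-answering results are reported as ROUGE-L, a minimal sketch of the standard sentence-level ROUGE-L F-score may help readers interpret them: the score rewards the longest common subsequence (LCS) shared by a generated answer and a reference answer. The function names are illustrative, and lowercased whitespace tokenization without stemming is an assumption; published scores are normally computed with an established scorer whose preprocessing may differ, so this sketch is not expected to reproduce the paper's numbers.

```python
from typing import List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two token lists (dynamic programming)."""
    # prev[j] holds the LCS length of the tokens of `a` seen so far and b[:j]
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


def rouge_l(candidate: str, reference: str, beta: float = 1.0) -> float:
    """Sentence-level ROUGE-L F-score with lowercased whitespace tokenization."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    recall = lcs / len(ref)        # share of the reference covered by the LCS
    precision = lcs / len(cand)    # share of the candidate covered by the LCS
    # F-measure weighting recall beta^2 times as much as precision
    return (1 + beta ** 2) * precision * recall / (recall + beta ** 2 * precision)
```

For example, `rouge_l("the patient has type 2 diabetes", "patient has type 2 diabetes mellitus")` shares a 5-token LCS with 6 tokens on each side, so precision, recall, and F-score are all 5/6 (about 0.833).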
Audience Academic
Author ORCIDs Yang, Rui (0009-0006-0597-7197)
Zeng, Qingcheng (0000-0002-8697-2729)
You, Keen (0009-0009-9534-3041)
Qiao, Yujie (0009-0009-7182-2355)
Huang, Lucas (0009-0002-9600-9335)
Hsieh, Chia-Chun (0009-0005-4074-8659)
Rosand, Benjamin (0000-0001-8140-9438)
Goldwasser, Jeremy (0009-0001-4263-2108)
Dave, Amisha (0000-0001-8377-8309)
Keenan, Tiarnan (0000-0002-2253-1772)
Ke, Yuhe (0000-0001-7193-4749)
Hong, Chuan (0000-0001-7056-9559)
Liu, Nan (0000-0003-3610-4883)
Chew, Emily (0000-0003-0999-9802)
Radev, Dragomir (0000-0001-7830-6489)
Lu, Zhiyong (0000-0001-9998-916X)
Xu, Hua (0000-0002-5274-4672)
Chen, Qingyu (0000-0002-6036-1516)
Li, Irene (0000-0002-1851-5390)
BackLink https://www.ncbi.nlm.nih.gov/pubmed/39361955 (View this record in MEDLINE/PubMed)
ContentType Journal Article
Copyright Rui Yang, Qingcheng Zeng, Keen You, Yujie Qiao, Lucas Huang, Chia-Chun Hsieh, Benjamin Rosand, Jeremy Goldwasser, Amisha Dave, Tiarnan Keenan, Yuhe Ke, Chuan Hong, Nan Liu, Emily Chew, Dragomir Radev, Zhiyong Lu, Hua Xu, Qingyu Chen, Irene Li. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 03.10.2024.
COPYRIGHT 2024 Journal of Medical Internet Research
DatabaseName CrossRef
MEDLINE
MEDLINE (Ovid)
MEDLINE - Academic
PubMed
Gale In Context: Canada
Unpaywall
DOAJ Directory of Open Access Journals
Discipline Medicine
Library & Information Science
EISSN 1438-8871
Genre Evaluation Study
Journal Article
GeographicLocations Singapore
United Kingdom
GrantInformation NCATS NIH HHS, grant UL1 TR001863
NLM NIH HHS, grant K99 LM014024
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 5
Keywords deep learning
natural language processing
large language models
retrieval-augmented generation
machine learning
generative artificial intelligence
healthcare
License cc-by
OpenAccessLink https://doaj.org/article/636cfe56b72c41f0aa9d2e22d25cb671
PMID 39361955
PublicationTitleAlternate J Med Internet Res
SubjectTerms Algorithms
Analysis
Automation
Computational linguistics
Humans
Language processing
Mechanization
Medical care
Medical literature
Medical research
Medicine, Experimental
Natural language interfaces
Natural Language Processing
Python (Programming language)
Quality management
Rankings
Software
Technology application
URI https://www.ncbi.nlm.nih.gov/pubmed/39361955
https://www.proquest.com/docview/3112857240
https://doi.org/10.2196/60601
https://doaj.org/article/636cfe56b72c41f0aa9d2e22d25cb671
Volume 26