Ascle—A Python Natural Language Processing Toolkit for Medical Text Generation: Development and Evaluation Study

Bibliographic Details
Published in	Journal of Medical Internet Research Vol. 26; no. 5; p. e60601
Main Authors	Yang, Rui; Zeng, Qingcheng; You, Keen; Qiao, Yujie; Huang, Lucas; Hsieh, Chia-Chun; Rosand, Benjamin; Goldwasser, Jeremy; Dave, Amisha; Keenan, Tiarnan; Ke, Yuhe; Hong, Chuan; Liu, Nan; Chew, Emily; Radev, Dragomir; Lu, Zhiyong; Xu, Hua; Chen, Qingyu; Li, Irene
Format	Journal Article
Language	English
Published	Canada Journal of Medical Internet Research 03.10.2024 JMIR Publications
Subjects	Algorithms; Analysis; Automation; Computational linguistics; Humans; Language processing; Mechanization; Medical care; Medical literature; Medical research; Medicine, Experimental; Natural language interfaces; Natural Language Processing; Python (Programming language); Quality management; Rankings; Software; Technology application; Singapore; United Kingdom; deep learning; natural language processing; large language models; retrieval-augmented generation; machine learning; generative artificial intelligence; healthcare
ISSN	1438-8871 1439-4456
DOI	10.2196/60601

Abstract	Background: Medical texts present significant domain-specific challenges, and manually curating these texts is a time-consuming and labor-intensive process. To address this, natural language processing (NLP) algorithms have been developed to automate text processing. In the biomedical field, various toolkits for text processing exist, which have greatly improved the efficiency of handling unstructured text. However, these existing toolkits tend to emphasize different perspectives, and none of them offers generation capabilities, leaving a significant gap in the current offerings. Objective: This study aims to describe the development and preliminary evaluation of Ascle. Ascle is tailored for biomedical researchers and clinical staff with an easy-to-use, all-in-one solution that requires minimal programming expertise. For the first time, Ascle provides 4 advanced and challenging generative functions: question-answering, text summarization, text simplification, and machine translation. In addition, Ascle integrates 12 essential NLP functions, along with query and search capabilities for clinical databases. Methods: We fine-tuned 32 domain-specific language models and evaluated them thoroughly on 27 established benchmarks. For the question-answering task, we developed a retrieval-augmented generation (RAG) framework for large language models that incorporated a medical knowledge graph with ranking techniques to enhance the reliability of generated answers. We also conducted a physician validation to assess the quality of generated content beyond automated metrics. Results: The fine-tuned models and RAG framework consistently enhanced text generation tasks. For example, the fine-tuned models improved the machine translation task by 20.27 in terms of BLEU score. In the question-answering task, the RAG framework raised the ROUGE-L score by 18% over the vanilla models. Physician validation of generated answers showed high scores for readability (4.95/5) and relevancy (4.43/5), with lower scores for accuracy (3.90/5) and completeness (3.31/5). Conclusions: This study introduces the development and evaluation of Ascle, a user-friendly NLP toolkit designed for medical text generation. All code is publicly available through the Ascle GitHub repository. All fine-tuned language models can be accessed through Hugging Face.
Audience	Academic
Author_xml	1. Yang, Rui (ORCID: 0009-0006-0597-7197); 2. Zeng, Qingcheng (ORCID: 0000-0002-8697-2729); 3. You, Keen (ORCID: 0009-0009-9534-3041); 4. Qiao, Yujie (ORCID: 0009-0009-7182-2355); 5. Huang, Lucas (ORCID: 0009-0002-9600-9335); 6. Hsieh, Chia-Chun (ORCID: 0009-0005-4074-8659); 7. Rosand, Benjamin (ORCID: 0000-0001-8140-9438); 8. Goldwasser, Jeremy (ORCID: 0009-0001-4263-2108); 9. Dave, Amisha (ORCID: 0000-0001-8377-8309); 10. Keenan, Tiarnan (ORCID: 0000-0002-2253-1772); 11. Ke, Yuhe (ORCID: 0000-0001-7193-4749); 12. Hong, Chuan (ORCID: 0000-0001-7056-9559); 13. Liu, Nan (ORCID: 0000-0003-3610-4883); 14. Chew, Emily (ORCID: 0000-0003-0999-9802); 15. Radev, Dragomir (ORCID: 0000-0001-7830-6489); 16. Lu, Zhiyong (ORCID: 0000-0001-9998-916X); 17. Xu, Hua (ORCID: 0000-0002-5274-4672); 18. Chen, Qingyu (ORCID: 0000-0002-6036-1516); 19. Li, Irene (ORCID: 0000-0002-1851-5390)
BackLink	https://www.ncbi.nlm.nih.gov/pubmed/39361955 (View this record in MEDLINE/PubMed)
CitedBy_id	crossref_primary_10_2196_59439 crossref_primary_10_1038_s44401_024_00004_1
Copyright	Rui Yang, Qingcheng Zeng, Keen You, Yujie Qiao, Lucas Huang, Chia-Chun Hsieh, Benjamin Rosand, Jeremy Goldwasser, Amisha Dave, Tiarnan Keenan, Yuhe Ke, Chuan Hong, Nan Liu, Emily Chew, Dragomir Radev, Zhiyong Lu, Hua Xu, Qingyu Chen, Irene Li. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 03.10.2024. COPYRIGHT 2024 Journal of Medical Internet Research
Discipline	Medicine Library & Information Science
EISSN	1438-8871
ExternalDocumentID	oai_doaj_org_article_636cfe56b72c41f0aa9d2e22d25cb671 10.2196/60601 A811078470 39361955 10_2196_60601
Genre	Evaluation Study Journal Article
GeographicLocations	Singapore United Kingdom
GrantInformation_xml	NCATS NIH HHS, grant UL1 TR001863; NLM NIH HHS, grant K99 LM014024
IsDoiOpenAccess	true
IsOpenAccess	true
IsPeerReviewed	true
IsScholarly	true
Issue	5
Keywords	deep learning; natural language processing; large language models; retrieval-augmented generation; machine learning; generative artificial intelligence; healthcare
License	cc-by
OpenAccessLink	https://doaj.org/article/636cfe56b72c41f0aa9d2e22d25cb671
PMID	39361955
PublicationDate	2024-Oct-03
PublicationPlace	Canada
PublicationTitle	Journal of Medical Internet Research
PublicationTitleAlternate	J Med Internet Res
PublicationYear	2024
Publisher	Journal of Medical Internet Research JMIR Publications
SSID	ssj0020491
Score	2.436251
Snippet	Medical texts present significant domain-specific challenges, and manually curating these texts is a time-consuming and labor-intensive process. To address... Background Medical texts present significant domain-specific challenges, and manually curating these texts is a time-consuming and labor-intensive process. To... BackgroundMedical texts present significant domain-specific challenges, and manually curating these texts is a time-consuming and labor-intensive process. To...
SourceID	doaj unpaywall proquest gale pubmed crossref
SourceType	Open Website Open Access Repository Aggregation Database Index Database Enrichment Source
StartPage	e60601
SubjectTerms	Algorithms Analysis Automation Computational linguistics Humans Language processing Mechanization Medical care Medical literature Medical research Medicine, Experimental Natural language interfaces Natural Language Processing Python (Programming language) Quality management Rankings Software Technology application
Title	Ascle—A Python Natural Language Processing Toolkit for Medical Text Generation: Development and Evaluation Study
URI	https://www.ncbi.nlm.nih.gov/pubmed/39361955 https://www.proquest.com/docview/3112857240 https://doi.org/10.2196/60601 https://doaj.org/article/636cfe56b72c41f0aa9d2e22d25cb671
Volume	26