ParaMed: a parallel corpus for English–Chinese translation in the biomedical domain

Bibliographic Details
Published in	BMC medical informatics and decision making Vol. 21; no. 1; article 258 (11 pages)
Main Authors	Liu, Boxiang, Huang, Liang
Format	Journal Article
Language	English
Published	England BioMed Central 06.09.2021 BMC
Subjects	Algorithms; Bilingualism; China; Clinical trials; Datasets; Domains; Editorials; English language; Health informatics; Humans; Interpreters; Language; Language translation; Machine translation; Natural Language Processing; Text mining; Translating; Translation; Translations; Translators; Websites
ISSN	1472-6947
DOI	10.1186/s12911-021-01621-8

Abstract	Biomedical language translation requires multi-lingual fluency as well as relevant domain knowledge. Such requirements make it challenging to train qualified translators and costly to generate high-quality translations. Machine translation represents an effective alternative, but accurate machine translation requires large amounts of in-domain data. While such datasets are abundant in general domains, they are less accessible in the biomedical domain. Chinese and English are two of the most widely spoken languages, yet to our knowledge, a parallel corpus does not exist for this language pair in the biomedical domain. We developed an effective pipeline to acquire and process an English-Chinese parallel corpus from the New England Journal of Medicine (NEJM). This corpus consists of about 100,000 sentence pairs and 3,000,000 tokens on each side. We showed that training on out-of-domain data and fine-tuning with as few as 4,000 NEJM sentence pairs improve translation quality by 25.3 (13.4) BLEU for en→zh (zh→en) directions. Translation quality continues to improve at a slower pace on larger in-domain data subsets, with a total increase of 33.0 (24.3) BLEU for en→zh (zh→en) directions on the full dataset. The code and data are available at https://github.com/boxiangliu/ParaMed.
ArticleNumber	258
BackLink	https://www.ncbi.nlm.nih.gov/pubmed/34488734 (MEDLINE/PubMed record)
ContentType	Journal Article
Copyright	2021. The Author(s). This work is licensed under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Discipline	Medicine
EISSN	1472-6947
EndPage	11
ExternalDocumentID	oai_doaj_org_article_ea72f7a4d387413d988d9c3bbe1a9133 PMC8422666 34488734 10_1186_s12911_021_01621_8
Genre	Journal Article
GeographicLocations	China
IsDoiOpenAccess	true
IsOpenAccess	true
IsPeerReviewed	true
IsScholarly	true
Issue	1
Keywords	Text mining; Machine translation; Natural language processing
License	2021. The Author(s). Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
ORCID	0000-0002-2595-4463
PMID	34488734
PageCount	11
PublicationDate	2021-09-06
PublicationPlace	London, England
PublicationTitle	BMC medical informatics and decision making
PublicationTitleAlternate	BMC Med Inform Decis Mak
PublicationYear	2021
Publisher	BioMed Central BMC
StartPage	258