ParaMed: a parallel corpus for English–Chinese translation in the biomedical domain
Published in | BMC Medical Informatics and Decision Making Vol. 21; no. 1; article 258 (11 pp.) |
---|---|
Main Authors | Liu, Boxiang; Huang, Liang |
Format | Journal Article |
Language | English |
Published |
England
BioMed Central
06.09.2021
BMC |
ISSN | 1472-6947 |
DOI | 10.1186/s12911-021-01621-8 |
Abstract | Biomedical language translation requires multi-lingual fluency as well as relevant domain knowledge. Such requirements make it challenging to train qualified translators and costly to generate high-quality translations. Machine translation represents an effective alternative, but accurate machine translation requires large amounts of in-domain data. While such datasets are abundant in general domains, they are less accessible in the biomedical domain. Chinese and English are two of the most widely spoken languages, yet to our knowledge, a parallel corpus does not exist for this language pair in the biomedical domain.
We developed an effective pipeline to acquire and process an English-Chinese parallel corpus from the New England Journal of Medicine (NEJM). This corpus consists of about 100,000 sentence pairs and 3,000,000 tokens on each side. We showed that training on out-of-domain data and fine-tuning with as few as 4000 NEJM sentence pairs improve translation quality by 25.3 (13.4) BLEU for en→zh (zh→en) directions. Translation quality continues to improve at a slower pace on larger in-domain data subsets, with a total increase of 33.0 (24.3) BLEU for en→zh (zh→en) directions on the full dataset.
The code and data are available at https://github.com/boxiangliu/ParaMed . |
---|---|
AbstractList | Background: Biomedical language translation requires multi-lingual fluency as well as relevant domain knowledge. Such requirements make it challenging to train qualified translators and costly to generate high-quality translations. Machine translation represents an effective alternative, but accurate machine translation requires large amounts of in-domain data. While such datasets are abundant in general domains, they are less accessible in the biomedical domain. Chinese and English are two of the most widely spoken languages, yet to our knowledge, a parallel corpus does not exist for this language pair in the biomedical domain. Description: We developed an effective pipeline to acquire and process an English-Chinese parallel corpus from the New England Journal of Medicine (NEJM). This corpus consists of about 100,000 sentence pairs and 3,000,000 tokens on each side. We showed that training on out-of-domain data and fine-tuning with as few as 4000 NEJM sentence pairs improve translation quality by 25.3 (13.4) BLEU for en→zh (zh→en) directions. Translation quality continues to improve at a slower pace on larger in-domain data subsets, with a total increase of 33.0 (24.3) BLEU for en→zh (zh→en) directions on the full dataset. Conclusions: The code and data are available at https://github.com/boxiangliu/ParaMed. |
ArticleNumber | 258 |
Author | Liu, Boxiang Huang, Liang |
Author_xml | – sequence: 1 givenname: Boxiang orcidid: 0000-0002-2595-4463 surname: Liu fullname: Liu, Boxiang – sequence: 2 givenname: Liang surname: Huang fullname: Huang, Liang |
BackLink | https://www.ncbi.nlm.nih.gov/pubmed/34488734$$D View this record in MEDLINE/PubMed |
CitedBy_id | crossref_primary_10_3390_electronics13071381 crossref_primary_10_1093_llc_fqac089 crossref_primary_10_1145_3626095 crossref_primary_10_3390_jpm14090923 crossref_primary_10_3390_app12126002 crossref_primary_10_3390_app14167088 crossref_primary_10_1016_j_csl_2023_101582 crossref_primary_10_2478_amns_2025_0565 |
Cites_doi | 10.18653/v1/W17-2507 10.18653/v1/P16-1162 10.18653/v1/W18-6478 10.1136/bmj.316.7124.2a 10.18653/v1/P16-1009 10.1075/cilt.292.32var 10.18653/v1/W16-2301 10.18653/v1/2020.emnlp-main.6 10.1007/3-540-45820-4_14 10.18653/v1/W18-6453 10.18653/v1/W16-2369 10.18653/v1/W16-2347 10.18653/v1/P17-4012 10.18653/v1/W18-6401 10.18653/v1/W17-4717 10.3115/1557769.1557821 10.1162/neco.1997.9.8.1735 10.1093/nar/gkh061 10.1136/bmj.b2354 10.18653/v1/W19-5301 10.18653/v1/W18-6488 |
ContentType | Journal Article |
Copyright | 2021. The Author(s). 2021. This work is licensed under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. The Author(s) 2021 |
Copyright_xml | – notice: 2021. The Author(s). – notice: 2021. This work is licensed under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. – notice: The Author(s) 2021 |
DOI | 10.1186/s12911-021-01621-8 |
DatabaseName | CrossRef Medline MEDLINE MEDLINE (Ovid) MEDLINE MEDLINE PubMed ProQuest Central (Corporate) Biotechnology Research Abstracts Computer and Information Systems Abstracts ProQuest Health & Medical Collection ProQuest Central (purchase pre-March 2016) Healthcare Administration Database (Alumni) Medical Database (Alumni Edition) Computing Database (Alumni Edition) Technology Research Database ProQuest SciTech Collection ProQuest Technology Collection ProQuest Natural Science Collection Hospital Premium Collection Hospital Premium Collection (Alumni Edition) ProQuest Central (Alumni) (purchase pre-March 2016) ProQuest Central (Alumni) ProQuest Central UK/Ireland Advanced Technologies & Aerospace Collection ProQuest Central Essentials - QC Biological Science Collection ProQuest Central Technology Collection (via ProQuest SciTech Premium Collection) Natural Science Collection ProQuest One Community College ProQuest Central Engineering Research Database Health Research Premium Collection Health Research Premium Collection (Alumni) ProQuest Central Student SciTech Premium Collection ProQuest Computer Science Collection Computer Science Database ProQuest Health & Medical Complete (Alumni) Advanced Technologies Database with Aerospace Biological Sciences Computer and Information Systems Abstracts Academic Computer and Information Systems Abstracts Professional Computing Database ProQuest Health & Medical Collection Healthcare Administration Database PML(ProQuest Medical Library) ProQuest Biological Science Advanced Technologies & Aerospace Database ProQuest Advanced Technologies & Aerospace Collection Biotechnology and BioEngineering Abstracts ProQuest Central Premium ProQuest One Academic (New) ProQuest Publicly Available Content Database ProQuest Health & Medical Research Collection ProQuest One Academic Middle East (New) ProQuest One Health & Nursing ProQuest One Academic Eastern Edition (DO NOT USE) ProQuest One Applied & Life Sciences ProQuest One Academic 
ProQuest One Academic UKI Edition ProQuest Central China ProQuest Central Basic MEDLINE - Academic PubMed Central (Full Participant titles) DOAJ Directory of Open Access Journals |
DatabaseTitle | CrossRef MEDLINE Medline Complete MEDLINE with Full Text PubMed MEDLINE (Ovid) Publicly Available Content Database Computer Science Database ProQuest Central Student ProQuest Advanced Technologies & Aerospace Collection ProQuest Central Essentials ProQuest Computer Science Collection Computer and Information Systems Abstracts SciTech Premium Collection ProQuest Central China ProQuest One Applied & Life Sciences Health Research Premium Collection Natural Science Collection Health & Medical Research Collection Biological Science Collection ProQuest Central (New) ProQuest Medical Library (Alumni) Advanced Technologies & Aerospace Collection ProQuest Biological Science Collection ProQuest One Academic Eastern Edition ProQuest Hospital Collection ProQuest Technology Collection Health Research Premium Collection (Alumni) Biological Science Database ProQuest Hospital Collection (Alumni) Biotechnology and BioEngineering Abstracts ProQuest Health & Medical Complete ProQuest One Academic UKI Edition ProQuest Health Management (Alumni Edition) Engineering Research Database ProQuest One Academic ProQuest One Academic (New) Technology Collection Technology Research Database Computer and Information Systems Abstracts – Academic ProQuest One Academic Middle East (New) ProQuest Health & Medical Complete (Alumni) ProQuest Central (Alumni Edition) ProQuest One Community College ProQuest One Health & Nursing ProQuest Natural Science Collection ProQuest Central ProQuest Health & Medical Research Collection Biotechnology Research Abstracts Health and Medicine Complete (Alumni Edition) ProQuest Central Korea Advanced Technologies Database with Aerospace ProQuest Computing ProQuest Central Basic ProQuest Computing (Alumni Edition) ProQuest Health Management ProQuest SciTech Collection Computer and Information Systems Abstracts Professional Advanced Technologies & Aerospace Database ProQuest Medical Library ProQuest Central (Alumni) MEDLINE - Academic |
DatabaseTitleList | MEDLINE MEDLINE - Academic Publicly Available Content Database |
Database_xml | – sequence: 1 dbid: DOA name: DOAJ Directory of Open Access Journals url: https://www.doaj.org/ sourceTypes: Open Website – sequence: 2 dbid: NPM name: PubMed url: https://proxy.k.utb.cz/login?url=http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed sourceTypes: Index Database – sequence: 3 dbid: EIF name: MEDLINE url: https://proxy.k.utb.cz/login?url=https://www.webofscience.com/wos/medline/basic-search sourceTypes: Index Database – sequence: 4 dbid: 8FG name: ProQuest Technology Collection url: https://search.proquest.com/technologycollection1 sourceTypes: Aggregation Database |
DeliveryMethod | fulltext_linktorsrc |
Discipline | Medicine |
EISSN | 1472-6947 |
EndPage | 11 |
ExternalDocumentID | oai_doaj_org_article_ea72f7a4d387413d988d9c3bbe1a9133 PMC8422666 34488734 10_1186_s12911_021_01621_8 |
Genre | Journal Article |
GeographicLocations | China |
GeographicLocations_xml | – name: China |
ID | FETCH-LOGICAL-c496t-89be0b243d219f8e9ceabe2b1efcda67fe49d472ba6b4e16c959fcddd0591d6c3 |
IEDL.DBID | BENPR |
ISSN | 1472-6947 |
IngestDate | Wed Aug 27 01:17:27 EDT 2025 Thu Aug 21 14:28:57 EDT 2025 Thu Sep 04 18:55:43 EDT 2025 Fri Jul 25 19:04:00 EDT 2025 Mon Jul 21 06:00:43 EDT 2025 Tue Jul 01 04:05:53 EDT 2025 Thu Apr 24 23:07:03 EDT 2025 |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | true |
IsScholarly | true |
Issue | 1 |
Keywords | Text mining; Machine translation; Natural language processing |
Language | English |
License | 2021. The Author(s). Open AccessThis article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
LinkModel | DirectLink |
MergedId | FETCHMERGED-LOGICAL-c496t-89be0b243d219f8e9ceabe2b1efcda67fe49d472ba6b4e16c959fcddd0591d6c3 |
Notes | ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 14 content type line 23 |
ORCID | 0000-0002-2595-4463 |
OpenAccessLink | https://www.proquest.com/docview/2574477824?pq-origsite=%requestingapplication%&accountid=15518 |
PMID | 34488734 |
PQID | 2574477824 |
PQPubID | 42572 |
PageCount | 11 |
ParticipantIDs | doaj_primary_oai_doaj_org_article_ea72f7a4d387413d988d9c3bbe1a9133 pubmedcentral_primary_oai_pubmedcentral_nih_gov_8422666 proquest_miscellaneous_2570111711 proquest_journals_2574477824 pubmed_primary_34488734 crossref_primary_10_1186_s12911_021_01621_8 crossref_citationtrail_10_1186_s12911_021_01621_8 |
ProviderPackageCode | CITATION AAYXX |
PublicationCentury | 2000 |
PublicationDate | 2021-09-06 |
PublicationDateYYYYMMDD | 2021-09-06 |
PublicationDate_xml | – month: 09 year: 2021 text: 2021-09-06 day: 06 |
PublicationDecade | 2020 |
PublicationPlace | England |
PublicationPlace_xml | – name: England – name: London |
PublicationTitle | BMC medical informatics and decision making |
PublicationTitleAlternate | BMC Med Inform Decis Mak |
PublicationYear | 2021 |
Publisher | BioMed Central BMC |
Publisher_xml | – name: BioMed Central – name: BMC |
SSID | ssj0017835 |
Score | 2.3449736 |
SourceID | doaj pubmedcentral proquest pubmed crossref |
SourceType | Open Website Open Access Repository Aggregation Database Index Database Enrichment Source |
StartPage | 258 |
SubjectTerms | Algorithms; Bilingualism; China; Clinical trials; Datasets; Domains; Editorials; English language; Health informatics; Humans; Interpreters; Language; Language translation; Machine translation; Natural Language Processing; Text mining; Translating; Translation; Translations; Translators; Websites |
Title | ParaMed: a parallel corpus for English–Chinese translation in the biomedical domain |
URI | https://www.ncbi.nlm.nih.gov/pubmed/34488734 https://www.proquest.com/docview/2574477824 https://www.proquest.com/docview/2570111711 https://pubmed.ncbi.nlm.nih.gov/PMC8422666 https://doaj.org/article/ea72f7a4d387413d988d9c3bbe1a9133 |
Volume | 21 |
hasFullText | 1 |
inHoldings | 1 |
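
The figures quoted in the abstract can be related to one another with a little arithmetic. The sketch below is only an illustration and is not taken from the ParaMed repository: `corpus_stats` and `share_of_total_gain` are hypothetical helper names, the toy sentence pairs are hand-written rather than drawn from the NEJM corpus, and the code assumes both sides are already tokenized with spaces. It counts sentence pairs and tokens per side, then computes what share of the full-dataset BLEU gain is already reached with 4000 in-domain pairs, using the values reported in the abstract (roughly 77% for en→zh and 55% for zh→en).

```python
def corpus_stats(pairs):
    """Return (sentence pairs, English tokens, Chinese tokens).

    Assumes each side is pre-tokenized and space-separated.
    """
    en_tokens = sum(len(en.split()) for en, _ in pairs)
    zh_tokens = sum(len(zh.split()) for _, zh in pairs)
    return len(pairs), en_tokens, zh_tokens


def share_of_total_gain(gain_small_subset, gain_full_dataset):
    """Fraction of the full-dataset BLEU gain reached with the small subset."""
    return gain_small_subset / gain_full_dataset


# Toy, hand-written sentence pairs (hypothetical; not NEJM data).
toy_pairs = [
    ("the patient was discharged .", "患者 已 出院 。"),
    ("no adverse events were reported .", "未 报告 不良 事件 。"),
    ("the trial enrolled 200 adults .", "该 试验 纳入 了 200 名 成人 。"),
]

n_pairs, n_en, n_zh = corpus_stats(toy_pairs)
print(f"{n_pairs} pairs, {n_en} English tokens, {n_zh} Chinese tokens")

# BLEU gains as reported in the abstract: (4000 pairs, full dataset).
reported_gains = {"en->zh": (25.3, 33.0), "zh->en": (13.4, 24.3)}
for direction, (small, full) in reported_gains.items():
    print(f"{direction}: {share_of_total_gain(small, full):.1%} of total gain")

# Average sentence length implied by the approximate corpus size in the abstract.
print(f"about {3_000_000 / 100_000:.0f} tokens per sentence on each side")
```

Read this way, the abstract's numbers suggest that a small in-domain subset captures most of the benefit for en→zh, while zh→en depends more on the remaining data.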