HGNChelper: identification and correction of invalid gene symbols for human and mouse [version 2; peer review: 3 approved]

Bibliographic Details
Published in F1000 research Vol. 9; p. 1493
Main Authors Oh, Sehyun; Abdelnabi, Jasmine; Al-Dulaimi, Ragheed; Aggarwal, Ayush; Ramos, Marcel; Davis, Sean; Riester, Markus; Waldron, Levi
Format Journal Article
Language English
Published England F1000 Research Limited 2020
ISSN 2046-1402
DOI 10.12688/f1000research.28033.2

Summary: Gene symbols are recognizable identifiers for gene names but are unstable and error-prone due to aliasing, manual entry, and unintentional conversion by spreadsheets to date format. Official gene symbol resources such as HUGO Gene Nomenclature Committee (HGNC) for human genes and the Mouse Genome Informatics project (MGI) for mouse genes provide authoritative sources of valid, aliased, and outdated symbols, but lack a programmatic interface and correction of symbols converted by spreadsheets. We present HGNChelper, an R package that identifies known aliases and outdated gene symbols based on the HGNC human and MGI mouse gene symbol databases, in addition to common mislabeling introduced by spreadsheets, and provides corrections where possible. HGNChelper identified invalid gene symbols in the most recent Molecular Signatures Database (MSigDB 7.0) and in platform annotation files of the Gene Expression Omnibus, with prevalence ranging from ~3% in recent platforms to 30-40% in the earliest platforms from 2002-03. HGNChelper is installable from CRAN.
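The kind of check the summary describes can be illustrated with a minimal Python sketch. This is not HGNChelper's code (the package itself is written in R), and every name in it is hypothetical: the lookup tables are tiny hand-written stand-ins for the HGNC tables, the month-to-prefix mapping is an assumption, and the uppercasing step suits human symbols only.

```python
import re

# Toy lookup tables, hand-written for illustration only; a real check would
# load the current HGNC (human) or MGI (mouse) tables instead.
APPROVED = {"TP53", "MARCHF1", "SEPTIN2"}
ALIASES = {"P53": "TP53", "MARCH1": "MARCHF1", "SEPT2": "SEPTIN2"}

# Month abbreviations a spreadsheet may produce, mapped to the symbol prefix
# that was probably typed (assumed mapping, not taken from the package).
MONTH_PREFIX = {"MAR": "MARCH", "SEP": "SEPT", "DEC": "DEC", "OCT": "OCT"}

# Matches date-like strings such as "1-Mar" or "02-Sep".
DATE_LIKE = re.compile(r"^(\d{1,2})-([A-Za-z]{3})$")


def undo_date_conversion(symbol):
    """Turn '1-Mar' back into 'MARCH1'; return None if not date-like."""
    match = DATE_LIKE.match(symbol.strip())
    if match is None:
        return None
    number, month = match.groups()
    prefix = MONTH_PREFIX.get(month.upper())
    return None if prefix is None else f"{prefix}{int(number)}"


def check_symbol(symbol):
    """Return (input, is_approved, suggested_symbol_or_None)."""
    if symbol in APPROVED:
        return (symbol, True, symbol)
    # Undo spreadsheet damage first, otherwise just normalize case
    # (uppercasing is a human-symbol convention).
    candidate = undo_date_conversion(symbol) or symbol.strip().upper()
    if candidate in APPROVED:
        return (symbol, False, candidate)
    # Fall back to the alias/outdated-symbol table; None means no suggestion.
    return (symbol, False, ALIASES.get(candidate))


for row in map(check_symbol, ["TP53", "p53", "1-Mar", "2-Sep", "NOTAGENE"]):
    print(row)
```

In this sketch each input yields a triple of the original string, whether it was already an approved symbol, and a suggested replacement if one could be found; inputs with no match keep `None` as the suggestion rather than a guess.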
No competing interests were disclosed.