A globally synthesised and flagged bee occurrence dataset and cleaning workflow

Bibliographic Details
Published in: Scientific Data, Vol. 10, no. 1, p. 747
Main Authors: Dorey, James B.; Fischer, Erica E.; Chesshire, Paige R.; Nava-Bolaños, Angela; O’Reilly, Robert L.; Bossert, Silas; Collins, Shannon M.; Lichtenberg, Elinor M.; Tucker, Erika M.; Smith-Pardo, Allan; Falcon-Brindis, Armando; Guevara, Diego A.; Ribeiro, Bruno; de Pedro, Diego; Pickering, John; Hung, Keng-Lou James; Parys, Katherine A.; McCabe, Lindsie M.; Rogan, Matthew S.; Minckley, Robert L.; Velazco, Santiago J. E.; Griswold, Terry; Zarrillo, Tracy A.; Jetz, Walter; Sica, Yanina V.; Orr, Michael C.; Guzman, Laura Melissa; Ascher, John S.; Hughes, Alice C.; Cobb, Neil S.
Format: Journal Article
Language: English
Published: London: Nature Publishing Group UK, 02.11.2023
Publisher (other forms of name): Nature Publishing Group; Nature Portfolio
ISSN: 2052-4463
DOI: 10.1038/s41597-023-02626-w

More Information
Summary: Species occurrence data are foundational for research, conservation, and science communication, but the limited availability and accessibility of reliable data represent a major obstacle, particularly for insects, which face mounting pressures. We present BeeBDC, a new R package, and a global bee occurrence dataset to address this issue. We combined more than 18.3 million bee occurrence records from multiple public repositories (GBIF, SCAN, iDigBio, USGS, ALA) and smaller datasets, then standardised, flagged, deduplicated, and cleaned the data using the reproducible BeeBDC R workflow. Specifically, we harmonised species names (following established global taxonomy), country names, and collection dates, and we added record-level flags for a series of potential quality issues. These data are provided in two formats, “cleaned” and “flagged-but-uncleaned”. The BeeBDC package and its online documentation give end users the ability to modify filtering parameters to address their own research questions. By publishing reproducible R workflows and globally cleaned datasets, we can increase the accessibility and reliability of downstream analyses. This workflow can be implemented for other taxa to support research and conservation.
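The summary describes shipping every record in a "flagged-but-uncleaned" form and deriving a "cleaned" subset by filtering on record-level quality flags, with users free to change which flags are enforced. The sketch below illustrates that flag-then-filter idea in Python under stated assumptions. It is not the BeeBDC R API: the flag column names (`flag_taxonomy`, `flag_country`, `flag_date`, `flag_duplicate`), the `summarise` and `clean` helpers, and the example records are all hypothetical.

```python
from typing import Iterable

# Hypothetical flag columns (not BeeBDC's real column names).
# True means the record PASSED that check; False means it was flagged.
FLAG_COLUMNS = ("flag_taxonomy", "flag_country", "flag_date", "flag_duplicate")


def summarise(record: dict, ignore: Iterable[str] = ()) -> bool:
    """Return True if the record passes every flag not listed in `ignore`."""
    skipped = set(ignore)
    return all(record[col] for col in FLAG_COLUMNS if col not in skipped)


def clean(records: list[dict], ignore: Iterable[str] = ()) -> list[dict]:
    """Derive a 'cleaned' view: keep records that pass all enforced flags.

    The input list is left untouched, so the flagged-but-uncleaned data
    remain available for re-filtering with different parameters.
    """
    ignore = tuple(ignore)  # materialise once so a generator is not exhausted
    return [r for r in records if summarise(r, ignore)]


# Hypothetical flagged-but-uncleaned records.
records = [
    {"id": 1, "flag_taxonomy": True, "flag_country": True,
     "flag_date": True, "flag_duplicate": True},
    {"id": 2, "flag_taxonomy": True, "flag_country": True,
     "flag_date": False, "flag_duplicate": True},
    {"id": 3, "flag_taxonomy": False, "flag_country": True,
     "flag_date": True, "flag_duplicate": True},
]

strict = [r["id"] for r in clean(records)]
relaxed = [r["id"] for r in clean(records, ignore=["flag_date"])]
print(strict, relaxed)
```

With every flag enforced only record 1 survives; a user whose question does not depend on collection dates can relax `flag_date` and also keep record 2. Keeping the flags alongside the data, rather than deleting records up front, is what makes this kind of re-filtering possible.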