A Python package based on robust statistical analysis for serial crystallography data processing


Bibliographic Details
Published in: Acta crystallographica. Section D, Biological crystallography. Vol. 79; no. 9; pp. 820-829
Main Authors: Hadian-Jazi, Marjan; Sadri, Alireza
Format: Journal Article
Language: English
Published: 01.09.2023, International Union of Crystallography, 5 Abbey Square, Chester, Cheshire CH1 2HU, England; Wiley Subscription Services, Inc
ISSN: 2059-7983, 0907-4449, 1399-0047
DOI: 10.1107/S2059798323005855

Summary: The term robustness in statistics refers to methods that are generally insensitive to deviations from model assumptions. In other words, robust methods are able to preserve their accuracy even when the data do not perfectly fit the statistical models. Robust statistical analyses are particularly effective when analysing mixtures of probability distributions. Therefore, these methods enable the discretization of X‐ray serial crystallography data into two probability distributions: a group comprising true data points (for example the background intensities) and another group comprising outliers (for example Bragg peaks or bad pixels on an X‐ray detector). These characteristics of robust statistical analysis are beneficial for the ever‐increasing volume of serial crystallography (SX) data sets produced at synchrotron and X‐ray free‐electron laser (XFEL) sources. The key advantage of the use of robust statistics for some applications in SX data analysis is that it requires minimal parameter tuning because of its insensitivity to the input parameters. In this paper, a software package called Robust Gaussian Fitting library (RGFlib) is introduced that is based on the concept of robust statistics. Two methods built on RGFlib are presented for two SX data‐analysis tasks: (i) a robust peak‐finding algorithm and (ii) an automated robust method to detect bad pixels on X‐ray pixel detectors. This article introduces RGFlib, a Python package for robust statistical analysis. The package is a useful tool for a variety of tasks in X‐ray crystallography data analysis, such as peak‐finding, bad pixel mask making and other outlier‐detection tasks.
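The inlier/outlier split described in the summary can be illustrated with a minimal sketch. It does not use RGFlib or reproduce its algorithm; it assumes a generic robust estimator (median and median absolute deviation) as a stand-in, and the function names `robust_location_scale` and `flag_outliers` and the threshold `k` are hypothetical, chosen only for illustration.

```python
import numpy as np

def robust_location_scale(values):
    """Estimate the centre and width of the inlier (background) population.

    Uses the median and the median absolute deviation (MAD), which are
    barely affected by a small fraction of extreme values. This is a
    generic robust estimator, not RGFlib's own method.
    """
    values = np.asarray(values, dtype=float).ravel()
    mu = np.median(values)
    # 1.4826 scales the MAD so that it estimates the standard deviation
    # when the inliers are Gaussian.
    sigma = 1.4826 * np.median(np.abs(values - mu))
    return mu, sigma

def flag_outliers(values, k=6.0):
    """Return a boolean mask marking values more than k robust sigmas from the centre."""
    values = np.asarray(values, dtype=float)
    mu, sigma = robust_location_scale(values)
    return np.abs(values - mu) > k * sigma

# Synthetic example: Gaussian background plus a few bright "peak" pixels.
rng = np.random.default_rng(0)
background = rng.normal(loc=10.0, scale=2.0, size=10_000)
peaks = np.full(50, 200.0)
data = np.concatenate([background, peaks])

mu, sigma = robust_location_scale(data)
mask = flag_outliers(data)
print(f"robust centre ~ {mu:.2f}, robust width ~ {sigma:.2f}, flagged: {mask.sum()}")
```

In this sketch the 50 bright values pull the ordinary mean and standard deviation away from the background, while the median and MAD stay close to the background parameters, so a single threshold in units of the robust width separates the two groups without further tuning.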