Alternatives to the Chi-Square Test for Evaluating Rank Histograms from Ensemble Forecasts


Bibliographic Details
Published in: Weather and Forecasting, Vol. 20, no. 5, pp. 789-795
Main Author: Elmore, Kimberly L.
Format: Journal Article
Language: English
Published: Boston, MA: American Meteorological Society, 01.10.2005
ISSN: 0882-8156, 1520-0434
DOI: 10.1175/WAF884.1

Summary: Rank histograms are a commonly used tool for evaluating an ensemble forecasting system’s performance. Because the sample size is finite, the rank histogram is subject to statistical fluctuations, so a goodness-of-fit (GOF) test is employed to determine if the rank histogram is uniform to within some statistical certainty. Most often, the χ² test is used to test whether the rank histogram is indistinguishable from a discrete uniform distribution. However, the χ² test is insensitive to order and so suffers from troubling deficiencies that may render it unsuitable for rank histogram evaluation. As shown by examples in this paper, more powerful tests, suitable for small sample sizes, and very sensitive to the particular deficiencies that appear in rank histograms are available from the order-dependent Cramér–von Mises family of statistics, in particular, the Watson and Anderson–Darling statistics.
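
Note: The order insensitivity mentioned in the summary can be illustrated with a minimal Python sketch. The two rank histograms below are hypothetical data, not taken from the article: they contain the same bin counts in different orders. The second function is a deliberately simplified order-dependent quantity built from cumulative deviations. It is an assumption made for illustration only and is not the Watson or Anderson–Darling statistic as defined in the paper.

```python
def chi_square_statistic(counts):
    """Pearson chi-square statistic against a discrete uniform distribution."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 for c in counts) / expected


def cumulative_deviation_statistic(counts):
    """Sum of squared cumulative deviations from uniform.

    Simplified illustration of an order-dependent quantity; NOT the
    Watson or Anderson-Darling statistic from the article.
    """
    expected = sum(counts) / len(counts)
    running = 0.0
    total = 0.0
    for c in counts:
        running += c - expected  # cumulative observed minus expected
        total += running ** 2
    return total


# Hypothetical rank histograms: identical bin counts, different order.
sloped = [30, 26, 22, 18, 14, 10]    # monotone slope across ranks
shuffled = [18, 30, 10, 26, 14, 22]  # same counts, scattered

chi_sloped = chi_square_statistic(sloped)
chi_shuffled = chi_square_statistic(shuffled)
cum_sloped = cumulative_deviation_statistic(sloped)
cum_shuffled = cumulative_deviation_statistic(shuffled)

print(chi_sloped, chi_shuffled)   # identical: bin order is ignored
print(cum_sloped, cum_shuffled)   # differ: bin order matters
```

The chi-square value is the same for both histograms because it sums per-bin deviations without regard to bin position, whereas any statistic built on cumulative sums responds to a systematic slope.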