The Precision and Repeatability of Media Quality Comparisons: Measurements and New Statistical Methods
Published in | IEEE Transactions on Broadcasting, Vol. 69, no. 2, pp. 1-18 |
---|---|
Main Author | |
Format | Journal Article |
Language | English |
Published | New York: IEEE, 01.06.2023. The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
ISSN | 0018-9316, 1557-9611 |
DOI | 10.1109/TBC.2023.3236528 |
Summary: | This paper calculates confidence intervals for 89 datasets that use the 5-level Absolute Category Rating (ACR) method to evaluate the quality of speech, video, images, and video with audio. This data allows us to compute the subjective test confidence interval (ΔSCI) for 5-level ACR tests. We use a confusion matrix to compare conclusions reached by 88 lab-to-lab comparisons, 22 method-to-method comparisons, and 12 comparisons between expert and naïve subjects. We estimate the differences in conclusions reached by ad hoc evaluations, compared to subjective tests. To identify lab-to-lab differences, we recommend using the disagree incidence rate (i.e., the likelihood that significantly different stimulus pairs receive opposing rank order from the two labs). Disagree incidence rates above 0.31% are unusual enough to warrant investigation, and disagree incidence rates above 1.0% indicate differences in method, test environment, test implementation, or subject demographics. These incidence rates form the basis for a new statistical method that calculates the confidence interval of a metric (ΔMCI). When ΔMCI is used to make decisions, the equivalence to a video-quality test (EVQT) method determines whether a metric acts similarly to a subjective test. When ΔMCI is not used, the metric is likened to a certain number of people in a video-quality test (PVQT). This information will help users make better decisions when applying quality metrics. The algorithm code is made available for any purpose. Most of the ratings used in this paper come from open datasets. |
---|---|
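
The disagree incidence rate described in the summary can be illustrated with a minimal Python sketch. It is not the paper's published code, and several details are assumptions made for illustration only: the function names `compare`, `disagree_incidence`, and `interpret` are hypothetical, significance is decided with a simple two-sided normal approximation at the 95% level, and the rate is taken over all stimulus pairs shared by the two labs. Only the 0.31% and 1.0% thresholds come from the summary.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance

Z_CRIT = 1.96  # assumption: two-sided 95% normal approximation, not the paper's test


def compare(a, b):
    """Return +1 if ratings a are significantly higher than b, -1 if lower, else 0.

    Each list needs at least two ratings so the sample variance is defined.
    """
    diff = mean(a) - mean(b)
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        # No spread in either sample: any nonzero difference is treated as decisive.
        return (diff > 0) - (diff < 0)
    z = diff / se
    if z > Z_CRIT:
        return 1
    if z < -Z_CRIT:
        return -1
    return 0


def disagree_incidence(lab1, lab2):
    """Fraction of stimulus pairs that both labs rank significantly, but in opposite order.

    lab1 and lab2 map a stimulus name to a list of 5-level ACR ratings (1..5).
    """
    stimuli = sorted(set(lab1) & set(lab2))
    pairs = list(combinations(stimuli, 2))
    if not pairs:
        raise ValueError("need at least two stimuli rated by both labs")
    disagree = sum(
        1
        for s, t in pairs
        if compare(lab1[s], lab1[t]) * compare(lab2[s], lab2[t]) == -1
    )
    return disagree / len(pairs)


def interpret(rate):
    """Apply the thresholds quoted in the summary (0.31% and 1.0%)."""
    if rate > 0.01:
        return "indicates differences in method, environment, implementation, or subjects"
    if rate > 0.0031:
        return "unusual, warrants investigation"
    return "within the usual range"
```

Calling `disagree_incidence(lab1, lab2)` with two rating dictionaries returns a fraction between 0 and 1, which `interpret` then maps onto the thresholds from the summary. A real analysis would substitute the significance test and pair-counting convention that the paper specifies.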