The Precision and Repeatability of Media Quality Comparisons: Measurements and New Statistical Methods

Bibliographic Details
Published in: IEEE Transactions on Broadcasting, Vol. 69, No. 2, pp. 1-18
Main Author: Pinson, Margaret H.
Format: Journal Article
Language: English
Published: New York: IEEE, 01.06.2023
Publisher: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
ISSN: 0018-9316
EISSN: 1557-9611
DOI: 10.1109/TBC.2023.3236528

More Information
Summary: This paper calculates confidence intervals for 89 datasets that use the 5-level Absolute Category Rating (ACR) method to evaluate the quality of speech, video, images, and video with audio. This data allows us to compute the subjective test confidence interval (ΔSCI) for 5-level ACR tests. We use a confusion matrix to compare conclusions reached by 88 lab-to-lab comparisons, 22 method-to-method comparisons, and 12 comparisons between expert and naïve subjects. We estimate the differences in conclusions reached by ad hoc evaluations, compared to subjective tests. We recommend using the disagree incidence rate to identify lab-to-lab differences (i.e., the likelihood that significantly different stimulus pairs receive opposing rank order from the two labs). Disagree incidence rates above 0.31% are unusual enough to warrant investigation, and disagree incidence rates above 1.0% indicate differences in method, test environment, test implementation, or subject demographics. These incidence rates form the basis for a new statistical method that calculates the confidence interval of a metric (ΔMCI). When ΔMCI is used to make decisions, the equivalence to a video-quality test (EVQT) method determines whether a metric acts similarly to a subjective test. When ΔMCI is not used, the metric is likened to a certain number of people in a video-quality test (PVQT). This information will help users make better decisions when applying quality metrics. The algorithm code is made available for any purpose. Most of the ratings used in this paper come from open datasets.
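Illustrative note: the disagree incidence rate described in the summary can be sketched in a few lines of code. The sketch below is an illustration under stated assumptions, not the paper's released algorithm: it assumes a per-pair two-sample t-test within each lab, a 0.05 significance level, and normalization over all shared stimulus pairs. The function name disagree_incidence and the synthetic data are hypothetical.

# Illustrative sketch only; the paper's released code defines the exact
# procedure. This version assumes a per-pair two-sample t-test, which may
# differ from the authors' statistical method.
from itertools import combinations

import numpy as np
from scipy import stats


def disagree_incidence(scores_lab_a, scores_lab_b, alpha=0.05):
    """Estimate a disagree incidence rate between two labs.

    scores_lab_a, scores_lab_b: dicts mapping stimulus id -> array of
    per-subject ACR ratings (1-5) collected by each lab.
    Returns the fraction of stimulus pairs that both labs rate as
    significantly different but in opposing rank order (an assumed
    operationalization of the definition quoted in the summary).
    """
    stimuli = sorted(set(scores_lab_a) & set(scores_lab_b))
    disagrees = 0
    pairs = 0
    for s1, s2 in combinations(stimuli, 2):
        pairs += 1
        # Two-sample t-test on the raw ratings within each lab.
        t_a, p_a = stats.ttest_ind(scores_lab_a[s1], scores_lab_a[s2])
        t_b, p_b = stats.ttest_ind(scores_lab_b[s1], scores_lab_b[s2])
        # A "disagree": both labs see a significant difference, but the
        # sign of the difference (rank order of the pair) is opposite.
        if p_a < alpha and p_b < alpha and np.sign(t_a) != np.sign(t_b):
            disagrees += 1
    return disagrees / pairs if pairs else 0.0


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic example: 10 stimuli, 24 subjects per lab, 5-level ACR scale.
    lab_a = {i: np.clip(rng.normal(3 + 0.2 * i, 0.8, 24).round(), 1, 5)
             for i in range(10)}
    lab_b = {i: np.clip(rng.normal(3 + 0.2 * i, 0.8, 24).round(), 1, 5)
             for i in range(10)}
    print(f"disagree incidence rate: {disagree_incidence(lab_a, lab_b):.2%}")

The resulting rate can then be compared against the 0.31% and 1.0% thresholds quoted in the summary to decide whether a lab-to-lab difference warrants investigation.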