No-Reference Video Quality Assessment Using Local Structural and Quality-Aware Deep Features
| Published in | IEEE Transactions on Instrumentation and Measurement, Vol. 72, pp. 1-12 |
|---|---|
| Main Authors | , |
| Format | Journal Article |
| Language | English |
| Published | New York: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 2023 |
| Subjects | |
| ISSN | 0018-9456, 1557-9662 |
| DOI | 10.1109/TIM.2023.3273654 |
| Summary: | Due to the growing demand for high-quality video services in 4G and 5G applications, quantitatively measuring the quality of video services is expected to become a vital task. The no-reference video quality assessment (NR-VQA) work published so far regresses computationally complex statistical transforms or convolutional neural network (CNN) features to predict a quality score. In this article, we propose a novel NR-VQA scheme that systematically samples spatiotemporal planes (XY, XT, and YT) based on the high standard deviation (σ) of their high-frequency bands to represent distortion. The human visual system (HVS) is highly sensitive to structural information in visual scenes, and distortions disrupt these structural properties. The proposed scheme encodes two-level, 3-D structural video information using novel local spatiotemporal tetra patterns (LSTPs) computed on the highest-σ planes sampled from each block of planes. In addition, we extract quality-aware deep features from the second-highest-σ sampled video frames (XY, spatial) of each block using a fine-tuned CNN model. The extracted LSTP and deep quality-aware features of the two highest-σ frames are average-pooled and concatenated with the top 100 σ values of the other frames to form the final video-level features. Finally, the concatenated features are fed to a support vector regression (SVR) model to predict the perceptual quality scores of test videos. The proposed method is evaluated on ten publicly available standard video quality assessment (VQA) databases containing synthetic, authentic, and mixed distortions. Extensive experiments indicate that the proposed model outperforms state-of-the-art VQA models and is consistent with human subjective assessment. |
|---|---|
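To make the pipeline described in the abstract concrete, the following is a minimal, illustrative sketch, not the authors' code. It assumes grayscale videos stored as NumPy arrays, uses a Gaussian high-pass residual for the high-frequency σ computation, substitutes a plain uniform local binary pattern (LBP) histogram for the paper's LSTP descriptor, omits the fine-tuned CNN branch entirely, and regresses toy scores with scikit-learn's SVR. All function names, block sizes, and filter parameters below are assumptions chosen for illustration.

```python
# Illustrative sketch only (not the paper's implementation): sigma-based
# spatiotemporal plane sampling + a stand-in local descriptor + SVR.
# The LSTP descriptor and the CNN branch from the abstract are NOT reproduced;
# a uniform LBP histogram serves purely as a placeholder structural feature.
import numpy as np
from scipy.ndimage import gaussian_filter
from skimage.feature import local_binary_pattern
from sklearn.svm import SVR

def highfreq_sigma(plane):
    """Std. dev. of the high-frequency band (plane minus its Gaussian-blurred copy)."""
    return float(np.std(plane - gaussian_filter(plane, sigma=1.5)))

def plane_features(video, block_size=16, top_k=100):
    """video: (T, H, W) grayscale array in [0, 1]. Returns one video-level feature vector."""
    T, H, W = video.shape
    # Collect XY (spatial), XT, and YT planes together with their high-frequency sigma.
    planes = [video[t] for t in range(T)]                # XY planes
    planes += [video[:, y, :] for y in range(0, H, 8)]   # XT planes (subsampled)
    planes += [video[:, :, x] for x in range(0, W, 8)]   # YT planes (subsampled)
    sigmas = np.array([highfreq_sigma(p) for p in planes])

    # Within each block of consecutive planes, keep only the highest-sigma plane
    # (the systematic sampling step described in the abstract) and describe it
    # with an LBP histogram (placeholder for LSTP).
    descs = []
    for start in range(0, len(planes), block_size):
        idx = start + int(np.argmax(sigmas[start:start + block_size]))
        lbp = local_binary_pattern(planes[idx], P=8, R=1, method="uniform")
        hist, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)
        descs.append(hist)
    pooled = np.mean(descs, axis=0)                      # average pooling over blocks

    # Append the top-k sigma values of the remaining planes (zero-padded if short).
    top = np.sort(sigmas)[::-1][:top_k]
    top = np.pad(top, (0, max(0, top_k - top.size)))
    return np.concatenate([pooled, top])

# Toy usage: random "videos" with synthetic quality scores standing in for MOS labels.
rng = np.random.default_rng(0)
videos = [rng.random((24, 64, 64)) for _ in range(8)]
scores = rng.uniform(1, 5, size=8)
X = np.stack([plane_features(v) for v in videos])
model = SVR(kernel="rbf", C=10.0).fit(X, scores)
print(model.predict(X[:2]))
```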