Fine-grained video super-resolution via spatial-temporal learning and image detail enhancement
| Published in | Engineering Applications of Artificial Intelligence, Vol. 131, p. 107789 |
|---|---|
| Format | Journal Article |
| Language | English |
| Published | Elsevier Ltd, 01.05.2024 |
| ISSN | 0952-1976, 1873-6769 |
| DOI | 10.1016/j.engappai.2023.107789 |
Summary: | This paper addresses the problem of fine-grained video super-resolution (FGVSR): suppressing the temporal flickering caused by separately processed consecutive frames and enhancing the quality of restored frame details when upscaling videos. Some existing video SR methods fail to sufficiently exploit the spatial-temporal information in the input low-resolution (LR) videos, while others generate undesirable artifacts or fail to reconstruct image details well. To overcome these problems, we present a novel deep learning framework for FGVSR, which takes a set of consecutive LR video frames and generates the corresponding super-resolved frames. Our deep FGVSR framework reconstructs missing information from the LR sources using the proposed multi-frame alignment and refinement strategies. More specifically, we propose an alignment module, in which multiple frames are aligned at the feature level, to prevent the output videos from flickering. We then introduce a feature fusion module, in which the aligned features produced by the alignment module are fused and refined in a multi-scale manner. Finally, the proposed refinement module reconstructs missing information from the fused features. In addition, we embed an image enhancement module on the skip connection from the input layer to the output layer of our network to further enhance the SR results. Experimental results show that the proposed deep FGVSR achieves state-of-the-art performance, compared with existing deep learning-based VSR methods, on three well-known benchmarks: REDS, Vid4, and Vimeo90k. In particular, compared with the state-of-the-art VSR methods in our experiments, our FGVSR achieves quantitative improvements of 0.70 dB to 9.54 dB in PSNR. Our method has also been shown to be effective for other image restoration tasks, such as image inpainting. |
---|---|
Highlights:
- An end-to-end trainable, deep learning-based fine-grained video super-resolution model is proposed.
- Temporal dependency is obtained implicitly, without performing explicit motion compensation.
- Image details are reconstructed and further sharpened by the refinement and enhancement modules.
- The approach can be adaptively integrated into other deep image restoration models to further boost performance.
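The pipeline described in the summary (feature-level alignment → multi-scale fusion → refinement, plus an image enhancement module on the skip connection) can be sketched schematically. The NumPy snippet below is an illustrative stand-in, not the paper's actual network: the alignment, fusion, and refinement steps are placeholders (frame stacking, temporal averaging, nearest-neighbour upsampling), and `unsharp_mask` is a hypothetical substitute for the learned enhancement module. It only shows how the residual branch and the enhanced skip connection combine.

```python
import numpy as np

def upsample_nearest(frame, scale):
    # Nearest-neighbour upsampling via a Kronecker product
    # (placeholder for learned upsampling layers).
    return np.kron(frame, np.ones((scale, scale)))

def unsharp_mask(img, amount=0.5):
    # Toy detail enhancement: boost high frequencies (img - blur).
    # Stands in for the paper's learned image enhancement module.
    blur = (np.roll(img, 1, 0) + np.roll(img, -1, 0) +
            np.roll(img, 1, 1) + np.roll(img, -1, 1)) / 4.0
    return img + amount * (img - blur)

def fgvsr_sketch(frames, scale=4):
    """Schematic FGVSR data flow over T consecutive LR frames.

    align -> fuse -> refine forms the residual branch; the skip
    connection from the centre input frame passes through an
    enhancement step before being added back.
    """
    center = frames[len(frames) // 2]
    # "Alignment": placeholder — the real model aligns frames
    # at the feature level to suppress temporal flickering.
    aligned = np.stack(frames)
    # "Fusion": merge temporal information (here a simple mean;
    # the paper fuses aligned features in a multi-scale manner).
    fused = aligned.mean(axis=0)
    # "Refinement": reconstruct missing high-resolution detail
    # (here just upsampling the temporal residual).
    residual = upsample_nearest(fused - center, scale)
    # Skip connection with the image enhancement step.
    skip = unsharp_mask(upsample_nearest(center, scale))
    return skip + residual

# Five consecutive 8x8 LR frames, upscaled 4x to 32x32.
frames = [np.random.rand(8, 8) for _ in range(5)]
sr = fgvsr_sketch(frames, scale=4)
assert sr.shape == (32, 32)
```

With identical input frames the temporal residual vanishes and the output reduces to the enhanced skip connection alone, which makes the two-branch structure easy to verify.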