2.9 STEP: An 8K-60fps Space-Time Resolution-Enhancement Neural-Network Processor for Next-Generation Display and Streaming

Bibliographic Details
Published in	Digest of technical papers - IEEE International Solid-State Circuits Conference Vol. 68; pp. 1-3
Main Authors	Lin, Kai-Ping; Wu, Tong; Lin, Chang-Pao; Chen, Po-Wei; Zhang, Zhi-Jun; Khwa, Win-San; Chang, Meng-Fan; Huang, Chao-Tsung
Format	Conference Proceeding
Language	English
Published	IEEE 16.02.2025
Subjects	Computational modeling; Image quality; Next generation networking; Quality assessment; Streaming media; Streams; Superresolution; System-on-chip
ISSN	2376-8606
DOI	10.1109/ISSCC49661.2025.10904700

Summary:	Next-generation display technology is driving ultra-high-definition (UHD) TVs and screens that offer users an immersive experience. However, the scarcity of 8K-UHD streams and the high cost of transmission bandwidth make it necessary to apply ISP techniques on terminal displays to enhance video quality. Deep-learning algorithms, in particular, can be employed to render stable and vivid videos. The one-stage space-time video super-resolution (STVSR) algorithm [1], depicted in Fig. 2.9.1, simultaneously generates high-resolution, high-frame-rate video from low-resolution, low-frame-rate input. Even so, rendering 8K-UHD 60fps video on edge devices with limited computational resources still poses three main challenges. First, although deeper models typically yield better video quality, resource constraints force the use of shallower CNN models, which compromises image quality. Second, deformable convolution with modulation (DCM) [2] effectively aligns images across different time points, and stacking multiple DCM layers further improves image quality; however, these layers require additional on-chip memory to store feature maps (FMs) within the layer-fusion (LF) workflow. Third, large PE arrays (e.g., 10K MACs) are required to achieve high-throughput computation, which significantly increases power consumption.
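
To give a sense of scale for the third challenge, the following is a rough back-of-the-envelope sketch in Python, not a figure from the paper. It computes the output pixel rate of 8K-UHD (7680x4320) at 60fps and the resulting per-pixel compute budget for the "10K MACs" example quoted in the summary. The clock frequency `ASSUMED_CLOCK_HZ`, the assumption of full PE utilization, and the function names are all hypothetical and chosen only for illustration.

```python
# Back-of-the-envelope throughput budget for 8K-UHD video at 60 fps.
WIDTH, HEIGHT, FPS = 7680, 4320, 60   # 8K-UHD frame size and target frame rate
NUM_MACS = 10_000                     # "10K MACs" example quoted in the summary
ASSUMED_CLOCK_HZ = 500e6              # hypothetical clock frequency, NOT from the paper


def pixel_rate(width: int, height: int, fps: int) -> int:
    """Number of output pixels that must be produced per second."""
    return width * height * fps


def macs_per_pixel(num_macs: int, clock_hz: float, pixels_per_s: float) -> float:
    """Upper bound on MAC operations available per output pixel,
    assuming every MAC unit is busy on every cycle."""
    return num_macs * clock_hz / pixels_per_s


rate = pixel_rate(WIDTH, HEIGHT, FPS)
budget = macs_per_pixel(NUM_MACS, ASSUMED_CLOCK_HZ, rate)
print(f"{rate / 1e9:.2f} Gpixel/s, about {budget:.0f} MACs per output pixel")
```

The pixel rate comes to roughly 2 Gpixel/s regardless of the assumed clock. The per-pixel budget scales linearly with clock frequency and MAC count, which is one way to see why a larger PE array (and its power cost) is the straightforward route to higher throughput.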