2.9 STEP: An 8K-60fps Space-Time Resolution-Enhancement Neural-Network Processor for Next-Generation Display and Streaming

Bibliographic Details
Published in: Digest of technical papers - IEEE International Solid-State Circuits Conference, Vol. 68; pp. 1-3
Main Authors: Lin, Kai-Ping; Wu, Tong; Lin, Chang-Pao; Chen, Po-Wei; Zhang, Zhi-Jun; Khwa, Win-San; Chang, Meng-Fan; Huang, Chao-Tsung
Format: Conference Proceeding
Language: English
Published: IEEE, 16.02.2025
ISSN: 2376-8606
DOI: 10.1109/ISSCC49661.2025.10904700

Summary: Next-generation display technology is driving ultra-high-definition (UHD) TVs and screens, offering users an immersive experience. However, the scarcity of 8K-UHD streams and the high cost of transmission bandwidth necessitate the use of ISP techniques on terminal displays to enhance video quality. Deep-learning algorithms, in particular, can be employed to render stable and vivid videos. The one-stage space-time video super-resolution (STVSR) algorithm [1], depicted in Fig. 2.9.1, is able to simultaneously generate high-resolution and high-frame-rate videos from low-resolution and low-frame-rate input. But rendering 8K-UHD 60fps videos on edge devices with limited computational resources still poses three main challenges. First, although deeper models typically yield better video quality, resource constraints necessitate the use of shallower CNN models, leading to a compromise in image quality. Second, the deformable convolution with modulation (DCM) [2] effectively aligns images across different time points, and multiple DCM layers further improve image quality. However, they require additional on-chip memory to store feature maps (FM) within the layer-fusion (LF) workflow. Third, a large number of PE arrays (e.g., 10K MACs) are required to achieve high-throughput computation, significantly increasing power consumption.
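
To give a rough sense of scale for the throughput challenge described in the summary, the following back-of-the-envelope sketch computes the output pixel rate of an 8K-UHD (7680x4320) 60fps stream and an idealized compute budget per output pixel. The function names are illustrative, and both the reading of "10K MACs" as 10,000 units and the 500 MHz clock are assumptions made only for this example; they are not figures reported for the processor.

```python
# Back-of-the-envelope throughput estimate for 8K-UHD output at 60 fps.
UHD_8K_WIDTH = 7680    # 8K-UHD frame width in pixels
UHD_8K_HEIGHT = 4320   # 8K-UHD frame height in pixels
FRAME_RATE = 60        # target output frame rate in frames per second


def output_pixel_rate(width: int, height: int, fps: int) -> int:
    """Number of output pixels that must be produced per second."""
    return width * height * fps


def macs_per_output_pixel(num_macs: int, clock_hz: float, pixel_rate: int) -> float:
    """Upper bound on MAC operations available per output pixel,
    assuming every MAC unit does useful work on every clock cycle."""
    return num_macs * clock_hz / pixel_rate


pixel_rate = output_pixel_rate(UHD_8K_WIDTH, UHD_8K_HEIGHT, FRAME_RATE)
print(f"{pixel_rate:,} output pixels per second")

# ASSUMPTIONS (not from the paper): "10K" is read as 10,000 MAC units,
# and the 500 MHz clock is a hypothetical value chosen for illustration.
budget = macs_per_output_pixel(num_macs=10_000, clock_hz=500e6, pixel_rate=pixel_rate)
print(f"about {budget:.0f} MACs available per output pixel at full utilization")
```

Under these assumed numbers, even a 10,000-MAC array leaves only a few thousand multiply-accumulates per output pixel, which is consistent with the summary's point that model depth, on-chip memory, and power all compete under a tight budget.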