A Graph-Based Accelerator of Retinex Model With Bit-Serial Computing for Image Enhancements

This work proposes the Poisson equation formulation of the Retinex model for image enhancements using a low-power graph hardware accelerator performing finite difference updates on a lattice graph processing element (PE) array. By encapsulating the underlying algorithm in a graph hardware structure,...

Full description

Saved in:
Bibliographic Details
Published inIEEE transactions on circuits and systems. I, Regular papers Vol. 72; no. 3; pp. 1346 - 1357
Main Authors Wei, Zhengzhe, Mu, Junjie, Zheng, Yuanjin, Tae-Hyoung Kim, Tony, Kim, Bongjin
Format Journal Article
LanguageEnglish
Published New York IEEE 01.03.2025
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects
Online AccessGet full text
ISSN1549-8328
1558-0806
DOI10.1109/TCSI.2025.3525653

Cover

More Information
Summary:This work proposes the Poisson equation formulation of the Retinex model for image enhancements using a low-power graph hardware accelerator performing finite difference updates on a lattice graph processing element (PE) array. By encapsulating the underlying algorithm in a graph hardware structure, a highly localized dataflow that takes advantage of the physical placement of the PEs is enabled to minimize data movement and maximize data reuse. The on-chip dataflow that achieves data sharing, and reuse among neighboring PEs during massively parallel updates is generated in each PE driven by two external control signals. Using a custom accumulator design intended for bit-serial computing, this work enables precision on demand and extensive on-chip data reuse with minimal area overhead, accommodating a non-overlap image mapping scheme in which a <inline-formula> <tex-math notation="LaTeX">20\times 20 </tex-math></inline-formula> image tile can be processed without external memory access at a time. With increasing user-configurable update count, image noise and shadow can be progressively removed with the inevitable loss of image details. Fabricated using a 65nm technology, the test chip occupies 0.2955mm2 core area and consumes 2.191mW operating at 1V, 25.6MHz, and a reconfigurable 10- or 14-bit precision.
Bibliography:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
ISSN:1549-8328
1558-0806
DOI:10.1109/TCSI.2025.3525653