Lightweight Compression of Intermediate Neural Network Features for Collaborative Intelligence
| Published in | IEEE Open Journal of Circuits and Systems, Vol. 2, pp. 350 - 362 |
|---|---|
| Main Authors | |
| Format | Journal Article |
| Language | English |
| Published | New York: IEEE, 2021. The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| Subjects | |
| ISSN | 2644-1225 |
| DOI | 10.1109/OJCAS.2021.3072884 | 
| Summary: | In collaborative intelligence applications, part of a deep neural network (DNN) is deployed on a lightweight device such as a mobile phone or edge device, and the remaining portion of the DNN is processed where more computing resources are available, such as in the cloud. This paper presents a novel lightweight compression technique designed specifically to quantize and compress the features output by the intermediate layer of a split DNN, without requiring any retraining of the network weights. Mathematical models for estimating the clipping and quantization error of ReLU and leaky-ReLU activations at this intermediate layer are developed and used to compute optimal clipping ranges for coarse quantization. We also present a modified entropy-constrained design algorithm for quantizing clipped activations. When applied to popular object-detection and classification DNNs, we were able to compress the 32-bit floating point intermediate activations down to 0.6 to 0.8 bits, while keeping the loss in accuracy to less than 1%. When compared to HEVC, we found that the lightweight codec consistently provided better inference accuracy, by up to 1.3%. The performance and simplicity of this lightweight compression technique make it an attractive option for coding an intermediate layer of a split neural network for edge/cloud applications. |
|---|---|
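
As a rough illustration of the clip-then-quantize step described in the summary, the sketch below clips an intermediate feature tensor to a fixed range and applies uniform coarse quantization. This is not the authors' exact method: the clipping range `c_min`/`c_max`, the 2-bit depth, and the toy tensor shape are placeholder assumptions, whereas in the paper the clipping range is computed from models of clipping and quantization error.

```python
import numpy as np

def clip_and_quantize(features, c_min, c_max, n_bits=2):
    """Clip a feature tensor to [c_min, c_max] and uniformly quantize it
    to 2**n_bits levels. The clipping range is passed in by hand here;
    the paper instead derives an optimal range from error models."""
    levels = 2 ** n_bits
    step = (c_max - c_min) / (levels - 1)            # uniform step size
    clipped = np.clip(features, c_min, c_max)        # clipping stage
    indices = np.round((clipped - c_min) / step)     # integer quantization indices
    return indices.astype(np.uint8), step

def dequantize(indices, c_min, step):
    """Reconstruct approximate feature values from quantization indices."""
    return c_min + indices.astype(np.float32) * step

# Example: quantize a toy "intermediate activation" tensor to 2 bits per element.
x = np.random.randn(1, 16, 32, 32).astype(np.float32)
idx, step = clip_and_quantize(x, c_min=-1.0, c_max=3.0, n_bits=2)
x_hat = dequantize(idx, c_min=-1.0, step=step)
print("mean squared error:", float(np.mean((x - x_hat) ** 2)))
```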
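
The summary also mentions a modified entropy-constrained design algorithm for quantizing the clipped activations. The sketch below is a generic entropy-constrained scalar quantizer iteration (minimizing squared error plus `lam` times the codeword bit cost), not the paper's modified algorithm; `n_levels`, `lam`, the iteration count, and the toy data are illustrative assumptions.

```python
import numpy as np

def ecsq_design(samples, n_levels=4, lam=0.05, n_iter=20):
    """Generic entropy-constrained scalar quantizer design: assign each
    sample to the codeword minimizing distortion + lam * bit cost, then
    re-estimate codewords and codeword probabilities."""
    c = np.linspace(samples.min(), samples.max(), n_levels)  # initial codewords
    p = np.full(n_levels, 1.0 / n_levels)                    # initial probabilities
    assign = np.zeros(samples.size, dtype=int)
    for _ in range(n_iter):
        # Lagrangian cost of mapping every sample to every codeword.
        cost = (samples[:, None] - c[None, :]) ** 2 - lam * np.log2(p[None, :])
        assign = np.argmin(cost, axis=1)
        for j in range(n_levels):
            members = samples[assign == j]
            if members.size:
                c[j] = members.mean()
                p[j] = members.size / samples.size
            else:
                p[j] = 1e-12  # keep the bit cost finite for unused codewords
    return c, p, assign

# Example on toy clipped activations.
acts = np.clip(np.random.randn(10000).astype(np.float32), -1.0, 3.0)
codewords, probs, labels = ecsq_design(acts)
used = probs > 1e-11
rate = -np.sum(probs[used] * np.log2(probs[used]))  # entropy of the codeword indices
print("codewords:", codewords, "entropy (bits/sample):", rate)
```

The Lagrangian weight `lam` trades off distortion against rate: larger values push probability mass onto fewer codewords, which is how very low average bit rates (such as the sub-1-bit figures quoted in the summary) become reachable.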