Multimodal deep learning using on-chip diffractive optics with in situ training capability

Bibliographic Details
Published in: Nature Communications, Vol. 15, no. 1, article 6189 (10 pages)
Main Authors: Cheng, Junwei; Huang, Chaoran; Zhang, Jialong; Wu, Bo; Zhang, Wenkai; Liu, Xinyu; Zhang, Jiahui; Tang, Yiyi; Zhou, Hailong; Zhang, Qiming; Gu, Min; Dong, Jianji; Zhang, Xinliang
Format: Journal Article
Language: English
Published: London: Nature Publishing Group UK, 23.07.2024
ISSN: 2041-1723
DOI: 10.1038/s41467-024-50677-3

Summary: Multimodal deep learning plays a pivotal role in supporting the processing and learning of diverse data types within the realm of artificial intelligence generated content (AIGC). However, most photonic neuromorphic processors for deep learning can only handle a single data modality (either vision or audio) due to the lack of abundant parameter training in the optical domain. Here, we propose and demonstrate a trainable diffractive optical neural network (TDONN) chip based on on-chip diffractive optics with massive tunable elements to address these constraints. The TDONN chip comprises one input layer, five hidden layers, and one output layer, and only one forward propagation is required to obtain the inference results, without frequent optical-electrical conversion. A customized stochastic gradient descent algorithm and a dropout mechanism are developed for the photonic neurons to realize in situ training and fast convergence in the optical domain. The TDONN chip achieves a potential throughput of 217.6 tera-operations per second (TOPS) with high computing density (447.7 TOPS/mm²), high system-level energy efficiency (7.28 TOPS/W), and low optical latency (30.2 ps). The TDONN chip has successfully implemented four-class classification in different modalities (vision, audio, and touch) and achieves 85.7% accuracy on multimodal test sets. Our work opens up a new avenue for multimodal deep learning with integrated photonic processors, providing a potential solution for low-power AI large models using photonic technology.

Most photonic processors can only handle a single data modality due to the lack of abundant parameter training in the optical domain. Here, the authors propose and demonstrate a trainable diffractive optical neural network chip based on on-chip diffractive optics with tunable elements to address these constraints.
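The summary describes in situ training with a customized stochastic gradient descent algorithm and a dropout mechanism for photonic neurons, but this record gives no algorithmic detail. Below is a minimal Python sketch of one common way to realize such hardware-in-the-loop training: a zeroth-order (SPSA-style) gradient estimate over the chip's tunable phase elements, with dropout implemented by freezing a random subset of units each step. The measure_loss function, the parameter shapes, and the SPSA formulation itself are illustrative assumptions, not the paper's actual method.

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical stand-in for the chip's measurement loop: the real
    # system would program the tunable diffractive units, run a single
    # optical forward pass, and read the loss from the output detectors.
    # A toy objective keeps the sketch self-contained and runnable.
    TARGET = rng.uniform(0.0, 2.0 * np.pi, size=(5, 60))  # e.g. 5 hidden layers x 60 units

    def measure_loss(phases: np.ndarray) -> float:
        return float(np.mean((np.sin(phases) - np.sin(TARGET)) ** 2))

    def in_situ_sgd(phases, lr=0.5, delta=0.05, dropout=0.2, steps=2000):
        """SPSA-style in situ SGD over tunable phase elements.

        The optical fields inside the chip are not directly observable,
        so the gradient is estimated from two loss measurements taken
        under opposite random phase perturbations. Units masked out by
        dropout receive a zero perturbation and thus a zero update.
        """
        for _ in range(steps):
            active = rng.random(phases.shape) >= dropout             # dropout mask
            perturb = rng.choice([-1.0, 1.0], size=phases.shape) * active
            loss_plus = measure_loss(phases + delta * perturb)
            loss_minus = measure_loss(phases - delta * perturb)
            grad_est = (loss_plus - loss_minus) / (2.0 * delta) * perturb
            phases = phases - lr * grad_est                          # SGD step
        return phases

    phases = rng.uniform(0.0, 2.0 * np.pi, size=TARGET.shape)
    print("loss before:", measure_loss(phases))
    phases = in_situ_sgd(phases)
    print("loss after:", measure_loss(phases))

Note that each SPSA step needs only two loss measurements regardless of how many tunable elements are updated, which is what makes this family of methods attractive for chips with massive tunable parameter counts.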