LFSamba: Marry SAM With Mamba for Light Field Salient Object Detection

Bibliographic Details
Published in	IEEE Signal Processing Letters, Vol. 31, pp. 3144-3148
Main Authors	Liu, Zhengyi, Wang, Longzhen, Fang, Xianyong, Tu, Zhengzheng, Wang, Linbo
Format	Journal Article
Language	English
Published	New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2024
Subjects	Adaptation models; Annotations; Convolution; Costs; Datasets; Feature extraction; Field cameras; Image enhancement; Image reconstruction; light field; Machine vision; Mamba; Modelling; multi-focus; Object detection; Object recognition; Salience; salient object detection; SAM; Solid modeling; Stereophotography; Supervised learning; Three-dimensional displays; Transformers; Virtual reality
ISSN	1070-9908, 1558-2361
DOI	10.1109/LSP.2024.3493799

Summary:	A light field camera can reconstruct 3D scenes using captured multi-focus images that contain rich spatial geometric information, enhancing applications in stereoscopic photography, virtual reality, and robotic vision. In this work, a state-of-the-art salient object detection model for multi-focus light field images, called LFSamba, is introduced to emphasize four main insights: (a) Efficient feature extraction, where SAM is used to extract modality-aware discriminative features; (b) Inter-slice relation modeling, leveraging Mamba to capture long-range dependencies across multiple focal slices, thus extracting implicit depth cues; (c) Inter-modal relation modeling, utilizing Mamba to integrate all-focus and multi-focus images, enabling mutual enhancement; (d) Weakly supervised learning capability, developing a scribble annotation dataset from an existing pixel-level mask dataset, establishing the first scribble-supervised baseline for light field salient object detection.