Learning from algorithm-generated pseudo-annotations for detecting ants in videos

Bibliographic Details
Published in	Scientific Reports Vol. 13; no. 1; article 11566 (10 pages)
Main Authors	Zhang, Yizhe, Imirzian, Natalie, Kurze, Christoph, Zheng, Hao, Hughes, David P., Chen, Danny Z.
Format	Journal Article
Language	English
Published	London: Nature Publishing Group UK, 18.07.2023; Nature Publishing Group; Nature Portfolio
Subjects	631/114/1305; 631/114/1314; 631/114/1564; 631/114/2397; Algorithms; Annotations; Deep learning; Humanities and Social Sciences; multidisciplinary; Neural networks; Rainforests; Science; Science (multidisciplinary)
ISSN	2045-2322
DOI	10.1038/s41598-023-28734-6

Summary:	Deep learning (DL) based detection models are powerful tools for large-scale analysis of dynamic biological behaviors in video data. Supervised training of a DL detection model often requires a large amount of manually labeled training data, which is time-consuming and labor-intensive to acquire. In this paper, we propose LFAGPA (Learn From Algorithm-Generated Pseudo-Annotations), which uses noisy annotations generated automatically by algorithms to train DL models for ant detection in videos. Our method consists of two main steps: (1) extract foreground objects using one or more state-of-the-art foreground extraction algorithms; (2) treat the results of step (1) as pseudo-annotations and use them to train deep neural networks for ant detection. We address several challenges: how to make use of automatically generated noisy annotations, how to learn from multiple annotation sources, and how to combine algorithm-generated annotations with human-labeled annotations (when available) in this learning framework. In experiments, we evaluate our method on 82 videos (20,348 image frames in total) captured under natural conditions in a tropical rainforest for a study of dynamic ant behavior. Using only algorithm-generated annotations, with no manual annotation cost, our method achieves a decent detection performance (77% F1 score). Moreover, when using only 10% of the manual annotations, our method can train a DL model that performs as well as one trained with the full human annotations (81% F1 score).
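
Note: step (1) of the summary, turning foreground extraction output into pseudo-annotations, can be illustrated with a minimal sketch. This is not the authors' implementation: the per-pixel median background model, the fixed intensity threshold, the `min_area` filter, and all function names below are assumptions chosen for illustration, standing in for the foreground extraction algorithms that the record does not name. Frames are plain nested lists of grayscale values so that the example needs only the Python standard library.

```python
from collections import deque
from statistics import median


def median_background(frames):
    """Per-pixel median over all frames: a simple static-background estimate (assumed model)."""
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [median(frame[y][x] for frame in frames) for x in range(width)]
        for y in range(height)
    ]


def foreground_mask(frame, background, threshold):
    """Mark pixels whose intensity differs from the background by more than threshold."""
    return [
        [abs(pixel - bg) > threshold for pixel, bg in zip(row, bg_row)]
        for row, bg_row in zip(frame, background)
    ]


def pseudo_boxes(mask, min_area=2):
    """Group 4-connected foreground pixels into (x_min, y_min, x_max, y_max) boxes."""
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y0 in range(height):
        for x0 in range(width):
            if not mask[y0][x0] or seen[y0][x0]:
                continue
            # Breadth-first search over one connected blob of foreground pixels.
            seen[y0][x0] = True
            queue = deque([(x0, y0)])
            xs, ys = [], []
            while queue:
                x, y = queue.popleft()
                xs.append(x)
                ys.append(y)
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    inside = 0 <= nx < width and 0 <= ny < height
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            if len(xs) >= min_area:  # drop tiny blobs, which are likely noise
                boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return boxes


def make_demo_frames():
    """Three 5x6 synthetic frames: dark background, one bright 2x2 blob that moves right."""
    frames = []
    for left in (0, 2, 4):
        frame = [[10] * 6 for _ in range(5)]
        for y in (1, 2):
            for x in (left, left + 1):
                frame[y][x] = 200
        frames.append(frame)
    return frames


frames = make_demo_frames()
background = median_background(frames)
# One list of pseudo-annotation boxes per frame.
annotations = [
    pseudo_boxes(foreground_mask(frame, background, threshold=50))
    for frame in frames
]
```

In step (2), boxes like those in `annotations` would serve as noisy training labels for a detection network in place of manual labels. The training step is not sketched here because the record does not specify the network or the loss.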