Yoga pose classification: a CNN and MediaPipe inspired deep learning approach for real-world application

Bibliographic Details
Published in	Journal of ambient intelligence and humanized computing Vol. 14; no. 12; pp. 16551 - 16562
Main Authors	Garg, Shubham; Saxena, Aman; Gupta, Richa
Format	Journal Article
Language	English
Published	Berlin/Heidelberg: Springer Berlin Heidelberg, 01.12.2023; Springer Nature B.V.
Subjects	Accuracy; Artificial Intelligence; Classification; Computational Intelligence; Computer vision; Convolutional neural networks; Datasets; Deep learning; Engineering; Exercise; Image contrast; Literature reviews; Machine learning; MediaPipe; Network latency; Neural networks; Original Research; Physical fitness; Robotics and Automation; Skeletonization; Training; User Interfaces and Human Computer Interaction; Yoga
ISSN	1868-5137 1868-5145
DOI	10.1007/s12652-022-03910-0

Summary:	Yoga is a centuries-old form of exercise that sports personnel, patients, and physiotherapists follow as part of their regimen. Correct posture and technique are key to reaping the maximum benefits of yoga. Hence, developing a model to classify yoga postures correctly is an emerging research topic. The paper presents a novel architecture that aims to classify various yoga poses. The proposed model estimates and classifies yoga poses into five broad categories with low latency. In the proposed architecture, the images are skeletonized before being input to the model. Skeletonization is performed using the MediaPipe library for body keypoint detection. The paper compares the performance of various deep learning models with and without skeletonization. The models showed their best results when trained on skeletonized images. The comparison is drawn to establish the positive impact of skeletonization on the results obtained by the various models. On non-skeletonized images, VGG16 achieves the highest validation accuracy (95.6%), followed by InceptionV3, NASNetMobile, YogaConvo2d (the proposed model, 89.9%), and lastly InceptionResNetV2. In contrast, on skeletonized images the proposed YogaConvo2d model reports a validation accuracy of 99.62%, followed by VGG16, InceptionResNetV2, NASNetMobile, and InceptionV3.
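
The skeletonization step described in the summary can be illustrated with a minimal sketch. It assumes body keypoints have already been detected (for example by a pose detector such as MediaPipe) and are given as normalized (x, y) coordinates. The keypoint names, coordinates, bone list, and the 32x32 canvas size are all hypothetical choices made for illustration and are not taken from the paper. The sketch draws the bones onto a blank canvas, giving a stick-figure image of the kind that could then be fed to a classifier.

```python
# Illustrative sketch only: keypoint names, coordinates, bone list, and
# canvas size are assumptions, not the paper's actual pipeline.

# Normalized (x, y) keypoints in [0, 1], as a pose detector might return them.
KEYPOINTS = {
    "left_shoulder": (0.40, 0.30), "right_shoulder": (0.60, 0.30),
    "left_elbow": (0.30, 0.45), "right_elbow": (0.70, 0.45),
    "left_hip": (0.45, 0.60), "right_hip": (0.55, 0.60),
    "left_knee": (0.42, 0.78), "right_knee": (0.58, 0.78),
}

# Pairs of keypoints joined by a line segment (a hypothetical subset).
BONES = [
    ("left_shoulder", "right_shoulder"),
    ("left_shoulder", "left_elbow"), ("right_shoulder", "right_elbow"),
    ("left_shoulder", "left_hip"), ("right_shoulder", "right_hip"),
    ("left_hip", "right_hip"),
    ("left_hip", "left_knee"), ("right_hip", "right_knee"),
]


def draw_line(canvas, x0, y0, x1, y1):
    """Rasterize a segment onto the canvas with Bresenham's algorithm."""
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    while True:
        canvas[y0][x0] = 1
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:
            err += dy
            x0 += sx
        if e2 <= dx:
            err += dx
            y0 += sy


def skeletonize(keypoints, bones, size=32):
    """Return a size x size binary image containing only the skeleton."""
    canvas = [[0] * size for _ in range(size)]

    def to_px(name):
        # Scale normalized coordinates to pixels and clamp to the canvas.
        x, y = keypoints[name]
        px = max(0, min(size - 1, int(x * size)))
        py = max(0, min(size - 1, int(y * size)))
        return px, py

    for a, b in bones:
        # Skip bones whose endpoints were not detected.
        if a in keypoints and b in keypoints:
            (x0, y0), (x1, y1) = to_px(a), to_px(b)
            draw_line(canvas, x0, y0, x1, y1)
    return canvas


skeleton = skeletonize(KEYPOINTS, BONES)
```

Discarding the background and keeping only the joint structure is one plausible reason a classifier might find skeletonized inputs easier to separate. The record itself reports only the accuracy figures, not the mechanism.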