Fine-tuned deep convolutional neural network for hand segmentation in egocentric videos

Eyitomilayo Yemisi-Babatope; Miguel Ángel Camargo-Rojas; Mireya Saraí García-Vázquez; Alejandro Álvaro Ramírez-Acosta

doi:10.1117/12.2677242

4 October 2023

Proceedings Volume 12673, Optics and Photonics for Information Processing XVII; 1267303 (2023) https://doi.org/10.1117/12.2677242
Event: SPIE Optical Engineering + Applications, 2023, San Diego, California, United States

Abstract

Semantic segmentation is a high-level computer vision task that associates each pixel of an image with a semantic (class) label. Fine semantic segmentation is a pixel-level task that provides the detailed information needed to identify the region occupied by the object of interest. Hands are one of the main channels of communication and enable human-object and human-environment interaction; in egocentric videos they are nearly ubiquitous and appear at the center of the field of view and of the activity being performed, hence our interest in hand segmentation. Fine semantic segmentation of hands locates and identifies the pixels that belong to the hands and groups them under a single hand semantic label. We performed fine semantic segmentation of hands by improving the architecture of a state-of-the-art deep convolutional neural network, RefineNet. We obtain a finer and more accurate result by modifying how high- and low-level features are obtained and combined, and how pixels are grouped for pixel-level classification. We performed this task on a public egocentric video dataset, EgoHands, and evaluated the performance of our model (RefineNet-Pix) using an existing pixel-level metric, mean precision (mPrecision). Compared with the baseline reported in Urooj’s work, our model obtains an accuracy higher than the benchmark's 87.9%. Our finer and more accurate segmentation ensures good performance under various lighting conditions and against complex backgrounds, making it suitable for both indoor and outdoor environments. Fine semantic segmentation of hands can be applied in image analysis, medical systems (with a focus on understanding hand motion for prediction, diagnosis, and monitoring), hand gesture recognition (human-computer interaction and action understanding), and robotics (grasping and manipulating objects).
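
As a rough illustration of the evaluation metric, the sketch below computes pixel-level precision for the hand class, TP / (TP + FP), on binary masks and averages it over a set of test images. The exact definition of mPrecision used in Urooj's work is not stated here, so this is a minimal sketch under assumptions: masks are nested lists with 1 = hand and 0 = background, precision is computed per image and then averaged, and an image with no predicted hand pixels scores 0. The function names `pixel_precision` and `mean_precision` are hypothetical.

```python
from typing import Sequence

# Binary mask (assumed format): 1 = hand pixel, 0 = background.
Mask = Sequence[Sequence[int]]


def pixel_precision(pred: Mask, gt: Mask) -> float:
    """Pixel-level precision for the hand class: TP / (TP + FP)."""
    if len(pred) != len(gt):
        raise ValueError("prediction and ground truth must have the same height")
    tp = fp = 0
    for pred_row, gt_row in zip(pred, gt):
        if len(pred_row) != len(gt_row):
            raise ValueError("prediction and ground truth must have the same width")
        for p, g in zip(pred_row, gt_row):
            if p == 1 and g == 1:
                tp += 1  # predicted hand, truly hand
            elif p == 1 and g == 0:
                fp += 1  # predicted hand, actually background
    # Assumed convention: no predicted hand pixels gives a precision of 0.
    return tp / (tp + fp) if (tp + fp) > 0 else 0.0


def mean_precision(preds: Sequence[Mask], gts: Sequence[Mask]) -> float:
    """Average per-image precision over a test set (assumed mPrecision definition)."""
    if len(preds) != len(gts) or not preds:
        raise ValueError("need the same, non-zero number of predictions and ground truths")
    scores = [pixel_precision(p, g) for p, g in zip(preds, gts)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    pred1 = [[1, 1, 0], [0, 1, 0]]
    gt1 = [[1, 0, 0], [0, 1, 0]]
    pred2 = [[0, 0], [1, 1]]
    gt2 = [[0, 0], [1, 1]]
    print(pixel_precision(pred1, gt1))  # 2 TP, 1 FP
    print(mean_precision([pred1, pred2], [gt1, gt2]))
```

In the first example image, three pixels are predicted as hand and two of them are hand in the ground truth, giving a precision of 2/3; the second image is segmented perfectly, so the mean over both images is 5/6. Averaging per image, rather than pooling all pixels, weights every image equally regardless of how large the hands appear in it; whether the benchmark pools or averages is an assumption here.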

Citation

Eyitomilayo Yemisi-Babatope, Miguel Ángel Camargo-Rojas, Mireya Saraí García-Vázquez, and Alejandro Álvaro Ramírez-Acosta "Fine-tuned deep convolutional neural network for hand segmentation in egocentric videos", Proc. SPIE 12673, Optics and Photonics for Information Processing XVII, 1267303 (4 October 2023); https://doi.org/10.1117/12.2677242