Index Terms—Video, Keyframe Extraction, Body Motion Detection, Gaming Theory Frame Extraction, DCNN, CNN, InceptionV3, LSTM, GRU, Gesture Recognition, Social Media Platforms, Online Safety.
Current social media platforms do not offer a sign language detection model suited to the online deaf-mute community. In this paper, we deliver a sign language recognition model for call-for-help ASL signs. Despite the fast-growing user volumes on social media platforms, existing safety detection relies on written or spoken words rather than body or hand gestures. Our study focuses on detecting call-for-help signs, which can also be extended to call-for-help-related body gestures in video, to help ensure users are safe. We focus on nine words from American Sign Language. Our dataset comprised 431 videos, from which 6,985 frames were extracted. We evaluated several machine-learning models for classification and found that the model using InceptionV3 for feature extraction and a Dilated CNN for classification performed best, achieving 95% accuracy on validation data and 98% accuracy on test data. Using an ensemble of predictions from the LSTM and DCNN models, we achieved 98.7% accuracy on sample data.
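The sketch below illustrates, under stated assumptions, the pipeline summarized above: a frozen InceptionV3 backbone producing per-frame features, a dilated 1-D CNN classifier over the frame-feature sequence, and an ensemble that averages the DCNN and LSTM softmax outputs. Layer sizes, the number of frames per video, and the averaging scheme are illustrative assumptions, not the exact configuration reported in the paper.

```python
# Minimal sketch (hypothetical layer sizes): InceptionV3 features per frame ->
# dilated 1-D CNN classifier, with an ensemble averaging DCNN and LSTM outputs.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import InceptionV3
from tensorflow.keras.applications.inception_v3 import preprocess_input

NUM_FRAMES, NUM_CLASSES = 20, 9          # nine ASL call-for-help signs

# Frozen InceptionV3 backbone: each 299x299 frame -> 2048-d feature vector.
backbone = InceptionV3(weights="imagenet", include_top=False, pooling="avg")
backbone.trainable = False

def extract_features(frames):
    """frames: (NUM_FRAMES, 299, 299, 3) uint8 array from one video."""
    return backbone.predict(preprocess_input(frames.astype("float32")))

# Dilated CNN classifier over the per-frame feature sequence.
dcnn = models.Sequential([
    layers.Input(shape=(NUM_FRAMES, 2048)),
    layers.Conv1D(128, 3, dilation_rate=1, padding="causal", activation="relu"),
    layers.Conv1D(128, 3, dilation_rate=2, padding="causal", activation="relu"),
    layers.Conv1D(128, 3, dilation_rate=4, padding="causal", activation="relu"),
    layers.GlobalAveragePooling1D(),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

# LSTM classifier used in the ensemble.
lstm = models.Sequential([
    layers.Input(shape=(NUM_FRAMES, 2048)),
    layers.LSTM(128),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

def ensemble_predict(feature_seqs):
    """Average the two models' softmax outputs, then take the argmax."""
    probs = (dcnn.predict(feature_seqs) + lstm.predict(feature_seqs)) / 2.0
    return np.argmax(probs, axis=1)
```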
The primary keyframe-extraction technique is derived from the gaming theory method: the model first sums, per frame, the pixels whose HSV and YCbCr values fall within threshold ranges corresponding to exposed skin color. The five frames with the highest skin-color pixel counts are then detected and extracted.
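A minimal sketch of this keyframe-extraction step is shown below, assuming OpenCV and illustrative HSV/YCbCr skin-color thresholds; the threshold values and the `extract_keyframes` helper are placeholders for exposition, not the exact values used in this work.

```python
# Sketch: score each frame by skin-color pixel count (HSV and YCbCr thresholds),
# then keep the five highest-scoring frames as keyframes.
import cv2
import numpy as np

# Illustrative skin-color ranges (placeholders, not the paper's thresholds).
HSV_LOW, HSV_HIGH = np.array([0, 40, 60]), np.array([25, 180, 255])
YCRCB_LOW, YCRCB_HIGH = np.array([0, 135, 85]), np.array([255, 180, 135])

def skin_pixel_count(frame_bgr):
    """Count pixels whose HSV and YCbCr values both fall in the skin-color range."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    ycrcb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2YCrCb)
    mask = cv2.inRange(hsv, HSV_LOW, HSV_HIGH) & cv2.inRange(ycrcb, YCRCB_LOW, YCRCB_HIGH)
    return int(cv2.countNonZero(mask))

def extract_keyframes(video_path, top_k=5):
    """Return the top_k frames with the highest skin-color pixel counts."""
    cap, scored = cv2.VideoCapture(video_path), []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        scored.append((skin_pixel_count(frame), frame))
    cap.release()
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [frame for _, frame in scored[:top_k]]
```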