GSL Alphabet Recognizer using MediaPipe

This sample program estimates hand pose with MediaPipe (Python version) and recognizes GSL alphabet hand signs from the detected key points using a simple multilayer perceptron (MLP).
This repository contains:
- Sample program
- GSL alphabet hand sign recognition model (TFLite)
- Training data for alphabet sign recognition
Requirements:
- mediapipe 0.8.11
- OpenCV 3.4.2 or later
- TensorFlow 2.3.0 or later
- scikit-learn 0.23.2 or later (only if you want to display the confusion matrix)
- matplotlib 3.3.2 or later (only if you want to display the confusion matrix)
Here's how to run the demo using your webcam:
python gsl-alfabet-recognizer.py
or, on some systems such as macOS:
python3 gsl-alfabet-recognizer.py
The following options can be specified when running the demo.
- --device: camera device number (default: 0)
- --width: width used for camera capture (default: 960)
- --height: height used for camera capture (default: 540)
- --use_static_image_mode: whether to use the static_image_mode option for MediaPipe inference (default: unspecified)
- --min_detection_confidence: detection confidence threshold (default: 0.5)
- --min_tracking_confidence: tracking confidence threshold (default: 0.5)
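The options listed above could be declared with argparse roughly as follows. This is a minimal sketch rather than the repository's actual code: the function name `get_args` and the help strings are assumptions, and the "unspecified" default of --use_static_image_mode is modeled here as a plain on/off flag.

```python
import argparse


def get_args(argv=None):
    """Parse the demo's command-line options (hypothetical helper)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--device", type=int, default=0, help="camera device number")
    parser.add_argument("--width", type=int, default=960, help="capture width")
    parser.add_argument("--height", type=int, default=540, help="capture height")
    # Flag without a value: leaving it out means "unspecified" (False).
    parser.add_argument("--use_static_image_mode", action="store_true")
    parser.add_argument("--min_detection_confidence", type=float, default=0.5,
                        help="detection confidence threshold")
    parser.add_argument("--min_tracking_confidence", type=float, default=0.5,
                        help="tracking confidence threshold")
    # argv=None makes argparse read sys.argv[1:], as in a normal script run.
    return parser.parse_args(argv)
```

For example, `python gsl-alfabet-recognizer.py --device 1 --width 1280 --height 720` would select the second camera and a larger capture size while keeping both confidence thresholds at 0.5.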
│ gsl-alfabet-recognizer.py
│
├─model
│ ├─keypoint_classifier
│ │ keypoint.csv
│ │ keypoint_classifier.hdf5
│ │ keypoint_classifier.py
│ │ keypoint_classifier.tflite
│ └─ keypoint_classifier_label.csv
└─utils
└─cvfpscalc.py
gsl-alfabet-recognizer.py is the sample program for inference. It can also collect training data (key points) for hand sign recognition.
The model/keypoint_classifier directory stores files related to hand sign recognition. The following files are stored.
- Training data (keypoint.csv)
- Trained model (keypoint_classifier.tflite)
- Label data (24 classes) (keypoint_classifier_label.csv)
- Inference module (keypoint_classifier.py)
- Greek alphabet (alphabet-24 letters.csv)
utils/cvfpscalc.py is a module for FPS measurement.
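To illustrate what an FPS measurement module typically does, here is a minimal sketch that averages the time between recent frames. It is not the code in cvfpscalc.py: the class name `FpsCounter`, the `buffer_len` parameter, and the use of `time.perf_counter` are all assumptions made for this example.

```python
import time
from collections import deque


class FpsCounter:
    """Average frames per second over the last `buffer_len` frame intervals."""

    def __init__(self, buffer_len=10):
        self._last = None
        self._intervals = deque(maxlen=buffer_len)

    def tick(self, now=None):
        """Call once per frame; returns 0.0 until two frames have been seen."""
        # `now` can be injected for testing; otherwise use a monotonic clock.
        now = time.perf_counter() if now is None else now
        if self._last is not None:
            self._intervals.append(now - self._last)
        self._last = now
        total = sum(self._intervals)
        return len(self._intervals) / total if total > 0 else 0.0
```

Calling `tick()` once per captured frame inside the main loop yields a smoothed FPS value that can be drawn on the preview image.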
The hand key points are added to "model/keypoint_classifier/keypoint.csv" in the following format.
1st column: Pressed number (used as class ID), 2nd and subsequent columns: Key point coordinates
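The row layout described above (class ID first, then the key point coordinates) could be written with Python's csv module. This is a minimal sketch: the helper names `format_keypoint_row` and `append_keypoint_row` are hypothetical and not taken from the repository.

```python
import csv
import io


def format_keypoint_row(class_id, keypoints):
    """Return one CSV line: the class ID followed by the flattened coordinates."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerow([class_id, *keypoints])
    return buffer.getvalue()


def append_keypoint_row(csv_path, class_id, keypoints):
    """Append one training sample to the CSV file at `csv_path`."""
    # newline="" keeps the line ending exactly as formatted above.
    with open(csv_path, "a", newline="") as f:
        f.write(format_keypoint_row(class_id, keypoints))
```

Appending rather than overwriting lets samples for different classes accumulate in the same file across recording sessions.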
Each row is labeled with one of the 24 classes, and its key point coordinates are the landmark coordinates after the following preprocessing:
- Convert to relative coordinates
- Flatten to one-dimensional array
- Normalize to the maximum value (absolute value)
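The three preprocessing steps above can be sketched as follows. The function name `preprocess_landmarks` is hypothetical, and using the first landmark (the wrist in MediaPipe's hand model) as the origin for the relative coordinates is an assumption; the text does not say which point serves as the reference.

```python
import itertools


def preprocess_landmarks(landmark_list):
    """landmark_list: list of [x, y] coordinates, one pair per hand landmark."""
    # 1. Convert to coordinates relative to the first landmark (assumed origin).
    base_x, base_y = landmark_list[0]
    relative = [[x - base_x, y - base_y] for x, y in landmark_list]

    # 2. Flatten [[x0, y0], [x1, y1], ...] into [x0, y0, x1, y1, ...].
    flat = list(itertools.chain.from_iterable(relative))

    # 3. Normalize by the largest absolute value so every entry lies in [-1, 1].
    max_value = max(map(abs, flat))
    if max_value == 0:
        # All landmarks coincide; avoid dividing by zero.
        return [0.0 for _ in flat]
    return [value / max_value for value in flat]
```

For instance, the landmarks `[[10, 20], [20, 40], [0, 10]]` become `[0.0, 0.0, 0.5, 1.0, -0.5, -0.5]`. Relative coordinates make the features independent of where the hand sits in the frame, and the normalization makes them independent of the hand's apparent size.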
Class ID 0 corresponds to the "alpha" label, class ID 1 to the "beta" label, and so on up to class ID 23 for the "omega" label.
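The mapping from class ID to label can be illustrated with a short loader. This sketch assumes that keypoint_classifier_label.csv holds one label per row and that the row index equals the class ID; the source does not spell out the file layout, and the function name `load_labels` is hypothetical.

```python
import csv


def load_labels(lines):
    """Read labels from an iterable of CSV lines (for example, an open file).

    The position of each label in the returned list is its class ID.
    """
    return [row[0] for row in csv.reader(lines) if row]


# Hypothetical usage with the label file from this repository:
#   with open("model/keypoint_classifier/keypoint_classifier_label.csv",
#             encoding="utf-8-sig") as f:
#       labels = load_labels(f)
#   labels[0] would then be the label for class ID 0.
```

Keeping the labels in a separate file means the classifier only has to output an integer, and the display names can be changed without retraining.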
GSL Alphabet Recognizer using MediaPipe is licensed under the Apache License 2.0.