This project aims to build an image detector that localizes and classifies happy dogs in real time! The model is built step by step throughout the project, and the best model is used to see whether Lucky is happy or sad.
A prebuilt YOLO model (sources: darkflow and YOLO) is being customized to localize happy dogs in a given image. The best model from HDC_v2 was successfully integrated into the customized YOLO detection model to localize and classify happy dogs in the given images and videos.
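The detect-then-classify integration can be sketched as follows. This is a minimal illustration, not the project's code: it assumes detections arrive as dictionaries in the format returned by darkflow's `TFNet.return_predict` (`label`, `confidence`, `topleft`, `bottomright`), stores the image as a plain list of pixel rows, and treats `classify_crop` as a hypothetical wrapper around the HDC_v2 classifier. The 0.3 confidence threshold is an arbitrary assumption.

```python
from typing import Callable, Dict, List, Sequence

Image = List[List[int]]  # hypothetical: an image as a list of pixel rows


def crop(image: Image, det: Dict) -> Image:
    """Cut a darkflow-style bounding box out of the image."""
    x1, y1 = det["topleft"]["x"], det["topleft"]["y"]
    x2, y2 = det["bottomright"]["x"], det["bottomright"]["y"]
    return [row[x1:x2] for row in image[y1:y2]]


def detect_and_classify(
    image: Image,
    detections: Sequence[Dict],
    classify_crop: Callable[[Image], str],  # hypothetical HDC_v2 wrapper
    min_confidence: float = 0.3,  # assumed threshold
) -> List[Dict]:
    """Send every confident 'dog' box to the classifier, mirroring HDD_v1."""
    results = []
    for det in detections:
        if det["label"] != "dog" or det["confidence"] < min_confidence:
            continue  # skip non-dog objects and weak detections
        mood = classify_crop(crop(image, det))
        results.append({"box": (det["topleft"], det["bottomright"]), "mood": mood})
    return results
```

Because the classifier is passed in as a callable, the same loop works with any predictor that maps a cropped image to a label.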
Video source: here
- Problem with HDD_v1: the YOLO detector feeds every dog bounding box to HDC_v2, whether or not the dog is facing forward.
- A new detector model will be trained on a custom dataset to distinguish classifiable dogs from non-classifiable ones, which should improve the accuracy of the integrated HDC_v2.
- The custom dataset is currently being labelled with two classes: 'classifiable dog' (dd) and 'non-classifiable dog' (d).
- If the accuracy of the custom YOLO detection is satisfactory, the model will be integrated with HDC_v2 into HDD_v2.
- The integrated model will then be implemented as a real-time module.
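The planned HDD_v2 routing might look like the sketch below: only boxes labelled `dd` reach the classifier, while `d` boxes are reported without a mood. The function names, the darkflow-style detection dicts, the `detect` and `classify` callables, and the 0.3 threshold are all hypothetical placeholders; `run_stream` only illustrates how a real-time module could apply the routing frame by frame.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Optional, Tuple

Detection = Dict  # darkflow-style dict with "label" and "confidence" keys


def route_detections(
    frame,
    detections: Iterable[Detection],
    classify: Callable[[object, Detection], str],  # hypothetical HDC_v2 wrapper
    min_confidence: float = 0.3,  # assumed threshold
) -> List[Tuple[str, Optional[str]]]:
    """Classify only 'dd' (classifiable) boxes; report 'd' boxes without a mood."""
    routed = []
    for det in detections:
        if det["confidence"] < min_confidence:
            continue  # drop weak detections
        if det["label"] == "dd":
            routed.append(("dd", classify(frame, det)))
        elif det["label"] == "d":
            routed.append(("d", None))  # not classifiable: HDC_v2 is skipped
    return routed


def run_stream(
    frames: Iterable,
    detect: Callable[[object], List[Detection]],  # hypothetical detector call
    classify: Callable[[object, Detection], str],
) -> Iterator[List[Tuple[str, Optional[str]]]]:
    """Process frames one at a time, e.g. from a webcam or video reader."""
    for frame in frames:
        yield route_detections(frame, detect(frame), classify)
```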
Current state of the custom YOLO detector (trained on 450 images for 30 epochs):
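Retraining a YOLOv2-style model in darkflow for the two custom classes means editing its `.cfg` file: `classes` in the `[region]` layer and `filters` in the final `[convolutional]` layer. Assuming a config with 5 anchor boxes (`num=5`, as in darkflow's stock YOLOv2 VOC configs), the filter count works out as below; the `labels` list is an assumed `labels.txt` content.

```python
def last_conv_filters(num_anchors: int, num_classes: int) -> int:
    """Output channels of YOLOv2's last conv layer: each anchor predicts
    4 box coordinates, 1 objectness score and one score per class."""
    return num_anchors * (num_classes + 5)


labels = ["dd", "d"]  # assumed contents of labels.txt
print(last_conv_filters(5, len(labels)))  # → 35
```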
The HDC_v1 model suffered from overfitting. The model has since been improved by simplifying its architecture and adding more data augmentation.
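The specific augmentations used are not listed here. Purely as an illustration, the sketch below implements two common ones, a random horizontal flip and a random crop, on an image stored as a list of pixel rows; a real pipeline would more likely use the training framework's built-in augmentation utilities. The 0.5 flip probability is an assumption.

```python
import random
from typing import List

Image = List[List[int]]  # hypothetical: an image as a list of pixel rows


def horizontal_flip(image: Image) -> Image:
    """Mirror the image left to right."""
    return [list(reversed(row)) for row in image]


def random_crop(image: Image, out_h: int, out_w: int, rng: random.Random) -> Image:
    """Cut a random out_h x out_w window; the image must be at least that large."""
    h, w = len(image), len(image[0])
    top = rng.randint(0, h - out_h)  # randint is inclusive on both ends
    left = rng.randint(0, w - out_w)
    return [row[left:left + out_w] for row in image[top:top + out_h]]


def augment(image: Image, out_h: int, out_w: int, rng: random.Random,
            flip_prob: float = 0.5) -> Image:
    """Randomly flip, then randomly crop, producing a new training variant."""
    if rng.random() < flip_prob:
        image = horizontal_flip(image)
    return random_crop(image, out_h, out_w, rng)
```

Each call with a different random state yields a slightly different view of the same photo, which is what makes augmentation useful against overfitting on a small dataset.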
Hyperparameters (especially the learning rate and batch size) were tuned with short training runs of only a few epochs each.
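The search strategy isn't specified, so the sketch below shows one plausible option: a plain grid search over learning rate and batch size, where each candidate gets only a short training run. `train_eval` is a hypothetical callable that trains for a few epochs and returns validation accuracy; the default of 3 epochs and the toy scoring function are assumptions for illustration.

```python
import itertools
from typing import Callable, Dict, Sequence, Tuple


def short_run_search(
    train_eval: Callable[..., float],  # hypothetical: brief training, returns val accuracy
    lrs: Sequence[float],
    batch_sizes: Sequence[int],
    epochs: int = 3,  # assumed "small" number of epochs
) -> Tuple[Tuple[float, int], Dict[Tuple[float, int], float]]:
    """Grid-search (lr, batch_size) with short runs; return the best pair and all scores."""
    scores = {}
    for lr, batch_size in itertools.product(lrs, batch_sizes):
        scores[(lr, batch_size)] = train_eval(lr=lr, batch_size=batch_size, epochs=epochs)
    best = max(scores, key=scores.get)
    return best, scores


# Toy stand-in for a real training run, peaking at lr=1e-3, batch_size=32.
def toy_train_eval(lr: float, batch_size: int, epochs: int) -> float:
    return -abs(lr - 1e-3) - abs(batch_size - 32) / 1000


best, _ = short_run_search(toy_train_eval, [1e-2, 1e-3, 1e-4], [16, 32, 64])
```

Short runs keep the search cheap; the trade-off is that a setting that learns slowly at first may be ranked too low.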
The model appears to handle overfitting much better than the previous version and reaches a much-improved accuracy of ~90% on the test set.
Some misclassified test-set images were visualized to track down the remaining ~10% error.
The model's predictions are well aligned with my intuition.
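Finding the images behind the ~10% error amounts to comparing predictions with true labels. A minimal sketch, assuming labels and predictions are parallel sequences of class names (the `"happy"`/`"sad"` names are illustrative):

```python
from typing import List, Sequence


def misclassified(labels: Sequence[str], preds: Sequence[str]) -> List[int]:
    """Indices of test images whose predicted class differs from the true class."""
    if len(labels) != len(preds):
        raise ValueError("labels and preds must have the same length")
    return [i for i, (y, p) in enumerate(zip(labels, preds)) if y != p]


def accuracy(labels: Sequence[str], preds: Sequence[str]) -> float:
    """Fraction of correct predictions; 1 - accuracy is the error being inspected."""
    return 1 - len(misclassified(labels, preds)) / len(labels)
```

The returned indices can then be used to load and plot the corresponding test images.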
Various ResNet-like custom models and a transfer-learned ResNet50 model were tested. In general, the models reached ~80% test accuracy when properly tuned. This relatively low accuracy may be due to the small number of samples (~700 images per class). The dataset was collected online with minimal pre-processing and therefore has very high variance in image features.
The model started to overfit after ~30 epochs of training, so the best model was selected by early stopping.
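Early stopping with a "patience" window can be sketched in a few lines: track the best validation loss and stop once it hasn't improved for `patience` epochs. This is a generic illustration, not necessarily how the project did it; the patience value is an assumption. If the model was trained with Keras (not stated here), `tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=..., restore_best_weights=True)` provides the same behaviour.

```python
from typing import Sequence, Tuple


def early_stopping_epoch(val_losses: Sequence[float], patience: int = 5) -> Tuple[int, int]:
    """Return (best_epoch, stop_epoch), both 1-indexed.

    Training stops once `patience` epochs pass without a new lowest
    validation loss; the weights from best_epoch are the ones to keep.
    """
    best_loss = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch
    return best_epoch, len(val_losses)  # never triggered: ran all epochs
```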
Somewhat agreeable results.