Transfer learning for imbalanced visual recognition
This project studies visual classification under a realistic long-tailed distribution. It compares a handcrafted-feature baseline with VGG16 transfer learning, then evaluates how augmentation, tuning, and fine-tuning affect class-level performance.
Challenge
- Real-world visual datasets often have uneven class distributions and varied backgrounds.
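- A classical handcrafted-feature baseline is a useful reference point, but it may struggle with varied backgrounds and viewpoints.
- The evaluation must report per-class precision, recall, and F1, not only overall accuracy, because accuracy alone can hide weak performance on under-represented classes.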
System architecture
Data and inputs
- Open Images data for three classes: Car, Dog, and Person.
- 2,402 training images and 598 validation images.
- Imbalanced, in-the-wild visual samples with diverse backgrounds and viewpoints.
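Before modeling, it helps to quantify how imbalanced the classes actually are. The sketch below is only an illustration: it assumes a hypothetical layout with one subdirectory per class (for example `train/Car`, `train/Dog`, `train/Person`), which the project does not confirm, and the function names are invented for this example.

```python
from collections import Counter
from pathlib import Path

# Assumed image extensions; adjust to match the actual dataset export.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def class_counts(split_dir):
    """Count images per class, assuming one subdirectory per class."""
    counts = Counter()
    for class_dir in sorted(Path(split_dir).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(
                1 for f in class_dir.iterdir()
                if f.suffix.lower() in IMAGE_SUFFIXES
            )
    return counts


def imbalance_ratio(counts):
    """Return the largest class size divided by the smallest class size."""
    smallest = min(counts.values())
    if smallest == 0:
        raise ValueError("at least one class has no images")
    return max(counts.values()) / smallest
```

A ratio of 1.0 means perfectly balanced classes; larger values indicate a stronger skew toward the most frequent class.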
Technical approach
- Build a HOG+SVM baseline to establish traditional visual-feature performance.
- Use VGG16 as a pretrained feature extractor with a custom dense classification head.
- Apply augmentation, dropout, and learning-rate tuning to improve generalization.
- Run a fine-tuning experiment by unfreezing deeper VGG16 layers.
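The frozen-transfer and fine-tuning steps above can be sketched with Keras. This is a minimal illustration and not the project's actual code: the head size, dropout rate, learning rates, augmentation layers, and the choice to unfreeze only `block5` are assumptions, and the function names are hypothetical. Inputs are assumed to have already been passed through `tf.keras.applications.vgg16.preprocess_input` in the data pipeline.

```python
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

NUM_CLASSES = 3  # Car, Dog, Person


def build_transfer_model(weights="imagenet", input_shape=(224, 224, 3),
                         dense_units=256, dropout_rate=0.5,
                         learning_rate=1e-4):
    """VGG16 as a frozen feature extractor with a small dense head.

    All hyperparameter defaults here are illustrative assumptions.
    """
    base = VGG16(weights=weights, include_top=False, input_shape=input_shape)
    base.trainable = False  # freeze every convolutional block

    # Geometric augmentation; these layers are only active during training.
    augment = models.Sequential([
        layers.RandomFlip("horizontal"),
        layers.RandomRotation(0.1),
        layers.RandomZoom(0.1),
    ], name="augmentation")

    inputs = tf.keras.Input(shape=input_shape)
    x = augment(inputs)
    x = base(x, training=False)
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dense(dense_units, activation="relu")(x)
    x = layers.Dropout(dropout_rate)(x)
    outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)

    model = models.Model(inputs, outputs)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate),
        loss="sparse_categorical_crossentropy",  # integer class labels
        metrics=["accuracy"],
    )
    return model, base


def unfreeze_last_block(model, base, learning_rate=1e-5):
    """Fine-tuning variant: make only the block5 layers of VGG16 trainable."""
    base.trainable = True
    for layer in base.layers:
        layer.trainable = layer.name.startswith("block5")
    # Recompile so the new trainable flags take effect, with a lower
    # learning rate to avoid disturbing the pretrained weights too much.
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

A typical pattern is to train the head first with `build_transfer_model()`, then call `unfreeze_last_block(model, base)` and continue training for a few epochs. Whether fine-tuning helps depends on dataset size; with a small training set, the frozen setup can generalize better.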
Evaluation and results
- HOG+SVM reached 67.00% accuracy and struggled with visual variation.
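- The tuned VGG16 model with a frozen backbone reached 92.00% accuracy, 25 percentage points above the HOG+SVM baseline, with more even performance across classes.
- Fine-tuned VGG16 reached 89.30%, which is strong but 2.7 percentage points below the tuned frozen-transfer setup.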
- Class-level precision, recall, and F1 stayed balanced across Car, Dog, and Person.
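The class-level metrics above can be computed directly from true and predicted labels. The following is a minimal pure-Python sketch, not the project's evaluation code; the function names are illustrative, and the labels in the usage note are toy values chosen to show how accuracy and macro F1 can diverge.

```python
from collections import Counter

CLASSES = ["Car", "Dog", "Person"]


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_report(y_true, y_pred, classes=CLASSES):
    """Return precision, recall, and F1 for each class.

    A metric is reported as 0.0 when its denominator is zero
    (for example, precision for a class that was never predicted).
    """
    pairs = Counter(zip(y_true, y_pred))  # (true, predicted) -> count
    report = {}
    for c in classes:
        tp = pairs[(c, c)]
        fp = sum(n for (t, p), n in pairs.items() if p == c and t != c)
        fn = sum(n for (t, p), n in pairs.items() if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[c] = {"precision": precision, "recall": recall, "f1": f1}
    return report


def macro_f1(report):
    """Unweighted mean of per-class F1, so every class counts equally."""
    return sum(m["f1"] for m in report.values()) / len(report)
```

As a toy illustration, take 8 Person, 1 Car, and 1 Dog sample and a classifier that always predicts Person. `accuracy` returns 0.8, yet `macro_f1` is about 0.30 because Car and Dog both score an F1 of 0.0. This is the kind of misleading conclusion that per-class reporting is meant to expose.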
Implementation and code
Implementation focus
The implementation follows one workflow from data preparation through modeling, evaluation, and interpretation, so the reasoning behind each technical decision stays visible.
Source code
The code is available for exploring the implementation details and extending the experiment when needed.
Scope and responsible use
The project is a focused modeling and evaluation study. Broader use should be supported by validation on additional data, robustness checks, monitoring, and domain-specific evaluation.
Future development
- Add more classes and test under stronger long-tail imbalance.
- Compare VGG16 with newer lightweight architectures.
- Expand interpretability with saliency maps and failure-case review.
Technical contribution
The project demonstrates how to compare traditional and deep-learning approaches under realistic visual-data imbalance while using class-level evaluation to avoid misleading conclusions.