Transfer learning between ImageNet and CIFAR10

Issam Sebri
Feb 18, 2021

Abstract

Machine learning is one of the fastest growing research subjects, and it catches my attention as it does all over the world and in most fields: industry, medicine, commerce, real estate, and more. Yet the lack of data remains a major challenge for machine learning applications. Humanity survives and ensures its continuity by transferring knowledge from generation to generation through books, notes, and art, and the machine learning community has taken that lesson to heart: most cutting-edge, complex models are available as open source, and the most exciting of them come already trained and ready to use. That puts me in the position today of sharing the process of transferring learning between two models.

Keywords: Machine learning, Transfer learning, ImageNet, CIFAR10

Introduction

Everyone is interested in applying machine learning and taking advantage of its benefits, but collecting the right data remains the business of large companies and researchers, or may otherwise take many years of gathering before a model can be trained properly. What stands out, then, is how to transfer knowledge between two models, or how to combine them.

Transfer learning is not a new subject, but it becomes more difficult and involved as machine learning models grow more advanced, and it becomes clearly necessary when an insufficient amount of data prevents a model from reaching good accuracy. In what follows, we take a model trained on the famous ImageNet dataset, which contains millions of images, and exploit its learned features to reach good accuracy in distinguishing the 10 categories of the CIFAR10 dataset.

Materials and Methods

First we need to import and explore our CIFAR10 dataset. CIFAR10 is a collection of labeled tiny images of shape (32, 32, 3) gathered by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. It contains 50,000 training images and 10,000 validation images, each belonging to one of 10 categories.

The 10 categories are:

[‘Airplane’, ‘Automobile’, ‘Bird’, ‘Cat’, ‘Deer’, ‘Dog’, ‘Frog’, ‘Horse’, ‘Ship’, ‘Truck’]
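The original code embed is missing here; a minimal sketch of loading and inspecting the dataset with `keras.datasets` (downloaded on first use) could look like this:

```python
from tensorflow.keras.datasets import cifar10

# Load CIFAR10; Keras downloads and caches it on first use.
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

print(x_train.shape)  # (50000, 32, 32, 3) - training images
print(x_test.shape)   # (10000, 32, 32, 3) - validation images
print(y_train.shape)  # (50000, 1) - integer labels 0..9
```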

Now, after carefully exploring our dataset, we can load our pre-trained model. Thanks to the generosity of the machine learning community already mentioned, well-known cutting-edge models can be found trained and ready for use. Here we import one of them, but you can find more in the Keras applications.

Importing the pre-trained model is the first step in our search for the right combination: knowledge comes from the source domain (the VGG16 pre-trained model here), and the target task is to classify the CIFAR10 dataset into its 10 categories.

The source model accepts inputs of shape (224, 224, 3), while CIFAR10 consists of tiny images of size (32, 32, 3). Note that each model has a minimum required input size: VGG16 accepts inputs as small as 32 pixels, but working at such a tiny size limits the number of features that can be extracted.

So a Lambda layer is needed to take the (32, 32, 3) input and transform it to an adequate shape, and a final Flatten layer prepares the output to feed a fully connected layer later.
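The embed for this step is missing; a sketch of the frozen feature extractor, assuming the Lambda layer upsamples the images to VGG16's native 224×224 (which matches the parameter counts reported below), might look like:

```python
import tensorflow as tf
from tensorflow.keras import Model
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Flatten, Input, Lambda

# VGG16 convolutional base, pre-trained on ImageNet, without its dense head.
base = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # freeze: keep the ImageNet features as they are

inputs = Input(shape=(32, 32, 3))
# The Lambda layer upsamples CIFAR10's 32x32 images to the 224x224 VGG16 expects.
resized = Lambda(lambda img: tf.image.resize(img, (224, 224)))(inputs)
features = Flatten()(base(resized))  # 7*7*512 = 25088 features per image
extractor = Model(inputs, features)
extractor.summary()  # Total params: 14,714,688 - none of them trainable
```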

In the last code output we can see 14,714,688 total parameters and 0 trainable parameters. This is the result of freezing the model, preventing it from taking on further knowledge, since our aim is to harvest the features and knowledge it already has.

The last step before moving on to the feature extractor is to preprocess our data the same way the source model and source dataset were preprocessed. In the case of CIFAR10, since all the images have the same size, we don't need to standardize sizes, but feature scaling is worthwhile to normalize our inputs. The classifier head that will later consume the extracted features stacks fully connected layers with dropout:

```python
model.add(Dense(512, activation='relu', input_dim=input_shape))
model.add(Dropout(0.3))
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.3))
```

Now it's time to store our features in variables for training later.
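The extraction embed is missing; a self-contained sketch, reusing the frozen extractor from above (the 224×224 resize is an assumption) and applying VGG16's own `preprocess_input`, could be. For brevity this sketch processes only the first 256 images; the article's run covers the full dataset:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import Model
from tensorflow.keras.applications import VGG16
from tensorflow.keras.applications.vgg16 import preprocess_input
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.layers import Flatten, Input, Lambda

(x_train, y_train), (x_test, y_test) = cifar10.load_data()

# Frozen VGG16 feature extractor, as built in the previous step.
base = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
base.trainable = False
inputs = Input(shape=(32, 32, 3))
resized = Lambda(lambda img: tf.image.resize(img, (224, 224)))(inputs)
extractor = Model(inputs, Flatten()(base(resized)))

# preprocess_input reproduces the ImageNet channel-mean subtraction VGG16 expects.
# Only the first 256 images are processed here as a demonstration.
train_features = extractor.predict(
    preprocess_input(x_train[:256].astype('float32')), batch_size=32)
np.save('train_features.npy', train_features)  # store for training later
print(train_features.shape)  # (256, 25088)
```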

Now it's time to build the classifier architecture, which takes the extracted features as input.
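A sketch of that classifier, using the Dense/Dropout stack quoted earlier; the softmax output layer and the compile settings are assumptions, not confirmed by the article:

```python
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.models import Sequential

input_shape = 25088  # length of a flattened VGG16 feature vector (7*7*512)

model = Sequential([
    Dense(512, activation='relu', input_dim=input_shape),
    Dropout(0.3),
    Dense(512, activation='relu'),
    Dropout(0.3),
    Dense(10, activation='softmax'),  # one unit per CIFAR10 category (assumed)
])
# sparse_categorical_crossentropy matches CIFAR10's integer labels.
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.summary()
# Training then runs on the stored features, e.g.:
# model.fit(train_features, y_train, epochs=10, batch_size=256)
```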

The result is very good considering the time we saved, but it is still far from production-ready code, and our model still suffers from slight overfitting.

Let's now build our deep learning model and train it end to end. We won't extract the features beforehand like last time; instead, we pass the source_model object as an input to our own model. We bring the learning rate slightly down, since we will be training and want to keep the weights from getting stuck in a poor local minimum. Remember that the source_model is still frozen here; to push our model further, we could make a small portion of it trainable. The trainable parameters amount to about half of the overall total.

=================================================================
Total params: 27,828,042
Trainable params: 13,113,354
Non-trainable params: 14,714,688
_________________________________________________________________
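A sketch of that end-to-end model, reproducing the summary above (the 224×224 Lambda resize and the learning-rate value 1e-4 are assumptions):

```python
import tensorflow as tf
from tensorflow.keras import Model
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Dense, Dropout, Flatten, Input, Lambda
from tensorflow.keras.optimizers import Adam

source_model = VGG16(weights='imagenet', include_top=False,
                     input_shape=(224, 224, 3))
source_model.trainable = False  # still frozen; unfreeze a few layers to go further

inputs = Input(shape=(32, 32, 3))
x = Lambda(lambda img: tf.image.resize(img, (224, 224)))(inputs)
x = Flatten()(source_model(x))  # source model embedded directly in the pipeline
x = Dense(512, activation='relu')(x)
x = Dropout(0.3)(x)
x = Dense(512, activation='relu')(x)
x = Dropout(0.3)(x)
outputs = Dense(10, activation='softmax')(x)
model = Model(inputs, outputs)

# A lowered learning rate keeps training from wrecking the pre-trained weights.
model.compile(optimizer=Adam(learning_rate=1e-4),
              loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.summary()
# Total: 27,828,042 / Trainable: 13,113,354 / Non-trainable: 14,714,688
```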

Results

The previous model does well, achieving 97% accuracy on the training dataset and 75% accuracy on the validation dataset.

Discussion

From the previous results, the model appears to suffer from overfitting, which is understandable behavior since the pre-trained model comes with very detailed and specific features.

To avoid this behavior, data augmentation is a good and productive solution that helps in this situation.
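A minimal sketch of such augmentation with Keras's `ImageDataGenerator`; the particular transforms and their ranges are assumptions, not taken from the article:

```python
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.preprocessing.image import ImageDataGenerator

(x_train, y_train), _ = cifar10.load_data()

# Random shifts, flips, and zooms produce new variants of each training image,
# giving the classifier more diversity to learn from and reducing overfitting.
datagen = ImageDataGenerator(
    width_shift_range=0.1,
    height_shift_range=0.1,
    horizontal_flip=True,
    zoom_range=0.1,
)
batches = datagen.flow(x_train, y_train, batch_size=64)
xb, yb = next(batches)  # one augmented batch, ready to feed model.fit
print(xb.shape)  # (64, 32, 32, 3)
```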

Literature Cited

A Comprehensive Hands-on Guide to Transfer Learning with Real-World Applications in Deep Learning — Dipanjan (DJ) Sarkar

Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems — Aurélien Géron

Part 1: Image Classification using Features Extracted by Transfer Learning in Keras — Ahmed Gad
