Multi-task Learning in Keras | Implementation of Multi-task Classification Loss

Using neural networks and high-level tensor libraries, we can easily build models that handle classification, regression and other tasks. While taking the Coursera Deep Learning specialization, I watched Andrew Ng’s video on multi-task learning and quickly made up my mind to try it out. But the resources available on the internet felt a bit alien to me, and I wanted to keep the problem as simple as it was shown in the video.

This article may reflect the material offered by that course.

Finally I managed to find some time to try this idea and, guess what, it worked perfectly, otherwise I wouldn’t have been writing this article :P.

What is Multi-task Learning?

Let’s cover some basics before we move on. Multi-task learning means using one model, in our case one neural network, to perform several tasks simultaneously.


Suppose you want to build a self-driving car, and part of the problem is that you have to classify objects on the street.

Input will be an image and output will be a 1D vector.

Let’s consider four classes for now: Y = {Pedestrian, Car, Stop Sign, Traffic Light}.

We are not going to localize objects, we are just going to classify them. Our model will only say whether objects from the set Y exist in the image or not.

Suppose the image contains only a stop sign and a car.

So the output vector of this image would be [0, 1, 1, 0], since we are considering the mentioned classes for the time being.

Here X is the input image and ŷ (y hat) is the predicted vector. A single network takes X as input and produces all four outputs at once, so one model solves these four tasks alone.

What About the Loss Function?

There is a slight problem: you can neither train the network with a plain binary cross-entropy loss nor with a categorical cross-entropy loss. We need to be a bit clever here.

The loss we need is nothing but the log loss applied to each class separately. What we are doing here is aggregating all of the separate losses into one and then computing the average. So k in this loss function represents the number of classes we are classifying over, and the rest bears the conventional meaning: m is the number of training examples and ŷ is the predicted output.
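Written out, with the symbols defined above (this is a reconstruction from that description, since the original formula image isn’t reproduced here):

$$
\mathcal{L} = \frac{1}{m} \sum_{i=1}^{m} \sum_{j=1}^{k} \Big[ -y_j^{(i)} \log \hat{y}_j^{(i)} - \big(1 - y_j^{(i)}\big) \log\big(1 - \hat{y}_j^{(i)}\big) \Big]
$$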

When to use Multi-task Learning?

  • Training on a set of tasks that could benefit from having shared lower-level features
  • Amount of data you have for each task should be quite similar
  • A big enough neural network can do well on all tasks

When these conditions hold, it’s a better idea to use a single network for all the tasks than to use separate models for separate tasks.

We know what multi-task learning is, and we know what the loss function for this problem looks like. Let’s pour everything into code to see if it actually works.

Downloading and Viewing the Dataset

Download the dataset from this URL. If you want to work with my repository, clone the repo and put the .h5 file in the data folder.

Input data matrix

# (1600, 100, 100, 3)
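The training input matrix has the shape above: 1,600 images of size 100×100 with 3 color channels. A minimal loading sketch could look like this; the file name and the dataset keys "x" and "y" are assumptions, so adjust them to match your .h5 file.

import h5py
import numpy as np

# Path and key names below are assumptions; check your own .h5 file
with h5py.File("data/dataset.h5", "r") as f:
    x = np.array(f["x"])  # images, expected shape (2000, 100, 100, 3)
    y = np.array(f["y"])  # multi-hot labels, expected shape (2000, 5)

# First 1,600 examples for training, the remaining 400 for testing
x_train, y_train = x[:1600], y[:1600]
x_test, y_test = x[1600:], y[1600:]
print(x_train.shape)  # (1600, 100, 100, 3)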

Output or Target Vector

If you visit the URL for the data, you can see there are 2k images of natural scenes, and each image can contain one or more of these classes: desert, mountains, sea, sunset and trees. So the target vector for each image has 5 elements, one per class; with the classes in that order, an image showing a sunset over the sea would be encoded as [0, 0, 1, 1, 0].

Implementation of Multi-task Loss Function

The multi-task loss function can be implemented like this.
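Since the original gist isn’t reproduced here, below is a sketch of that loss written with the Keras backend, following the formula above; treat it as an equivalent reimplementation rather than the exact original code.

from keras import backend as K

def multitask_loss(y_true, y_pred):
    # Clip predictions away from 0 and 1 so the logs stay finite
    y_pred = K.clip(y_pred, K.epsilon(), 1.0 - K.epsilon())
    # Binary log loss computed for each class separately
    cross_entropy = -(y_true * K.log(y_pred) + (1.0 - y_true) * K.log(1.0 - y_pred))
    # Sum over the k classes, then average over the examples in the batch
    return K.mean(K.sum(cross_entropy, axis=-1))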

If you import numpy as np and replace every K with np, you get the NumPy implementation, except for the clipping part.
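For reference, a NumPy version along those lines could look like this, with np.clip and a small hand-picked epsilon standing in for the Keras clipping (this translation is mine, not from the original post):

import numpy as np

def multitask_loss_np(y_true, y_pred, eps=1e-7):
    # np.clip with a fixed epsilon replaces K.clip / K.epsilon
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    cross_entropy = -(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))
    return np.mean(np.sum(cross_entropy, axis=-1))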

Activation of Last Layer

If you are wondering what kind of activation you should use on the last layer, let me spoil it for you: it’s sigmoid. Each class is an independent yes/no decision, so sigmoid gives every output node its own probability instead of forcing the outputs to sum to 1 the way softmax would.

Model Building and Training

There are only 2k images, and I am going to use 1.6k as the training set and the rest as the test set. With this little data the model is bound to overfit, so my target here is an overfitted model rather than a well-generalized one; building the latter is left as an exercise to the reader. :P

Without thinking too much, I built a CNN model and fed it the training and validation examples with the custom loss function.
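The exact architecture isn’t shown here, so the following is only an illustrative sketch of a small CNN of that kind, compiled with the custom loss defined above; the layer sizes and hyperparameters are assumptions, not the original ones.

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(100, 100, 3)),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(128, activation='relu'),
    Dense(5, activation='sigmoid'),  # one independent probability per class
])

model.compile(optimizer='adam', loss=multitask_loss, metrics=['accuracy'])
model.fit(x_train, y_train,
          validation_data=(x_test, y_test),
          epochs=50, batch_size=32)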

Code Output

Epoch 1/50
1600/1600 [==============================] - 6s 4ms/step - loss: 2.7115 - acc: 0.3275 - val_loss: 2.3993 - val_acc: 0.4275
Epoch 2/50
1600/1600 [==============================] - 2s 1ms/step - loss: 2.0483 - acc: 0.5375 - val_loss: 1.9218 - val_acc: 0.5050
Epoch 3/50
1600/1600 [==============================] - 2s 1ms/step - loss: 1.8621 - acc: 0.5888 - val_loss: 1.8554 - val_acc: 0.5925
...
Epoch 46/50
1600/1600 [==============================] - 2s 1ms/step - loss: 0.0560 - acc: 0.8706 - val_loss: 4.4425 - val_acc: 0.6500
Epoch 47/50
1600/1600 [==============================] - 2s 1ms/step - loss: 0.0719 - acc: 0.8700 - val_loss: 4.2263 - val_acc: 0.6950
Epoch 48/50
1600/1600 [==============================] - 2s 1ms/step - loss: 0.0571 - acc: 0.8744 - val_loss: 4.4416 - val_acc: 0.6750
Epoch 49/50
1600/1600 [==============================] - 2s 1ms/step - loss: 0.0615 - acc: 0.8687 - val_loss: 4.1540 - val_acc: 0.6550
Epoch 50/50
1600/1600 [==============================] - 2s 1ms/step - loss: 0.0394 - acc: 0.8638 - val_loss: 4.3329 - val_acc: 0.6800

The validation accuracy is about 68% and the training accuracy is almost 87%. The training loss keeps going down from the first epoch to the last, while the validation loss creeps up, which is the overfitting we expected.


Now the big question is: how do you run inference with this model? Let’s solve that.

I will apply an element-wise threshold here: if the output of a single node is greater than 0.5, I’ll assign 1, otherwise 0.

The function below does it all.
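Since the original gist isn’t reproduced here, here is a sketch of an infer helper that matches the demo call below; the class names and their order are assumptions, so align them with the order used in your label vectors.

import numpy as np

CLASSES = ['desert', 'mountain', 'sea', 'sunset', 'trees']  # assumed label order

def infer(x, model, threshold=0.5):
    preds = model.predict(x)                  # sigmoid outputs, shape (n, 5)
    labels = (preds > threshold).astype(int)  # element-wise threshold at 0.5
    return [[CLASSES[i] for i in np.where(row == 1)[0]] for row in labels]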

Here is the demo output.

infer(x_train, model=model)
[['sea', 'sunset'],
['sea', 'sunset'],
['mountain', 'trees'],

That’s it for today. I showed how to implement a custom loss function in Keras and how to train a CNN with it. I hope you enjoyed the blog post.