Implementation of Gradient Descent in TensorFlow using tf.gradients

One of the things I like best about TensorFlow is that it can automatically compute the gradient of a function. All we need to do is set up the equation, then call the tf.gradients function to compute the gradients. We don't need to go through pages of math to derive the gradients of a loss function by hand and then convert them into code. We can simply take advantage of TensorFlow to compute the gradients for us.

Let’s look at some examples to clarify a bit more.

Softmax Regression on MNIST dataset using TensorFlow's built-in Optimizer

Before diving into implementing the cost functions using tf.gradients, let's refresh our memory by solving the MNIST problem with TensorFlow's built-in optimizers, such as GradientDescentOptimizer.
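A minimal sketch of that setup might look like the following (TensorFlow 1.x API; the hyperparameter values and the input_data helper are just the usual tutorial defaults, not anything special):

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

# Load MNIST with one-hot labels
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

learning_rate = 0.01

# Placeholders for flattened 28x28 images and one-hot labels
x = tf.placeholder(tf.float32, [None, 784])
y = tf.placeholder(tf.float32, [None, 10])

# Model parameters
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))

# Softmax model and cross-entropy cost
pred = tf.nn.softmax(tf.matmul(x, W) + b)
cost = tf.reduce_mean(-tf.reduce_sum(y * tf.log(pred), axis=1))

# The built-in optimizer computes the gradients and updates W and b for us
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)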

So what we did here is: we used the built-in optimizer and let it compute the gradients and apply the weight updates. What if we want to compute the gradients and update the weights ourselves? That's where the tf.gradients function comes into play.

Softmax Regression on MNIST dataset using `tf.gradients`

By the gradient descent formula, the weights need to be updated like this: W := W − α · ∂cost/∂W and b := b − α · ∂cost/∂b, where α is the learning rate.

To implement gradient descent, I'm going to omit the optimizer code; instead, I'll write my own weight updates.

Since there are a weight matrix W and a bias vector b, we need to calculate the gradient with respect to each of them and then update it. So the implementation would look something like this.

# Computing the gradient of cost with respect to W and b
grad_W, grad_b = tf.gradients(xs=[W, b], ys=cost)
# Gradient Step
new_W = W.assign(W - learning_rate * grad_W)
new_b = b.assign(b - learning_rate * grad_b)

These three lines of code are written instead of one, so why did I go through the extra trouble? Because if you need the gradient of a custom cost function and don't want to grind through the math by hand, you should definitely let TensorFlow handle all the gradient computations.

We have built the computational graph; all we need to do now is run it in a session. Let's try that out.

# Fit training using batch data
_, _, c = sess.run([new_W, new_b, cost], feed_dict={x: batch_xs, y: batch_ys})

We don’t need the output from new_W and new_b that’s why I omitted those.

The full code and output are shown below.
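Putting it all together, a runnable version looks something like this (the epoch and batch-size values are just reasonable defaults I picked):

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

learning_rate = 0.01
training_epochs = 25
batch_size = 100

x = tf.placeholder(tf.float32, [None, 784])
y = tf.placeholder(tf.float32, [None, 10])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))

pred = tf.nn.softmax(tf.matmul(x, W) + b)
cost = tf.reduce_mean(-tf.reduce_sum(y * tf.log(pred), axis=1))

# Let TensorFlow compute the gradients, then apply the updates ourselves
grad_W, grad_b = tf.gradients(xs=[W, b], ys=cost)
new_W = W.assign(W - learning_rate * grad_W)
new_b = b.assign(b - learning_rate * grad_b)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(training_epochs):
        avg_cost = 0.
        total_batch = int(mnist.train.num_examples / batch_size)
        for _ in range(total_batch):
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)
            # Fit training using batch data
            _, _, c = sess.run([new_W, new_b, cost],
                               feed_dict={x: batch_xs, y: batch_ys})
            avg_cost += c / total_batch
        print("Epoch:", '%04d' % (epoch + 1), "cost =", "{:.9f}".format(avg_cost))

    # Evaluate on the test set
    correct = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))
    print("Accuracy:", accuracy.eval({x: mnist.test.images, y: mnist.test.labels}))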

Softmax Regression Using Gradient Formula

This is the gradient of the softmax cross-entropy cost with respect to the weights W: ∂cost/∂W = (1/m) · xᵀ(pred − y), where pred = softmax(xW + b) and m is the number of examples in the batch. Similarly, ∂cost/∂b = (1/m) · Σᵢ (predᵢ − yᵢ).

As before, this gradient equation can be implemented directly, without tf.gradients and without TensorFlow's built-in optimizer. The full code is given below.
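Concretely, the hand-derived gradients can replace the tf.gradients call from the earlier snippet like this (reusing the same x, y, pred, W, b, and learning_rate defined above):

# Hand-coded gradients of the softmax cross-entropy cost
m = tf.cast(tf.shape(x)[0], tf.float32)                  # batch size
grad_W = tf.matmul(x, pred - y, transpose_a=True) / m    # shape [784, 10]
grad_b = tf.reduce_mean(pred - y, axis=0)                # shape [10]

# Gradient step, exactly as before
new_W = W.assign(W - learning_rate * grad_W)
new_b = b.assign(b - learning_rate * grad_b)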

How Does TensorFlow Compute Gradients?


You might be wondering: how does TensorFlow compute the gradient of a given function?

The method TensorFlow uses is called automatic differentiation; specifically, it uses reverse-mode automatic differentiation, the same idea behind backpropagation. The Wikipedia article on automatic differentiation is a good place to learn more.
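To see it in action on something tiny (a toy example of my own, not from the MNIST code), here is tf.gradients recovering the analytic derivative of f(x) = x² + 3x, which is f′(x) = 2x + 3:

import tensorflow as tf

x = tf.placeholder(tf.float32)
f = x**2 + 3*x                # f(x) = x^2 + 3x
df = tf.gradients(f, x)[0]    # autodiff builds the graph for df/dx = 2x + 3

with tf.Session() as sess:
    print(sess.run(df, feed_dict={x: 5.0}))   # prints 13.0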

I hope this article helps those of you who want to update the weights of your model without using TensorFlow's built-in optimizers. ;)