Backprop Examples

We will look at three different ways of computing gradients, all of which rely on the same basic principle (backprop).

  1. The first method is to manually code a function and its derivatives using the rules of backprop.
  2. The second method is automatic differentiation using a tool called autograd.
  3. The third method is to use TensorFlow, a powerful machine learning toolkit from Google, which can also automatically differentiate expressions defined through TensorFlow primitives.

More about autograd

autograd is a Python package for algorithmic differentiation. It lets you automatically compute derivatives of functions written in (nearly) native Python code, which makes getting gradients very easy. Under the hood, it also uses reverse mode autodiff (backprop).
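
To give a flavor, here is a minimal sketch (assuming autograd is installed) that differentiates a one-line function:

In [ ]:
import autograd.numpy as np  # thinly wrapped version of numpy
from autograd import grad

def tanh(x):
    return np.tanh(x)

d_tanh = grad(tanh)   # returns a new function that computes the derivative
print(d_tanh(1.0))    # about 0.41997, i.e. 1 - tanh(1)**2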

More about TensorFlow

TensorFlow is a powerful machine learning package from Google.

You can define functions as "computation graphs" using TensorFlow operations, and it can automatically perform backprop (a.k.a. reverse mode automatic differentiation) to compute the gradient for you.
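
As a quick preview, here is a minimal sketch using the TensorFlow 1.x graph API (the same pattern as the examples below):

In [ ]:
import tensorflow as tf

x = tf.placeholder(tf.float32)
y = x**2                      # builds a graph node, not a number
dy_dx = tf.gradients(y, x)    # adds gradient nodes to the graph

with tf.Session() as session:
    print(session.run(dy_dx, {x: 3.0}))  # [6.0]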

Note: You will need to install TensorFlow if you want to run these examples. It is not hard to install, but it takes a bit more work to call it from a Jupyter notebook.

Backprop Example 1

We will see three different ways to compute the gradient of

$$f(x,y,z) = (2x + y)z$$
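
The partial derivatives are easy to work out by hand, which gives us a reference answer:

$$\frac{\partial f}{\partial x} = 2z, \qquad \frac{\partial f}{\partial y} = z, \qquad \frac{\partial f}{\partial z} = 2x + y$$

At (x, y, z) = (1, 2, 3) the gradient is (6, 3, 4).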

Manual backprop

In [6]:
# Backprop example

import numpy as np

# Compute f(x,y,z) = (2*x+y)*z
x = 1.
y = 2.
z = 3.

# Forward pass
q = 2.*x + y   # Node 1
f = q*z        # Node 2

# Backward pass
df_dq = z          # Node 2 input
df_dz = q          # Node 2 input
df_dx = 2 * df_dq  # Node 1 input
df_dy = 1 * df_dq  # Node 1 input

grad = np.array([df_dx, df_dy, df_dz])

print f
print grad
12.0
[ 6.  3.  4.]
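
As a sanity check, we can compare the manual gradient against a numerical one. Here is a sketch using central differences (the step size h = 1e-5 is an arbitrary choice):

In [ ]:
import numpy as np

def f(v):
    x, y, z = v
    return (2*x + y)*z

def numerical_grad(f, v, h=1e-5):
    # Central differences: (f(v + h*e_i) - f(v - h*e_i)) / (2h)
    g = np.zeros_like(v)
    for i in range(len(v)):
        vp, vm = v.copy(), v.copy()
        vp[i] += h
        vm[i] -= h
        g[i] = (f(vp) - f(vm)) / (2*h)
    return g

print(numerical_grad(f, np.array([1., 2., 3.])))  # close to [ 6.  3.  4.]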

Autograd

In [7]:
import autograd.numpy as np  # Thinly wrapped version of numpy
from autograd import grad

def f(args):
    x,y,z = args
    return (2*x + y)*z

f_grad = grad(f)  # magic: returns a function that computes the gradient of f

x = 1.
y = 2.
z = 3.

print f([x, y, z])
print f_grad([x, y, z])
12.0
[6.0, 3.0, 4.0]
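
Above we packed the inputs into a single list so that grad differentiates with respect to all of them at once. Alternatively, grad takes an argnum argument selecting which positional argument to differentiate with respect to (a sketch):

In [ ]:
from autograd import grad

def f3(x, y, z):
    return (2*x + y)*z

df_dx = grad(f3, 0)  # differentiate w.r.t. the first argument
df_dy = grad(f3, 1)
df_dz = grad(f3, 2)

print([df_dx(1., 2., 3.), df_dy(1., 2., 3.), df_dz(1., 2., 3.)])  # [6.0, 3.0, 4.0]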

TensorFlow

In [8]:
import tensorflow as tf

# Define TensorFlow placeholders x, y, z
x = tf.placeholder(tf.float32)
y = tf.placeholder(tf.float32)
z = tf.placeholder(tf.float32)

# Define the function
f = (2*x + y)*z    # NB: operator overloading.

# The output f is a TensorFlow object representing the computation
# graph that computes f

# Define gradient object
grad = tf.gradients(f, [x, y, z])

# Use session to compute the values for f and grad
session = tf.Session()
values = session.run([f, grad],                 # query these output values
                     {x: [1], y: [2], z: [3]})  # specify input values for x, y, z
session.close()

# Print result as numpy arrays
print np.array(values[0]) 
print np.array(values[1])
[ 12.]
[[ 6.]
 [ 3.]
 [ 4.]]
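
One nice property of the graph approach: f and grad are reusable objects, so we can evaluate them at other inputs just by changing the feed dict (a sketch, reusing the graph built above):

In [ ]:
session = tf.Session()
print(session.run(grad, {x: [0], y: [0], z: [5]}))  # expect [[10.], [5.], [0.]]
session.close()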

Backprop Example 2

Here is a slightly more complex example:

$$f(x) = 10\exp(\sin(x)) + \cos^2(x)$$
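
For reference, the chain rule gives a closed-form derivative that all three methods should reproduce:

$$f'(x) = 10\cos(x)\exp(\sin(x)) - 2\sin(x)\cos(x)$$

At x = 2 this evaluates to approximately -9.574.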

Manual backprop

In [12]:
import numpy as np

# Backprop example
# f(x) = 10*np.exp(np.sin(x)) + np.cos(x)**2

# Forward pass
x = 2
a = np.sin(x)   # Node 1
b = np.cos(x)   # Node 2
c = b**2        # Node 3
d = np.exp(a)   # Node 4
f = 10*d + c    # Node 5 (final output)

# Backward pass
df_dd = 10                    # Node 5 input
df_dc = 1                     # Node 5 input
df_da = np.exp(a) * df_dd     # Node 4 input
df_db = 2*b * df_dc           # Node 3 input
df_dx = np.cos(x) * df_da - np.sin(x) * df_db  # x feeds Nodes 1 and 2

print f, df_dx
24.9989554697 -9.57436618465

Autograd

In [13]:
import autograd.numpy as np  # Thinly wrapped version of numpy
from autograd import grad

def f(x):
    return 10*np.exp(np.sin(x)) + np.cos(x)**2

f_grad = grad(f)

x = 2.

print f(x)
print f_grad(x)
24.9989554697
-9.57436618465
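
Since grad handles a scalar-output function one point at a time, autograd also provides elementwise_grad for evaluating the derivative at an array of points (a sketch, reusing f from the cell above):

In [ ]:
from autograd import elementwise_grad

f_grad_vec = elementwise_grad(f)   # derivative evaluated elementwise
print(f_grad_vec(np.linspace(0., 4., 5)))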

TensorFlow

In [14]:
import tensorflow as tf
import numpy as np

x = tf.placeholder(tf.float32)

# Here is the expression for our function
f = 10*tf.exp(tf.sin(x)) + tf.cos(x)**2

df_dx = tf.gradients(f, x)

session = tf.Session()
values = session.run([f, df_dx], 
                     {x: [2]})
session.close()

print np.array(values[0])
print np.array(values[1])
[ 24.99895287]
[[-9.57436562]]