Deterministically perform backprop on MNIST


I took some time to figure out how to make backprop produce replicable results on the MNIST dataset. It turns out that properly setting the random seeds is not enough: one also needs to control the random shuffling of the training data in mini-batch gradient descent. Below I first show the relevant code snippets, and then give the code for the whole classification pipeline.

Set up random seeds

There are three random seeds that need to be fixed: the seeds of the Python, NumPy, and TensorFlow pseudo-random generators.

seed_value = 0

# Set the seed for the python pseudo-random generator
import random
random.seed(seed_value)

# Set the seed for the numpy pseudo-random generator
import numpy as np
np.random.seed(seed_value)

# Set the seed for the tensorflow pseudo-random generator
import tensorflow as tf
tf.set_random_seed(seed_value)
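
For convenience, the three calls can be collected in one small helper. This is just a sketch of the same idea (set_global_seeds is a name introduced here, not a TensorFlow utility):

def set_global_seeds(seed_value=0):
    """Fix the Python, NumPy, and TensorFlow (1.x) pseudo-random generators."""
    import random
    import numpy as np
    import tensorflow as tf
    random.seed(seed_value)         # python built-in generator
    np.random.seed(seed_value)      # numpy global generator
    tf.set_random_seed(seed_value)  # tensorflow graph-level seed

set_global_seeds(0)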

Control the training data in mini-batches

Setting these random seeds is not enough to guarantee that repeated runs of gradient descent produce identical results. The remaining source of randomness is the next_batch method of the DataSet class, which is used for mini-batch gradient descent. See this link for the source code of next_batch. Concretely, the inconsistency arises because every time next_batch is called, by default it (i) shuffles the whole training set at the start of each epoch and (ii) returns the next block of training samples starting from the position stored in _index_in_epoch. To account for both factors, we need to

# reset the epoch counter
mnist.train._epochs_completed = 0

# reset the index counter within an epoch
mnist.train._index_in_epoch = 0

# disable shuffling of the training data within next_batch
batch_xs, batch_ys = mnist.train.next_batch(BATCH_SIZE, shuffle=False)
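
As a quick sanity check (a minimal sketch, assuming mnist has been loaded with input_data.read_data_sets and numpy is imported as np), resetting the two counters and calling next_batch with shuffle=False returns exactly the same first batch on every run:

BATCH_SIZE = 50

# first run: reset counters, then fetch one batch
mnist.train._epochs_completed = 0
mnist.train._index_in_epoch = 0
first_xs, first_ys = mnist.train.next_batch(BATCH_SIZE, shuffle=False)

# second run: reset counters again, then fetch one batch
mnist.train._epochs_completed = 0
mnist.train._index_in_epoch = 0
again_xs, again_ys = mnist.train.next_batch(BATCH_SIZE, shuffle=False)

print(np.array_equal(first_xs, again_xs))  # True
print(np.array_equal(first_ys, again_ys))  # True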

Complete MNIST classification pipeline

With these two adjustments in place, we get deterministic results for MNIST classification. Here is a toy example of the full classification pipeline.


import tensorflow as tf
import numpy as np
from tensorflow.examples.tutorials.mnist import input_data
old_v = tf.logging.get_verbosity()
tf.logging.set_verbosity(tf.logging.ERROR)

seed_value = 0
import random 
random.seed(seed_value) # Set the seed for python pseudo-random generator 
np.random.seed(seed_value) # Set the seed for numpy pseudo-random generator
tf.set_random_seed(seed_value) # Set the seed for tensorflow pseudo-random generator


mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)  # download (or load) the MNIST data into this directory

# Define a two-layer feedforward network with 100 hidden neurons
N_NEURON_IN = 784 # number of input neurons
N_NEURON_OUT = 10 # number of output neurons
N_NEURON_HIDDEN_1 = 100 # number of neurons in the first hidden layer
STDDEV = 0.01 # standard deviation for weights initialization


W1 = tf.Variable(tf.truncated_normal([N_NEURON_HIDDEN_1, N_NEURON_IN], stddev=STDDEV, seed= seed_value)) # Weights matrix connecting the input layer to the first layer
b1 = tf.Variable(tf.truncated_normal([N_NEURON_HIDDEN_1,1], stddev=STDDEV, seed= seed_value)) # Bias vector connecting the input layer to the first layer

W2 = tf.Variable(tf.truncated_normal([N_NEURON_OUT, N_NEURON_HIDDEN_1], stddev=STDDEV, seed=seed_value)) # Weights matrix connecting the first hidden layer to the output
b2 = tf.Variable(tf.truncated_normal([N_NEURON_OUT, 1], stddev=STDDEV, seed=seed_value)) # Bias vector connecting the first hidden layer to the output layer

A0 = tf.placeholder(tf.float32, [N_NEURON_IN, None]) # input 
Y = tf.placeholder(tf.float32, [N_NEURON_OUT, None]) # target


#Forward Pass
Z1 = tf.add(tf.matmul(W1,A0), b1)
A1 = tf.nn.relu(Z1)

Z2 = tf.add(tf.matmul(W2, A1), b2)
A2 = tf.nn.softmax(Z2, axis = 0)


BATCH_SIZE = 50
REG1 = tf.constant(0.001)
REG2 = tf.constant(0.001)

# Cross-entropy loss. tf.losses.softmax_cross_entropy expects logits (not softmax
# outputs) with classes on the last axis, so we pass the transposed pre-activation Z2.
ordinaryCost = tf.losses.softmax_cross_entropy(tf.transpose(Y), tf.transpose(Z2))

regCost1 = tf.multiply(REG1, tf.nn.l2_loss(W1))
regCost2 = tf.multiply(REG2, tf.nn.l2_loss(W2))

cost = tf.add(tf.add(ordinaryCost, regCost1), regCost2)
step = tf.train.GradientDescentOptimizer(0.1).minimize(cost)

accMat = tf.equal(tf.argmax(A2, 0), tf.argmax(Y, 0))
accRes = tf.reduce_sum(tf.cast(accMat, tf.float32))


# Backward Pass

# reset the batch counters so that every run walks through the data identically
mnist.train._index_in_epoch = 0
mnist.train._epochs_completed = 0
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print("=========")
    for i in range(10000):
        batch_xs, batch_ys = mnist.train.next_batch(BATCH_SIZE, shuffle = False)
        sess.run(step, feed_dict = {A0: batch_xs.T, Y: batch_ys.T})
        if i % 1000 == 0:
            res = sess.run(accRes, feed_dict =
                           {A0:mnist.validation.images[:1000].T,
                            Y : mnist.validation.labels[:1000].T})
            print("Iteration:", i, " Validation accuracy:", res/1000)

    print("=========\n")
    res = sess.run(accRes, feed_dict =
                           {A0: mnist.test.images.T,
                            Y : mnist.test.labels.T})
    print("Test accuracy:", res/mnist.test.labels.shape[0])

When the backward-pass block above is run multiple times, the results are identical across runs.
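
To verify this, one can wrap the training block in a small function and compare two runs. Below is a minimal sketch; train_once is a name introduced here, and it reuses the mnist object and the step, accRes, A0, and Y ops defined above:

def train_once(n_iterations=10000):
    # reset the batch counters so each run walks through the data identically
    mnist.train._epochs_completed = 0
    mnist.train._index_in_epoch = 0
    with tf.Session() as sess:
        # a fresh session re-runs the seeded initializers, so the initial
        # weights are also identical across runs
        sess.run(tf.global_variables_initializer())
        for _ in range(n_iterations):
            batch_xs, batch_ys = mnist.train.next_batch(BATCH_SIZE, shuffle=False)
            sess.run(step, feed_dict={A0: batch_xs.T, Y: batch_ys.T})
        res = sess.run(accRes, feed_dict={A0: mnist.test.images.T,
                                          Y: mnist.test.labels.T})
        return res / mnist.test.labels.shape[0]

acc_first = train_once()
acc_second = train_once()
print("Run 1:", acc_first, " Run 2:", acc_second)  # the two numbers should match exactly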