The progress in technology over the last 10 years has been remarkable. Organizations in every corner of the world are applying state-of-the-art techniques to improve existing products while also investing heavily in research to invent products that make the world a better place to live.
Some examples are Amazon's Just Walk Out technology, Tesla's Autopilot, spacecraft, and more. Breakthrough products like these could never exist without machine learning and deep learning algorithms.
Despite the complexity of the computations involved, even very sophisticated calculations can be handled easily by frameworks built for machine learning and deep learning. This article will help you get started with one of the most popular of these frameworks – TensorFlow.
In this article, you will get a taste of deep learning with an interesting application: handwritten digit recognition.
Handwritten digit recognition using Google TensorFlow with Python
Before we begin, we would like to thank Google for open-sourcing the TensorFlow library.
Table of contents:
- What is TensorFlow?
- About the MNIST dataset
- Implementing the Handwritten digits recognition model
What is TensorFlow?
TensorFlow is an open-source library created by the Google Brain team for heavy computational work, geared towards machine learning and deep learning tasks. Its core is written in C and C++, which makes its computations very fast, while it is available for use through Python, C++, Haskell, Java, and Go APIs.
TensorFlow represents each model as a dataflow graph, where the graph is built from two kinds of units – tensors and nodes.
- Tensor: A tensor is any multidimensional array; it holds the data that flows through the graph.
- Node: A node is a mathematical operation performed on the tensors that flow into it.
A dataflow graph essentially maps how information moves between these two components. Once the graph is complete, the model is executed and the output is computed.
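To make this concrete, here is a minimal sketch (not part of the digit-recognition model, just a toy graph written in the TensorFlow 1.x graph-and-session style used throughout this article) with two constant tensors and one addition node, executed in a session:

import tensorflow as tf

# Two constant tensors: the data flowing through the graph
a = tf.constant([1.0, 2.0])
b = tf.constant([3.0, 4.0])

# A node: the addition operation applied to the two tensors
c = tf.add(a, b)

# Nothing is computed until the graph is executed in a session
with tf.Session() as sess:
    print(sess.run(c))  # [4. 6.]

Until sess.run is called, a, b, and c are just nodes in the graph; no arithmetic has actually happened yet.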
You can learn a lot more from the official TensorFlow documentation.
Now let's start building the handwritten digit recognition application. To start, we need a dataset of handwritten digits for training and testing the model. MNIST is the most popular such dataset, containing handwritten digits as image files.
About the MNIST dataset
To begin our journey with TensorFlow, we will use the MNIST database to create an image-identifying model based on a simple feedforward neural network with no hidden layers.
MNIST is a computer vision database consisting of handwritten digits, with labels identifying the digits. As mentioned earlier, every MNIST data point has two parts: an image of a handwritten digit and a corresponding label.
We’ll call the images “x” and the labels “y”. Both the training set and test set contain images and their corresponding labels; for example, the training images are mnist.train.images and the training labels are mnist.train.labels.
Each image is 28 pixels by 28 pixels. We can interpret this as a big array of numbers. We can flatten this array into a vector of 28×28 = 784 numbers.
It doesn't matter how we flatten the array, as long as we're consistent between images. From this perspective, the MNIST images are just a bunch of points in a 784-dimensional vector space.
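For example, here is a tiny NumPy sketch (using a random array as a stand-in for a real MNIST image) of what flattening looks like:

import numpy as np

# Pretend this is one 28x28 grayscale image with pixel values in [0, 1]
image = np.random.rand(28, 28)

# Flatten it into a single vector of 28 x 28 = 784 numbers
flat = image.reshape(784)
print(flat.shape)  # (784,)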
Implementing the Handwritten digits recognition model
We will build a simple feedforward neural network with a softmax output to predict the digit in each image. We begin by setting up a Python environment.
Python is a prerequisite for understanding the implementation below. If you are new to Python or run into environment issues, get some quick hands-on experience with Python and the environment setup before you start.
Download the MNIST dataset
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("model_data/", one_hot=True)
Import TensorFlow into your environment
import tensorflow as tf
Initializing parameters for the model
batch = 100
learning_rate = 0.01
training_epochs = 10
In machine learning, an epoch is one full pass over the training samples. Here, we restrict the model to 10 complete epochs, i.e. 10 cycles of the algorithm through the dataset.
The batch variable determines the amount of data being fed to the algorithm at any given time, in this case, 100 images.
The learning rate controls the size of each parameter update, thereby affecting the rate at which the model "learns".
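To put these numbers in perspective (assuming the roughly 55,000 training images that this MNIST loader typically keeps in mnist.train, with the rest held out for validation and test), the quick arithmetic below uses the batch and training_epochs values we just set:

num_train_images = 55000                          # assumed size of mnist.train
steps_per_epoch = num_train_images // batch       # 550 parameter updates per epoch
total_steps = steps_per_epoch * training_epochs   # 5,500 updates over the whole run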
Creating Placeholders
x = tf.placeholder(tf.float32, shape=[None, 784])
y_ = tf.placeholder(tf.float32, shape=[None, 10])
The method tf.placeholder allows us to create variables that act as nodes holding the data. Here, x is a 2-dimensional array holding the MNIST images, with None meaning the batch dimension can be of any size and 784 being the length of a single flattened 28×28 image. y_ is the target output class: a 2-dimensional array with 10 columns (denoting the digits 0-9) that identifies which digit is stored in each image.
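To see what "holding the data" means, here is a toy sketch (independent of the MNIST model) showing that a placeholder has no value of its own; a value is supplied through feed_dict each time the graph is run:

import tensorflow as tf

p = tf.placeholder(tf.float32, shape=[None, 3])  # any number of rows, 3 columns
doubled = p * 2

with tf.Session() as sess:
    # The placeholder only receives data at run time, via feed_dict
    print(sess.run(doubled, feed_dict={p: [[1.0, 2.0, 3.0]]}))  # [[2. 4. 6.]]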
Creating Variables
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
Here, W holds the weights and b the biases of the model. They are created with tf.Variable because they are the components of the computational graph whose values must change as the model trains.
Initializing the model
y = tf.nn.softmax(tf.matmul(x, W) + b)
We will use a simple softmax model to implement our network. Softmax is a generalization of logistic regression, usually used in the final layer of a network. It is useful for multi-class classification, where a given input can belong to one of many different classes.
It produces values between 0 and 1 that sum to 1, giving the probability of the input belonging to each particular class.
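As a quick illustration (a NumPy sketch with made-up scores, separate from the TensorFlow model above), softmax turns arbitrary scores into probabilities that sum to 1:

import numpy as np

scores = np.array([2.0, 1.0, 0.1])               # raw scores for 3 classes
probs = np.exp(scores) / np.sum(np.exp(scores))  # the softmax transformation
print(probs)        # approximately [0.659 0.242 0.099]
print(probs.sum())  # 1.0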
Defining Cost Function
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))
This is the cost function of the model – a cost function measures how far the predicted values are from the actual values, and it is what we try to minimize to improve the accuracy of the model. Here we use cross-entropy, which compares the predicted probability distribution y with the true one-hot labels y_.
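As a worked illustration (again in NumPy, with made-up numbers), cross-entropy is small when the predicted probability of the true class is high and large when it is low:

import numpy as np

y_true = np.array([0, 0, 1])             # one-hot label: the true class is index 2

good_pred = np.array([0.05, 0.05, 0.9])  # confident and correct
bad_pred  = np.array([0.6, 0.3, 0.1])    # mostly wrong

print(-np.sum(y_true * np.log(good_pred)))  # ~0.105 (low cost)
print(-np.sum(y_true * np.log(bad_pred)))   # ~2.303 (high cost)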
Determining the accuracy of parameters
# tf.argmax gives the index of the largest value along axis 1: the predicted
# digit for y and the true digit for y_; tf.equal checks whether they match
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
# Casting the booleans to floats and averaging gives the fraction of correct predictions
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
Implementing Gradient Descent Algorithm
train_op = tf.train.GradientDescentOptimizer(learning_rate).minimize(cross_entropy)
TensorFlow comes pre-loaded with many optimization algorithms, one of them being gradient descent. The gradient descent algorithm starts from an initial value and keeps updating the parameters in small steps until the cost function reaches a minimum – for a simple convex model like this softmax regression, that is the global minimum, corresponding to the best achievable accuracy.
How close we get obviously depends on the number of iterations the model is allowed to run.
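To make the update rule concrete, here is a tiny NumPy sketch of gradient descent on a one-variable quadratic cost (a toy, not our network and not what TensorFlow runs internally, but the same idea): the parameter repeatedly moves a small step against the gradient until the cost stops decreasing.

w = 5.0    # initial parameter value
lr = 0.1   # a local toy learning rate, not the one used by the model

# Cost: (w - 3)^2, whose minimum is at w = 3; its gradient is 2 * (w - 3)
for step in range(50):
    gradient = 2 * (w - 3)
    w = w - lr * gradient   # the gradient descent update

print(w)   # very close to 3.0, the minimum of the cost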
Initializing the session
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
Creating batches of data for epochs
    # Still inside the session block: loop over epochs and mini-batches
    for epoch in range(training_epochs):
        batch_count = int(mnist.train.num_examples / batch)
        for i in range(batch_count):
            batch_x, batch_y = mnist.train.next_batch(batch)
Executing the model
            # Inside the inner loop: run one gradient descent step on the current batch
            sess.run([train_op], feed_dict={x: batch_x, y_: batch_y})
Print accuracy of the model
        # Back at the epoch level: report test-set accuracy every second epoch
        if epoch % 2 == 0:
            print("Epoch:", epoch)
            print("Accuracy:", accuracy.eval(feed_dict={x: mnist.test.images, y_: mnist.test.labels}))
    print("Model Execution Complete")
Share your accuracy results – I would love to see them.