Introduction
In this series, I will explore convolutional neural networks in comparison to standard neural networks. The former is an evolution of the latter, and tracing that evolution shows how a few particular design differences can have a great impact on performance. I will highlight these differences to illustrate why convnets are often better than standard neural network models. For this article, I will assume you have a basic understanding of traditional feedforward and backpropagation processes but want to learn how and why convolutional models are more successful.
Convolutional nets are famously at the forefront of neural networks and machine learning developments in real-world applications. Due to technological advancements in computing platforms and algorithmic developments, convolutional nets now allow for neural network models to be implemented quickly, effectively, and efficiently on a large scale in business and professional situations. As a result, we are finally seeing image classification and voice recognition that at least matches and sometimes outperforms our own capabilities.
A quick look at the evolution of neural networks
Following an earlier failed attempt by an IBM researcher, the traditional neural net was first successfully implemented at Stanford in 1959, where two models were developed: “ADALINE” and “MADALINE.” At the time, most fieldwork was more theory-based and had yet to be put into practice, but Stanford’s MADALINE was the first neural net to tackle a real-world problem: canceling echo on phone lines. After the ‘quiet years’, during which artificial intelligence was given little attention, traditional models like these started to produce more exciting results such as recognizing basic images with image processing algorithms.
The idea of convolutional networks appeared around the 1970s, before being more properly implemented in 1998 in the famous paper “Gradient-based learning applied to document recognition”, by Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. In the last decade, a variety of convolutional nets have been designed and become some of the most renowned models within the AI field, such as AlexNet (2012), ZF Net (2013), VGG Net (2014), GoogLeNet (2014) and Microsoft ResNet (2015). Convolutional nets build on normal neural nets and share many of the same algorithms and ideas.
A look at the problem of data
Earlier, I mentioned that standard neural network models could handle ‘basic images’. When saying this, I mean images that were dramatically preprocessed so that all images conform to a particular set of standardized guidelines, giving any traditional neural network an easier learning curve. For example, if we looked at a preprocessed handwriting dataset, it could look like this:
Here, all visual elements are the same, except for the subject of the dataset.
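To make the idea of standardizing images concrete, here is a minimal sketch of one possible preprocessing step. It is an assumption on my part rather than the procedure used for any particular dataset: the function name `standardize` is hypothetical, and it simply rescales intensities to the range 0 to 1 and shifts the ink so that its centre of mass sits in the middle of the image. It also assumes the digit has an empty margin around it, because `np.roll` wraps pixels around the edges.

```python
import numpy as np

def standardize(image):
    """Rescale intensities to [0, 1] and move the ink's centre of mass
    to the centre of the image (hypothetical helper, for illustration)."""
    img = image.astype(float)

    # Min-max scaling so every image uses the same intensity range.
    span = img.max() - img.min()
    if span > 0:
        img = (img - img.min()) / span

    total = img.sum()
    if total == 0:
        return img  # blank image: nothing to centre

    # Intensity-weighted centre of mass of the ink.
    rows, cols = np.indices(img.shape)
    centre_r = (rows * img).sum() / total
    centre_c = (cols * img).sum() / total

    # Whole-pixel shift that moves the centre of mass to the image centre.
    shift_r = int(round((img.shape[0] - 1) / 2 - centre_r))
    shift_c = int(round((img.shape[1] - 1) / 2 - centre_c))

    # np.roll wraps around, which is harmless only if the margin is empty.
    return np.roll(img, (shift_r, shift_c), axis=(0, 1))

# A 3x3 blob of "ink" drawn in the top-left corner of a 9x9 image.
raw = np.zeros((9, 9))
raw[1:4, 1:4] = 200
clean = standardize(raw)
```

After this step the blob sits in rows and columns 3 to 5 with every ink pixel equal to 1.0, no matter where it was drawn originally. A traditional network trained on such images never has to learn that position or brightness is irrelevant, because those variations have already been removed.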
In contrast, an unprocessed dataset may look like this:
Here, we have shadows, creases, lines, blurs, pencil instead of ink, different positioning, etc. These real-world imperfections make it much more difficult for neural networks to observe the things we actually want them to (i.e. the digits themselves) rather than the shadows, creases, and so on. How can a neural net know that a shadow is not meant to be a feature of a particular digit? If we want machine learning to really help us in the real world, we need models that work accurately in spite of these imperfections. As conditions previously stood, preprocessing was of little help when looking at high-resolution, three-channelled, real-world photos of cats and buildings.
There are two key factors that have managed to overcome these obstacles in the last few years:
- more computational power (GPUs – therefore bigger datasets)
- more intelligent models (convolutional models)
There is no doubt that the bigger the dataset, the more chances a neural network is given to realize that shadows are not in fact relevant to handwriting recognition. This is because the network observes many images where shadows appear in different classifications, and the target values never associate with a pattern of these shadows.
Convolutional neural networks are able to cope with these more realistic images because of their spatial awareness. In the unprocessed images, we can see the digits are not perfectly centered. Without spatial awareness, the following identical digits would be treated as completely different by a traditional neural network.
In real-world examples, objects may appear in many different locations of an image but still be the same object. The same applies to objects whose shapes, shadows, and other characteristics vary, so that instances of the same kind of object look different from one another. Unlike traditional neural networks, convolutional networks can see through these differences and focus on the pertinent qualities of the data.
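The point about position can be illustrated with a small toy sketch. Everything in it is an assumption made for illustration: the "digit" is a crude L-shaped stroke, the helper names `make_stroke` and `correlate2d_valid` are hypothetical, and the kernel is hand-picked rather than learned. It compares how a traditional network sees two shifted copies of the same shape (as flattened pixel vectors) with how a sliding kernel, the core operation of a convolutional layer, responds to them.

```python
import numpy as np

def make_stroke(top, left, size=12):
    """Draw a crude L-shaped stroke (7 pixels) at the given position."""
    img = np.zeros((size, size))
    img[top:top + 5, left] = 1.0           # vertical bar, 5 pixels tall
    img[top + 4, left:left + 3] = 1.0      # horizontal foot, 3 pixels wide
    return img

def correlate2d_valid(image, kernel):
    """Slide the kernel over the image without padding and record the
    sum of elementwise products at each position."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def peak(response):
    """Location of the strongest response as a plain (row, col) tuple."""
    idx = np.unravel_index(np.argmax(response), response.shape)
    return tuple(int(v) for v in idx)

a = make_stroke(top=1, left=1)
b = make_stroke(top=5, left=6)   # same shape, shifted down 4 and right 5

# Traditional view: each pixel is an independent input, so compare the
# flattened vectors directly. The two copies share no active pixels.
overlap = float(np.dot(a.ravel(), b.ravel()))   # 0.0

# Convolutional view: one small kernel is reused at every position.
kernel = make_stroke(0, 0, size=5)[:, :3]       # 5x3 template of the shape
resp_a = correlate2d_valid(a, kernel)
resp_b = correlate2d_valid(b, kernel)

peak_a = peak(resp_a)   # (1, 1)
peak_b = peak(resp_b)   # (5, 6)
```

The flattened vectors have zero overlap, so weights tuned to the first copy say nothing about the second. The kernel, by contrast, produces the same maximum response of 7.0 for both images, and the peak simply moves with the shape. Taking the maximum over the response map, which is roughly what pooling layers do, then gives an identical value regardless of where the shape was drawn.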
In the next part of this series, I will take a closer look at how the challenge associated with advanced image identification and processing was overcome and why traditional methods couldn’t quite hack it.
If you are interested in learning a bit more about the history behind neural networks and artificial intelligence, take a look at my earlier series.