Visit Refactored to play with the interactive network. Artificial neural networks are abstractly inspired by the basic computational units of the brain: neurons. State-of-the-art solutions to AI problems, including image recognition, natural language processing, simulating human creativity in the arts, and outperforming humans in complex board games, are backed by artificial neural networks. Instead of being hard-coded with rules, artificial neural networks learn a task from training examples. For instance, in visual pattern recognition, to recognize images of a particular object, like the number “9”, programs are confronted with considerable variability: flourishes in handwriting style, lighting of the surrounding context, and changes in orientation and size. Writing precise rules by hand would necessitate limitless exceptions. Artificial neural networks approach the problem by learning to infer rules from training examples. Here we will outline the mechanics of a simple neural network and illustrate the property of hierarchical feature detection.

Perceptron

Biological neurons operate by communicating via electrical impulses. Each biological neuron consists of a cell body (which generates electrical signals), dendrites (which receive signals), and an axon (which sends signals) (figure 0). Put simply, a neuron is electrically excited by input at its dendrites. That electrical input comes from one or many other neurons. If the input is strong enough, the neuron is “activated” and fires an electrical output down its axon, which may in turn connect to the dendrites of other neurons. In this way, electrical signals are transmitted across networks of neurons. In artificial neural networks, neurons are represented by nodes (red circles) and connections between neurons are represented by edges (blue lines) (figure 1). The simplest and earliest model is the perceptron, which is also the fundamental building block of more complex neural networks. A perceptron takes binary inputs,
$$x_1, x_2, \ldots, x_n \in \{0, 1\}$$
and outputs a single binary output (figure 2).

How is the output computed? Each input is multiplied by its respective weight,
$$w_1 x_1, \; w_2 x_2, \; \ldots, \; w_n x_n$$
The weight determines the relative influence of each input (the greater the weight, the greater the influence on the output). Then every input multiplied by its weight is summed together:
$$\sum_{i=1}^{n} w_i x_i = w_1 x_1 + w_2 x_2 + \cdots + w_n x_n$$
The perceptron’s output is determined by whether this weighted sum of all its inputs exceeds some threshold value:
$$\text{output} = \begin{cases} 0 & \text{if } \sum_i w_i x_i \le \text{threshold} \\ 1 & \text{if } \sum_i w_i x_i > \text{threshold} \end{cases}$$
If we think of 0 as “off” and 1 as “on” then this threshold function effectively determines whether the perceptron is turned on or off. This function is aptly named the activation function since it determines if the perceptron is activated (switched on) or not. Activation is like overcoming a barrier or filling a cup. Only when the cup overflows or the barrier is crossed will the switch be activated. In biology, different neurons need different levels of input stimulation before they will fire an electrical output. Some neurons need very little whereas others need a lot to be activated.
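As a concrete illustration, here is a minimal Python sketch of this rule; the inputs, weights, and threshold below are made-up values for illustration only:

```python
def perceptron(inputs, weights, threshold):
    """Fire (return 1) if the weighted sum of inputs exceeds the threshold."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return 1 if weighted_sum > threshold else 0

# Illustrative values: two binary inputs, two weights, a threshold of 1.
print(perceptron(inputs=[1, 0], weights=[0.6, 0.9], threshold=1))  # 0 (sum = 0.6)
print(perceptron(inputs=[1, 1], weights=[0.6, 0.9], threshold=1))  # 1 (sum = 1.5)
```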

Simplifying, if we consider all the weights and inputs as vectors:
$$W = \begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \end{bmatrix}, \qquad X = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$$
then the transpose of the weight vector W multiplied by the input vector X is the same as the weighted sum:
$$W^{T} X = \sum_{i=1}^{n} w_i x_i$$
Furthermore, if we let the bias b = -threshold, then the equivalent rule for the perceptron becomes (figure 4):
$$\text{output} = \begin{cases} 0 & \text{if } W^{T} X + b \le 0 \\ 1 & \text{if } W^{T} X + b > 0 \end{cases}$$
This formulation of the perceptron is the most commonly used in the literature.
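A sketch of the same rule in this vectorized form, with the bias standing in for the negated threshold (NumPy is assumed; the values are again illustrative):

```python
import numpy as np

def perceptron(X, W, b):
    """Output 1 if W·X + b is positive, otherwise 0 (b = -threshold)."""
    return 1 if np.dot(W, X) + b > 0 else 0

W = np.array([0.6, 0.9])    # weights
X = np.array([1, 1])        # binary inputs
b = -1.0                    # bias = -threshold
print(perceptron(X, W, b))  # 1, since 0.6 + 0.9 - 1.0 = 0.5 > 0
```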

Activation Functions

The most frequently used activation functions are the sigmoid, tanh, and ReLU. How much does the introduction of non-linearity enable a neural network to represent? It has been proven that neural networks with at least one hidden layer are universal approximators, meaning they can approximate any continuous function to arbitrary accuracy (see Approximation by Superpositions of a Sigmoidal Function and a visual proof). In diagrams, each activation function is represented by one of the following symbols:

See Figure 5 to determine which symbol corresponds to which activation function.
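The three functions named above can be written down directly; a minimal NumPy sketch:

```python
import numpy as np

def sigmoid(z):
    """Squashes any real value into the range (0, 1)."""
    return 1 / (1 + np.exp(-z))

def tanh(z):
    """Squashes any real value into the range (-1, 1)."""
    return np.tanh(z)

def relu(z):
    """Zeroes out negative values, leaves positive values unchanged."""
    return np.maximum(0, z)

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z), tanh(z), relu(z))
```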

Multilayer Network

The perceptron is a single-layer neural network in which inputs are directly summed and activated to produce an output. Deep neural networks have a multilayer structure where neurons are organized into distinct layers. Two adjacent layers may have any combination of connections between them (with a full set of connections being common), but no two neurons in the same layer are connected (figure 6).
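To make the layered structure concrete, here is a sketch of a forward pass through a small fully connected network with one hidden layer. The layer sizes, random weights, and choice of sigmoid activation are assumptions for illustration, not parameters from the network in the animation:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def forward(x, layers):
    """Pass input x through a list of (W, b) layers, applying the activation at each layer."""
    a = x
    for W, b in layers:
        a = sigmoid(W @ a + b)  # weighted sum plus bias, then activation
    return a

rng = np.random.default_rng(0)
layers = [
    (rng.normal(size=(3, 4)), np.zeros(3)),  # hidden layer: 4 inputs -> 3 neurons
    (rng.normal(size=(2, 3)), np.zeros(2)),  # output layer: 3 -> 2 neurons
]
print(forward(np.array([1.0, -1.0, 1.0, 1.0]), layers))
```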

This layered organization may enable hierarchies of feature detectors. The toy visual recognition animation featured above exemplifies this hierarchical composition, in which deeper layers learn more complex composite features while earlier layers detect simpler ones. The first hidden layer recognizes single pixels. The second hidden layer recognizes composites of two pixels. The third recognizes composites of 4 pixels, and the fourth layer recognizes the negation of layer three. This deep architecture, wherein lower layers recognize simpler features that feed into higher layers detecting more complex features (such as a progression from line edge orientations, to contours, to fully developed outlines), underlies a breakthrough in visual recognition: Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations.

To illustrate these concepts, let’s look at a concrete example and tease apart the animation that opens this article. Here it is diagrammed (figure 7):

Figure 7: Diagram of the opening animation

The input is a toy “image” that is made of 4 pixels. The input images feature a simple line in red which can be oriented in any of the following ways:

The task of the neural network is, given an input image of a line, to correctly classify the orientation of that line. This is a highly simplified example of visual pattern recognition.

In the input layer of the neural network (figure 7), 4 nodes examine the 4 pixels: one node is associated with each pixel of the input image. The 4-square next to each node indicates which pixel each node examines.

If a pixel in the input image is red, the corresponding node registers a value of -1; if the pixel is white, the node registers 1. In this representation, red is the negation of white; another way to see it is that the line is the opposite of the background.
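A sketch of this encoding step, assuming the 4 pixels arrive as color names (the function name and input format are hypothetical, chosen only for illustration):

```python
def encode_image(pixels):
    """Map each pixel to -1 if red (part of the line) or 1 if white (background)."""
    return [-1 if p == "red" else 1 for p in pixels]

# A toy image whose top row is a red horizontal line.
print(encode_image(["red", "red", "white", "white"]))  # [-1, -1, 1, 1]
```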

Hidden Layer 1

Hidden Layer 1 exemplifies the first instance of hierarchical feature representation, meaning deeper layers combine features from simpler earlier layers to recognize more complex features. Whereas the input layer only recognized one pixel at a time, Hidden Layer 1 combines the single pixels from the input into combinations of 2 pixels (figure 8). The edges that feed into the nodes of Hidden Layer 1 determine which two pixels are combined.
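One way to picture this combination step: each Hidden Layer 1 node sums the two input pixels that its incoming edges select. The particular pairings and unit weights below are assumptions for illustration, not the actual parameters of the network in the animation:

```python
import numpy as np

# Each row selects one pair of pixels from the 4-pixel input
# (weight 1 on the selected pixels, 0 elsewhere). Pairings are illustrative.
W_hidden1 = np.array([
    [1, 1, 0, 0],  # top two pixels
    [0, 0, 1, 1],  # bottom two pixels
    [1, 0, 1, 0],  # left two pixels
    [0, 1, 0, 1],  # right two pixels
])

x = np.array([-1, -1, 1, 1])  # encoded image: red line across the top row
print(W_hidden1 @ x)          # [-2  2  0  0]: the row pairs respond, the column pairs cancel to 0
```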

Activation Functions

For hidden layers 1 and 2, the activation function lets the signal pass through as long as the value is not zero; a value of zero effectively switches off the node, and the signal dies. Any other value keeps the switch on. Figure 10 shows where zero flips off the switch. Hidden layer 3 uses the ReLU activation function, which turns all negative values into zero and leaves positive values unchanged.
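A sketch contrasting the two activations described here, with the non-zero “pass-through” rule written exactly as stated in the text and ReLU as standard:

```python
import numpy as np

def pass_if_nonzero(z):
    """Hidden layers 1 and 2: any non-zero value passes through; zero kills the signal."""
    return np.where(z != 0, z, 0)

def relu(z):
    """Hidden layer 3: negative values become zero, positive values pass unchanged."""
    return np.maximum(0, z)

z = np.array([-2.0, 0.0, 2.0])
print(pass_if_nonzero(z))  # [-2.  0.  2.]  negative values still pass through
print(relu(z))             # [ 0.  0.  2.]  negative values are zeroed out
```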

This toy neural network illustrates the basic mechanics of neural networks (activation functions acting on the weighted sum of inputs) and hierarchical feature representation, in which deeper layers compose the features developed in earlier layers, enabling more sophisticated processing.

Citations

1. CS231n Convolutional Neural Networks for Visual Recognition
2. Neural Networks and Deep Learning
3. Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations
4. Approximation by Superpositions of a Sigmoidal Function
5. A visual proof that neural nets can compute any function
6. How Deep Neural Networks Work