# Restricted Boltzmann Machines — Simplified (2021)

Ajay Ohri

## Introduction

Boltzmann Machines are EBMs (Energy-Based Models) whose states follow the Gibbs, or Boltzmann, distribution. This distribution comes from statistical mechanics, which helps explain thermodynamic topics such as how temperature and entropy govern the probabilities of a system's states. Boltzmann machines were introduced in 1985 by Professor Geoffrey Hinton of Carnegie Mellon University and Professor Terry Sejnowski of Johns Hopkins University.
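To make the Boltzmann distribution concrete, here is a minimal Python sketch. The energies and temperatures are illustrative values, not taken from any real system. Each state with energy E receives a probability proportional to exp(-E/T), normalised by the partition function Z. Raising the temperature T flattens the distribution.

```python
import math

def boltzmann_probs(energies, temperature=1.0):
    """Return p_i = exp(-E_i / T) / Z for a list of state energies E_i."""
    e_min = min(energies)  # shift for numerical stability; it cancels in the ratio
    weights = [math.exp(-(e - e_min) / temperature) for e in energies]
    z = sum(weights)  # partition function (of the shifted weights)
    return [w / z for w in weights]

energies = [0.0, 1.0, 2.0]                            # illustrative state energies
p_cold = boltzmann_probs(energies, temperature=1.0)   # low T: lowest energy dominates
p_hot = boltzmann_probs(energies, temperature=10.0)   # high T: closer to uniform
print([round(p, 3) for p in p_cold])
print([round(p, 3) for p in p_hot])
```

Subtracting the minimum energy before exponentiating avoids overflow without changing the result, because the same factor appears in every weight and in Z.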

## 1) What are Boltzmann Machines?

A Boltzmann machine is a stochastic, generative neural network that is capable of learning internal representations. Given enough time, it can represent and solve difficult combinatorial problems. Variants such as the Discriminative Restricted Boltzmann Machine adapt the model for classification.

## 2) How do Boltzmann Machines work?

Boltzmann machines are stochastic (non-deterministic), generative models. They have only two kinds of nodes, visible and hidden; there are no output nodes. Because there is no output layer producing target values such as 0 or 1, they do not learn the way supervised networks do, where Stochastic Gradient Descent adjusts the weights so that the outputs match labelled patterns of 0s and 1s. Despite having no output nodes, Boltzmann machines are still capable of learning.

A further difference is that in many other kinds of networks the input nodes are not connected to one another, whereas in a Boltzmann machine the nodes are connected to each other regardless of whether they are visible or hidden. Because all nodes belong to one connected network, they share information, and this lets the machine generate data by itself. Measurements are taken only at the visible nodes, and when given input, the machine captures the patterns, parameters, and correlations in the data. Hence the Boltzmann machine is an unsupervised, generative model; its deep variant is the Deep Boltzmann Machine.
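The sketch below illustrates this full connectivity, assuming binary 0/1 units and a symmetric weight matrix with a zero diagonal. The 4-unit machine and its weights are hypothetical. It performs sequential Gibbs updates: each unit switches on with probability sigmoid(bias + weighted sum of all the other units), so every unit, visible or hidden, influences every other unit.

```python
import math
import random

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def gibbs_sweep(state, weights, biases, rng):
    """Update every unit of a general Boltzmann machine once, in order.

    weights must be symmetric with a zero diagonal; any unit may be connected
    to any other, so each update depends on all the other units' states.
    """
    n = len(state)
    for i in range(n):
        net = biases[i] + sum(weights[i][j] * state[j] for j in range(n) if j != i)
        state[i] = 1 if rng.random() < sigmoid(net) else 0
    return state

# Hypothetical 4-unit machine: units 0-1 are visible, units 2-3 are hidden.
W = [[0.0, 0.5, -1.0, 0.3],
     [0.5, 0.0, 0.8, -0.2],
     [-1.0, 0.8, 0.0, 0.4],
     [0.3, -0.2, 0.4, 0.0]]
b = [0.1, -0.1, 0.0, 0.2]
rng = random.Random(0)
state = [0, 1, 0, 1]
for _ in range(10):
    gibbs_sweep(state, W, b, rng)
visible = state[:2]  # only the visible units are read off as "measurements"
print(state, visible)
```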

## 3) How do Restricted Boltzmann Machines work?

RBM stands for Restricted Boltzmann Machine, a generative artificial neural network with two layers. It uses its set of inputs to learn, by itself, a probability distribution over those inputs.

RBMs are typically used for classification, dimensionality reduction, regression, feature learning, collaborative filtering, topic modeling, and similar tasks. They are a special class of Boltzmann machines with a restricted set of connections in their network of hidden and visible nodes. This restriction makes RBMs easy to implement and much more practical to train than general Boltzmann machines.

The two layers, called the visible and hidden layers, are connected as a complete bipartite graph: nodes within the same layer are not connected, but every visible node is connected to every hidden node. It is this restriction on communication that permits efficient training algorithms. In particular, RBMs can be trained with the gradient-based contrastive-divergence algorithm, which is far more efficient than the algorithms available for general Boltzmann machines.
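One way to see the restriction is to embed an RBM's visible-by-hidden weight matrix in the square weight matrix of a general Boltzmann machine over the same units. In this sketch (hypothetical sizes and weights), the visible-visible and hidden-hidden blocks are all zero. That zero structure is what makes the hidden units conditionally independent given the visible units, and vice versa.

```python
def rbm_as_full_matrix(W):
    """Embed RBM weights W (n_vis rows x n_hid columns) in the symmetric square
    weight matrix of a general Boltzmann machine over the same units.

    Units 0..n_vis-1 are visible and the rest are hidden. Only visible-hidden
    entries are filled; visible-visible and hidden-hidden entries stay zero.
    """
    n_vis, n_hid = len(W), len(W[0])
    n = n_vis + n_hid
    full = [[0.0] * n for _ in range(n)]
    for i in range(n_vis):
        for j in range(n_hid):
            full[i][n_vis + j] = W[i][j]  # visible i -> hidden j
            full[n_vis + j][i] = W[i][j]  # same weight in the other direction
    return full

def connection_counts(n_vis, n_hid):
    """(connections in a fully connected BM, connections in an RBM) for the same units."""
    n = n_vis + n_hid
    return n * (n - 1) // 2, n_vis * n_hid

W = [[0.2, -0.5], [0.7, 0.1], [-0.3, 0.4]]  # hypothetical: 3 visible x 2 hidden
full = rbm_as_full_matrix(W)
n_vis, n = len(W), len(full)
same_layer = ([full[i][k] for i in range(n_vis) for k in range(n_vis)]
              + [full[i][k] for i in range(n_vis, n) for k in range(n_vis, n)])
print(all(x == 0.0 for x in same_layer))  # True: no within-layer connections
print(connection_counts(784, 500))        # (823686, 392000), e.g. for 28x28 inputs
```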

The connections of a Restricted Boltzmann machine form a symmetric (undirected) bipartite graph in which no two nodes of the same group are connected. Multiple RBMs can be stacked, with each one trained on the outputs of the one below. The whole stack can then be fine-tuned with back-propagation and gradient descent, and such a stack is called a DBN (Deep Belief Network). With rapid advances in Variational Autoencoders and Generative Adversarial Networks, the use of RBMs has been slowly declining in the deep-learning community.
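The stacking idea can be sketched as greedy layer-wise training: train one RBM on the data, then train the next RBM on the first one's hidden-unit probabilities, and so on. This is a simplified sketch that assumes binary units, CD-1 training, and toy data. The back-propagation fine-tuning stage is not shown, and `train_rbm` and `train_dbn` are hypothetical helper names.

```python
import math
import random

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def hidden_probs(v, W, b):
    """p(h_j = 1 | v) for every hidden unit j."""
    return [sigmoid(b[j] + sum(v[i] * W[i][j] for i in range(len(v)))) for j in range(len(b))]

def visible_probs(h, W, a):
    """p(v_i = 1 | h) for every visible unit i."""
    return [sigmoid(a[i] + sum(W[i][j] * h[j] for j in range(len(h)))) for i in range(len(a))]

def train_rbm(data, n_hidden, epochs, lr, rng):
    """Train one binary RBM with CD-1; return (W, visible_bias, hidden_bias)."""
    n_vis = len(data[0])
    W = [[rng.gauss(0.0, 0.01) for _ in range(n_hidden)] for _ in range(n_vis)]
    a, b = [0.0] * n_vis, [0.0] * n_hidden
    for _ in range(epochs):
        for v0 in data:
            ph0 = hidden_probs(v0, W, b)
            h0 = [1.0 if rng.random() < p else 0.0 for p in ph0]
            v1 = visible_probs(h0, W, a)  # reconstruction
            ph1 = hidden_probs(v1, W, b)
            for i in range(n_vis):
                for j in range(n_hidden):
                    W[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
                a[i] += lr * (v0[i] - v1[i])
            for j in range(n_hidden):
                b[j] += lr * (ph0[j] - ph1[j])
    return W, a, b

def train_dbn(data, layer_sizes, epochs=50, lr=0.1, seed=0):
    """Greedy layer-wise stacking: each RBM learns from the hidden
    probabilities produced by the RBM below it."""
    rng = random.Random(seed)
    layers, x = [], data
    for n_hidden in layer_sizes:
        W, a, b = train_rbm(x, n_hidden, epochs, lr, rng)
        layers.append((W, a, b))
        x = [hidden_probs(v, W, b) for v in x]  # input for the next RBM up
    return layers, x

# Toy binary data (hypothetical): three 6-bit patterns.
data = [[1, 1, 0, 0, 0, 0], [0, 0, 1, 1, 0, 0], [0, 0, 0, 0, 1, 1]]
layers, top = train_dbn(data, layer_sizes=[4, 2])
print(len(layers), [len(t) for t in top])  # 2 RBMs; each input mapped to 2 top-level features
```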

The RBM is a stochastic neural network: when activated, each neuron behaves randomly. An RBM has two sets of biases, one for the visible layer and one for the hidden layer. The forward pass uses the hidden bias to produce the activations, while the backward pass uses the visible bias to reconstruct the input. The reconstruction generally differs from the actual input, because there are no connections among the visible units and therefore no way for them to transfer information among themselves.
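Here is a minimal sketch of one forward and one backward pass, assuming binary units and hypothetical weights and biases. It works with probabilities (mean-field values) rather than sampled 0/1 states so the output is deterministic. The result shows that the reconstruction is not an exact copy of the input.

```python
import math

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def forward(v, W, hid_bias):
    """Forward pass: p(h_j = 1 | v), using the hidden bias."""
    return [sigmoid(hid_bias[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(hid_bias))]

def backward(h, W, vis_bias):
    """Backward pass: reconstruct p(v_i = 1 | h) with the same W and the visible bias."""
    return [sigmoid(vis_bias[i] + sum(W[i][j] * h[j] for j in range(len(h))))
            for i in range(len(vis_bias))]

W = [[0.2, -0.5], [0.7, 0.1], [-0.3, 0.4]]  # hypothetical: 3 visible x 2 hidden
vis_bias = [0.0, 0.1, -0.1]
hid_bias = [0.05, -0.05]
v = [1, 0, 1]
h = forward(v, W, hid_bias)
v_recon = backward(h, W, vis_bias)
print([round(p, 3) for p in v_recon])  # probabilities, not an exact copy of v
```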

When training a Boltzmann machine with multiple inputs, the bias is added to the weighted inputs and the result is passed through a sigmoid activation function. The output of this function decides whether the hidden node's state becomes active or not. The weight matrix has as many rows as there are input nodes and as many columns as there are hidden nodes. Hence the input to the first hidden node is the dot product of the input vector with the first column of weights, plus the corresponding bias term.
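The bookkeeping described above can be checked with a small example; all numbers are made up. The weight matrix has one row per input node and one column per hidden node. Hidden node 0 therefore receives the dot product of the input vector with column 0, plus its bias. A random draw against the resulting probability then decides whether the node switches on.

```python
import math
import random

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

n_visible, n_hidden = 4, 3
# Hypothetical weights: one row per input (visible) node, one column per hidden node.
W = [[0.1, -0.2, 0.3],
     [0.0, 0.5, -0.1],
     [-0.4, 0.2, 0.2],
     [0.3, 0.1, -0.5]]
hid_bias = [0.05, 0.0, -0.05]
v = [1.0, 0.0, 1.0, 1.0]
assert len(W) == n_visible and all(len(row) == n_hidden for row in W)

# Input to the first hidden node: v dotted with column 0 of W, plus bias 0.
column0 = [W[i][0] for i in range(n_visible)]
net0 = sum(vi * wi for vi, wi in zip(v, column0)) + hid_bias[0]
p_h0 = sigmoid(net0)                               # probability that hidden node 0 is on
h0 = 1 if random.Random(0).random() < p_h0 else 0  # stochastic on/off decision

# The same computation for every hidden node, one column of W each.
nets = [sum(v[i] * W[i][j] for i in range(n_visible)) + hid_bias[j] for j in range(n_hidden)]
probs = [sigmoid(x) for x in nets]
print(round(net0, 3), round(p_h0, 3), h0)
```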

## 4) The learning process

Let v denote the visible inputs, h the hidden units, and W the weights. The reconstruction error is a function of the difference v(0) - v(1), and the training process reduces it over successive iterations. Each iteration adjusts the weights to reduce this error, and this iterative learning process is the basis of the contrastive divergence algorithm.
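The following sketch tracks the reconstruction error between v(0) and its reconstruction v(1) as the weights are adjusted. It assumes binary units, toy data made of two repeating patterns, and CD-1 updates with hypothetical hyperparameters (learning rate 0.1, 200 epochs). Hinton's training guide cautions that reconstruction error is only a rough indicator of learning progress, so it is best treated as a monitoring signal rather than as the objective.

```python
import math
import random

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def hidden_probs(v, W, b):
    """p(h_j = 1 | v) for every hidden unit j."""
    return [sigmoid(b[j] + sum(v[i] * W[i][j] for i in range(len(v)))) for j in range(len(b))]

def visible_probs(h, W, a):
    """p(v_i = 1 | h) for every visible unit i."""
    return [sigmoid(a[i] + sum(W[i][j] * h[j] for j in range(len(h)))) for i in range(len(a))]

def reconstruction_error(data, W, a, b):
    """Mean squared difference between each v(0) and its reconstruction v(1)."""
    total = 0.0
    for v0 in data:
        v1 = visible_probs(hidden_probs(v0, W, b), W, a)  # mean-field forward + backward
        total += sum((x - y) ** 2 for x, y in zip(v0, v1))
    return total / len(data)

def cd1_epoch(data, W, a, b, lr, rng):
    """One pass of CD-1 weight adjustments over the data, updating W, a, b in place."""
    for v0 in data:
        ph0 = hidden_probs(v0, W, b)
        h0 = [1.0 if rng.random() < p else 0.0 for p in ph0]  # sampled hidden states
        v1 = visible_probs(h0, W, a)                          # reconstruction
        ph1 = hidden_probs(v1, W, b)
        for i in range(len(a)):
            for j in range(len(b)):
                W[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
            a[i] += lr * (v0[i] - v1[i])
        for j in range(len(b)):
            b[j] += lr * (ph0[j] - ph1[j])

rng = random.Random(1)
data = [[1, 1, 0, 0], [0, 0, 1, 1]] * 5  # toy data: two repeating patterns
W = [[rng.gauss(0.0, 0.01) for _ in range(2)] for _ in range(4)]  # 4 visible x 2 hidden
a, b = [0.0] * 4, [0.0] * 2
errors = [reconstruction_error(data, W, a, b)]
for _ in range(200):
    cd1_epoch(data, W, a, b, lr=0.1, rng=rng)
    errors.append(reconstruction_error(data, W, a, b))
print(round(errors[0], 3), round(errors[-1], 3))
```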

Without delving into the mathematics: the forward pass computes the probability of the hidden output h(1) from the weighted input v(0) and the weights W. In the backward pass, the input is reconstructed, and the probability of the output v(1) is computed from h(1) and the same weights W. The weights are then updated as a function of these quantities. Because the same W is used in both the forward and backward passes, the two conditional probabilities, p(h | v) from the forward pass and p(v | h) from the backward pass, together reflect the joint distribution of inputs and activations, p(v, h).
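For a tiny RBM, the joint distribution p(v, h) can be computed exactly by enumerating every configuration. This sketch uses hypothetical 2-visible, 2-hidden parameters; brute force like this is only feasible at toy sizes. It checks that the probability p(h_0 = 1 | v) derived from the joint distribution matches the sigmoid formula used in the forward pass.

```python
import itertools
import math

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def energy(v, h, W, a, b):
    """E(v, h) = -sum_i a_i v_i - sum_j b_j h_j - sum_ij v_i W_ij h_j."""
    e = -sum(ai * vi for ai, vi in zip(a, v)) - sum(bj * hj for bj, hj in zip(b, h))
    e -= sum(v[i] * W[i][j] * h[j] for i in range(len(v)) for j in range(len(h)))
    return e

def joint_distribution(W, a, b):
    """Exact p(v, h) by enumerating every binary configuration (toy sizes only)."""
    configs = [(v, h) for v in itertools.product((0, 1), repeat=len(a))
               for h in itertools.product((0, 1), repeat=len(b))]
    weights = {c: math.exp(-energy(c[0], c[1], W, a, b)) for c in configs}
    z = sum(weights.values())  # partition function
    return {c: w / z for c, w in weights.items()}

W = [[0.5, -1.0], [1.5, 0.3]]  # hypothetical: 2 visible x 2 hidden
a, b = [0.2, -0.4], [0.1, 0.6]
p = joint_distribution(W, a, b)

v = (1, 0)
# p(h_0 = 1 | v) obtained from the joint distribution ...
p_v = sum(prob for (vv, h), prob in p.items() if vv == v)
p_h0_given_v = sum(prob for (vv, h), prob in p.items() if vv == v and h[0] == 1) / p_v
# ... agrees with the forward-pass formula sigmoid(b_0 + sum_i v_i W_i0).
forward = sigmoid(b[0] + sum(v[i] * W[i][0] for i in range(len(v))))
print(round(p_h0_given_v, 6), round(forward, 6))
```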

The reconstruction technique used by a Restricted Boltzmann machine is quite different from classification or regression, which map an input to discrete or continuous values respectively. Instead, an RBM estimates the probability distribution of the original input, which makes it an example of generative learning. In classification and regression, the mapping from inputs to labels makes the learning discriminative.
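The distribution an RBM assigns to its inputs is the marginal p(v), obtained by summing over all hidden vectors. For binary hidden units that sum has a closed form, usually written through the free energy F(v). The sketch below uses hypothetical parameters and enumerates every visible vector, which is feasible only for toy sizes.

```python
import itertools
import math

def free_energy(v, W, a, b):
    """F(v) = -sum_i a_i v_i - sum_j log(1 + exp(b_j + sum_i v_i W_ij)).

    exp(-F(v)) equals the sum of exp(-E(v, h)) over all binary hidden vectors h.
    """
    vis_term = sum(ai * vi for ai, vi in zip(a, v))
    hid_term = sum(math.log1p(math.exp(b[j] + sum(v[i] * W[i][j] for i in range(len(v)))))
                   for j in range(len(b)))
    return -vis_term - hid_term

def visible_distribution(W, a, b):
    """Exact p(v) = exp(-F(v)) / Z over every binary visible vector (toy sizes only)."""
    vs = itertools.product((0, 1), repeat=len(a))
    weights = {v: math.exp(-free_energy(v, W, a, b)) for v in vs}
    z = sum(weights.values())  # partition function
    return {v: w / z for v, w in weights.items()}

W = [[2.0, -1.0], [2.0, -1.0], [-2.0, 1.5]]  # hypothetical: 3 visible x 2 hidden
a, b = [0.0, 0.0, 0.0], [-1.0, 0.0]
p_v = visible_distribution(W, a, b)
for v, prob in sorted(p_v.items(), key=lambda kv: -kv[1])[:3]:
    print(v, round(prob, 3))  # the three input vectors the model finds most probable
```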

## 5) Contrastive Divergence

Energy-based Restricted Boltzmann machine models use an energy function defined on a joint configuration (v, h) of the visible and hidden units. If i indexes the visible units and j the hidden units, then $v_i$ and $h_j$ denote their binary states. The energy also accounts for the biases $a_i$ and $b_j$ and for $w_{ij}$, the weight between visible unit i and hidden unit j.
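For binary units, the energy and the joint probability of a configuration are usually written as follows. This is the form used in Hinton's training guide, with Z the normalising partition function:

$$
E(\mathbf{v}, \mathbf{h}) = -\sum_{i} a_i v_i - \sum_{j} b_j h_j - \sum_{i,j} v_i h_j w_{ij},
\qquad
p(\mathbf{v}, \mathbf{h}) = \frac{1}{Z} e^{-E(\mathbf{v}, \mathbf{h})},
\qquad
Z = \sum_{\mathbf{v}, \mathbf{h}} e^{-E(\mathbf{v}, \mathbf{h})}
$$

The energy itself is bilinear in v and h. The sigmoid appears only in the conditional probabilities of individual units.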

The probability that the network assigns to a visible vector is obtained by summing over all possible hidden vectors. Here Z is the partition function, given by summing over all possible pairs of visible and hidden vectors. The derivative of the log probability of a training vector (its log-likelihood) with respect to a weight $w_{ij}$ turns out to have a surprisingly simple form.
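In the notation of the RBM energy function, the marginal probability of a visible vector and the log-likelihood gradient can be written as follows. Here $\langle \cdot \rangle_{\text{data}}$ is an expectation with v clamped to the training vectors, $\langle \cdot \rangle_{\text{model}}$ is an expectation under the model's own distribution, $\epsilon$ is a learning rate, and $\sigma$ is the logistic sigmoid:

$$
\begin{aligned}
p(\mathbf{v}) &= \frac{1}{Z} \sum_{\mathbf{h}} e^{-E(\mathbf{v}, \mathbf{h})},
\qquad Z = \sum_{\mathbf{v}, \mathbf{h}} e^{-E(\mathbf{v}, \mathbf{h})} \\
\frac{\partial \log p(\mathbf{v})}{\partial w_{ij}} &= \langle v_i h_j \rangle_{\text{data}} - \langle v_i h_j \rangle_{\text{model}} \\
\Delta w_{ij} &= \epsilon \left( \langle v_i h_j \rangle_{\text{data}} - \langle v_i h_j \rangle_{\text{model}} \right) \\
p(h_j = 1 \mid \mathbf{v}) &= \sigma\Big( b_j + \sum_i v_i w_{ij} \Big),
\qquad p(v_i = 1 \mid \mathbf{h}) = \sigma\Big( a_i + \sum_j h_j w_{ij} \Big)
\end{aligned}
$$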

Please refer to Geoffrey Hinton's guide to training RBMs to understand this better. Because there are no direct connections between hidden units in an RBM, it is easy to get an unbiased sample of $\langle v_i h_j \rangle_{\text{data}}$. Unbiased samples of $\langle v_i h_j \rangle_{\text{model}}$ are much harder to obtain. They require running a Markov chain until the distribution reaches its stationary (equilibrium) state, which is why this second term is approximated in practice.
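The asymmetry between the two terms shows up clearly in code. This sketch uses hypothetical all-zero parameters so the results are easy to check by hand. The data term needs only one pass over the training vectors using p(h_j = 1 | v). The exact model term, by contrast, enumerates all 2^(n_vis + n_hid) joint states; that count grows exponentially, which is why the term must be approximated.

```python
import itertools
import math

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def data_expectation(data, W, b):
    """<v_i h_j>_data: average of v_i * p(h_j = 1 | v) over the training vectors.
    Closed form because hidden units are independent given v."""
    n_vis, n_hid = len(W), len(b)
    out = [[0.0] * n_hid for _ in range(n_vis)]
    for v in data:
        ph = [sigmoid(b[j] + sum(v[i] * W[i][j] for i in range(n_vis))) for j in range(n_hid)]
        for i in range(n_vis):
            for j in range(n_hid):
                out[i][j] += v[i] * ph[j] / len(data)
    return out

def model_expectation_exact(W, a, b):
    """<v_i h_j>_model by brute force over all 2^(n_vis + n_hid) joint states."""
    n_vis, n_hid = len(a), len(b)
    out = [[0.0] * n_hid for _ in range(n_vis)]
    z = 0.0
    for v in itertools.product((0, 1), repeat=n_vis):
        for h in itertools.product((0, 1), repeat=n_hid):
            neg_e = (sum(a[i] * v[i] for i in range(n_vis))
                     + sum(b[j] * h[j] for j in range(n_hid))
                     + sum(v[i] * W[i][j] * h[j] for i in range(n_vis) for j in range(n_hid)))
            w = math.exp(neg_e)  # unnormalised probability e^{-E(v, h)}
            z += w
            for i in range(n_vis):
                for j in range(n_hid):
                    out[i][j] += w * v[i] * h[j]
    return [[x / z for x in row] for row in out]

W = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]  # hypothetical all-zero parameters
a, b = [0.0, 0.0, 0.0], [0.0, 0.0]
data = [[1, 0, 1], [1, 1, 0]]
pos = data_expectation(data, W, b)
neg = model_expectation_exact(W, a, b)
grad = [[p - n for p, n in zip(rp, rn)] for rp, rn in zip(pos, neg)]  # d log p / d w_ij
print(grad)
```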

The model distribution is therefore sampled with Gibbs sampling, an MCMC (Markov chain Monte Carlo) algorithm. Gibbs sampling produces a sequence of samples that approximately follow a specified multivariate distribution when direct sampling from it is difficult. This approximates the gradient of the training data's log probability, albeit crudely. The resulting learning rule follows the approximate gradient of a different objective function, known as Contrastive Divergence. This function is defined as the difference between two Kullback-Leibler divergences, and in the algorithm the second term is obtained by running Gibbs sampling for k steps.
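Here is a sketch of one CD-k update for a binary RBM, assuming hypothetical sizes, learning rate, and k. The positive statistics come from the data vector. The negative statistics come from a Gibbs chain that starts at that vector and runs for only k steps rather than to equilibrium. The sketch uses probabilities rather than sampled states for the final reconstruction, a common way to reduce sampling noise but not the only option.

```python
import math
import random

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def hidden_probs(v, W, b):
    """p(h_j = 1 | v) for every hidden unit j."""
    return [sigmoid(b[j] + sum(v[i] * W[i][j] for i in range(len(v)))) for j in range(len(b))]

def visible_probs(h, W, a):
    """p(v_i = 1 | h) for every visible unit i."""
    return [sigmoid(a[i] + sum(W[i][j] * h[j] for j in range(len(h)))) for i in range(len(a))]

def sample(probs, rng):
    """Draw binary states: each unit is on with its given probability."""
    return [1.0 if rng.random() < p else 0.0 for p in probs]

def cd_k_update(v0, W, a, b, k, lr, rng):
    """Apply one CD-k update to W, a, b in place for a single data vector v0."""
    if k < 1:
        raise ValueError("k must be at least 1")
    ph0 = hidden_probs(v0, W, b)  # positive phase: driven by the data
    h = sample(ph0, rng)
    for step in range(k):         # k steps of block Gibbs sampling
        pv = visible_probs(h, W, a)
        v = pv if step == k - 1 else sample(pv, rng)  # probabilities on the last step
        ph = hidden_probs(v, W, b)
        if step < k - 1:
            h = sample(ph, rng)
    for i in range(len(a)):       # negative phase uses the end of the chain
        for j in range(len(b)):
            W[i][j] += lr * (v0[i] * ph0[j] - v[i] * ph[j])
        a[i] += lr * (v0[i] - v[i])
    for j in range(len(b)):
        b[j] += lr * (ph0[j] - ph[j])

rng = random.Random(0)
W = [[rng.gauss(0.0, 0.01) for _ in range(2)] for _ in range(3)]  # hypothetical 3 x 2 RBM
a, b = [0.0] * 3, [0.0] * 2
for _ in range(100):
    cd_k_update([1.0, 1.0, 0.0], W, a, b, k=3, lr=0.05, rng=rng)
print([round(x, 2) for x in a])  # biases of the "on" units rise, of the "off" unit falls
```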

## Conclusion

This article gave a tutorial on the architecture of the simple Restricted Boltzmann machine. Many improvements and variations of RBMs exist, based on algorithmic optimization and training.
