
Demystifying Neural Networks: Solving the XOR Problem with Backpropagation by Rajeshwar Vempaty

If the inputs are the same (0,0 or 1,1), the output is 0. When the four points are plotted in the x-y plane, we can see that the two classes are not linearly separable, unlike the OR and AND gates (at least in two dimensions). Machine learning models learn from data and make predictions. One of the fundamental concepts behind training these models is backpropagation.
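As a minimal sketch (assuming numpy and matplotlib are available), the truth table can be written down and plotted to see the two classes sitting on opposite corners of the unit square:

```python
import numpy as np
import matplotlib.pyplot as plt

# XOR truth table: inputs and expected outputs
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])

# Plot the four points coloured by class: no single straight line
# can separate the 0s from the 1s
plt.scatter(X[:, 0], X[:, 1], c=y, cmap="bwr", s=100)
plt.xlabel("x1")
plt.ylabel("x2")
plt.title("XOR: classes on opposite corners")
plt.show()
```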

Moving on to the XOR gate

SGD works well for shallow networks, and for our XOR example we could use SGD. The selection of a suitable optimization strategy is a matter of experience, personal preference, and comparison. Keras uses the “adam” optimizer by default, so we have used the same in our solution of XOR, and it works well for us. As we move down along the line, the classification (a real number) increases. When we stop at the collapsed points, the classification equals 1.
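As a quick sketch of what that choice looks like in Keras (the architecture below is illustrative, not the exact model from the post), switching between the default "adam" and plain "sgd" is a one-line change:

```python
import tensorflow as tf

# A tiny 2-4-1 network for XOR; the layer sizes here are an assumption
model = tf.keras.Sequential([
    tf.keras.layers.Dense(4, activation="tanh", input_shape=(2,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# Keras defaults to "adam"; for a shallow network plain SGD also works
model.compile(optimizer="adam",  # try optimizer="sgd" for comparison
              loss="binary_crossentropy",
              metrics=["accuracy"])
```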

Hidden layer gradient

Backpropagation is a way to update the weights and biases of a model, starting from the output layer and working back to the beginning. The main principle behind it is that each parameter changes in proportion to how much it affects the network’s output. A weight that has barely any effect on the output of the model will change very little, while one that has a large negative impact will change drastically to improve the model’s predictive power. Remember the linear activation function we used on the output node of our perceptron model? You may have heard of the sigmoid and the tanh functions, which are among the most popular non-linear activation functions. One way to solve the XOR problem is by using feedforward neural networks.
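To make the “proportional to its effect” idea concrete, here is the chain-rule form of the update for a single weight \(w\), written for a squared-error loss; the symbols \(z\), \(a\), and \(\eta\) are introduced here for illustration and are not from the original text:

\[
\frac{\partial L}{\partial w} = \frac{\partial L}{\partial a}\cdot\frac{\partial a}{\partial z}\cdot\frac{\partial z}{\partial w},
\qquad
w \leftarrow w - \eta\,\frac{\partial L}{\partial w}
\]

where \(z\) is the weighted input to a node, \(a = \sigma(z)\) its activation, and \(\eta\) the learning rate. A weight whose gradient \(\partial L/\partial w\) is near zero barely moves, while a weight with a large gradient changes substantially.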

Loss function and cost function

Further, this error is divided by 2 to make it easier to differentiate, as we’ll see in the following steps. We’ll initialize our weights and expected outputs as per the truth table of XOR. Out of all the 2-input logic gates, the XOR and XNOR gates are the only ones that are not linearly separable. Apart from the usual visualization (matplotlib and seaborn) and numerical libraries (numpy), we’ll use cycle from itertools. This is done since our algorithm cycles through the data indefinitely until it manages to correctly classify the entire training set without any mistakes in between. We get our new weights by simply incrementing our original weights with the computed gradients multiplied by the learning rate.
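A minimal sketch of that loop, assuming a single-layer perceptron trained on a linearly separable gate (AND) as a warm-up, since the single-layer version cannot converge on XOR itself; the variable names and learning rate are illustrative, not taken from the original code:

```python
from itertools import cycle
import numpy as np

# Truth table of a linearly separable gate (AND) used as a warm-up;
# XOR itself needs a hidden layer, as discussed later
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])

w = np.zeros(2)   # weights
b = 0.0           # bias
lr = 0.1          # learning rate
correct_in_a_row = 0

# Cycle through the data indefinitely until one full clean pass
# (every sample classified correctly in a row) is achieved
for x_i, y_i in cycle(zip(X, y)):
    pred = 1 if np.dot(w, x_i) + b > 0 else 0
    error = y_i - pred
    if error == 0:
        correct_in_a_row += 1
        if correct_in_a_row == len(X):
            break
    else:
        correct_in_a_row = 0
        # increment the weights by the gradient scaled by the learning rate
        w += lr * error * x_i
        b += lr * error

print("learned weights:", w, "bias:", b)
```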

However, it’s important to note that CNNs are designed for tasks like image recognition, where there is spatial correlation between pixels. For a simple problem like XOR, traditional feedforward neural networks are more suitable. This problem may seem easy to solve manually, but it poses a challenge for single-layer networks because they lack the ability to capture non-linear relationships between input variables. Neural networks have been proven to solve complex problems, and one of the most challenging ones is the XOR problem.

However, is it fair to assign different error values for the same amount of error? For example, the absolute difference between -1 and 0 is the same as between 1 and 0, yet the formula above would sway things negatively for the outcome that predicted -1. To solve this problem, we use the squared error loss. (The modulus is not used because it is harder to differentiate.)
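Written out, the loss used here (with the extra factor of \(\tfrac{1}{2}\) mentioned earlier) and its derivative with respect to the prediction \(\hat{y}\) are:

\[
L = \frac{1}{2}\,(y - \hat{y})^2,
\qquad
\frac{\partial L}{\partial \hat{y}} = \hat{y} - y
\]

Squaring makes a prediction of \(-1\) and \(+1\) equally wrong for a target of 0, and the \(\tfrac{1}{2}\) cancels the 2 that appears when differentiating.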

When I started with AI, I remember one of the first working examples I watched was MNIST (or CIFAR10, I don’t remember very well). Looking for online tutorials, this example appears over and over, so I suppose it is common practice to start DL courses with it. That is why I would like to start with a different example. A converged result should have hyperplanes that separate the True and False values.

This creates problems with the practicality of the mathematics (talk to any derivatives trader about the problems of hedging barrier options at the money). Thus we tend to use a smooth function, the sigmoid, which is infinitely differentiable, allowing us to easily do calculus with our model. Hidden layers are those layers with nodes other than the input and output nodes. Non-linearity allows for more complex decision boundaries. One potential decision boundary for our XOR data could look like this.
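As a small illustrative helper (the function names are mine, not from the original code), the sigmoid and its derivative can be written directly in terms of each other:

```python
import numpy as np

def sigmoid(z):
    """Smooth, infinitely differentiable squashing function."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    """Derivative of the sigmoid, expressed via the sigmoid itself."""
    s = sigmoid(z)
    return s * (1.0 - s)
```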

  1. Multi-layer feedforward neural networks, also known as deep neural networks, are artificial neural networks that have more than one hidden layer.
  2. I hope that the mathematical explanation of a neural network, along with its coding in Python, will help other readers understand the working of a neural network.
  3. And it could be dealt with using the same approaches described above.

In this process, we have also learned how to create a Multi-Layer Perceptron, and we will continue to learn more about those in our upcoming posts. Before we end this post, let’s do a quick recap of what we learned today. Again, we just change the y data, and all the other steps will be the same as for the last two models. One solution for the XOR problem is to extend the feature space and use a more non-linear feature approach. However, we must understand how we can solve the XOR problem using the traditional linear approach as well.
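One way to picture that feature-space trick (this particular extra feature is my illustration, not necessarily the one used in the post): adding the product \(x_1 x_2\) as a third feature makes XOR linearly separable, since the linear combination \(x_1 + x_2 - 2x_1x_2\) already reproduces the XOR output on the truth table.

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])

# Extend the feature space with the product x1*x2
X_ext = np.column_stack([X, X[:, 0] * X[:, 1]])

# In the extended space a single linear combination matches XOR exactly
w = np.array([1.0, 1.0, -2.0])
print(X_ext @ w)   # -> [0. 1. 1. 0.]
```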

To overcome this limitation, we need to use multi-layer feedforward networks. And now let’s run all this code, which will train the neural network and calculate the error between the actual values of the XOR function and the outputs produced by the network. The closer the resulting values are to 0 and 1, the more accurately the neural network solves the problem. This multi-layer ‘perceptron’ has two hidden units represented by \(h_1\) and \(h_2\), where \(h(x)\) is a non-linear feature of \(x\). One simple approach is to set all weights to 0 initially, but in that case the network will behave like a linear model, as the gradient of the loss w.r.t. all weights in a layer will be the same. This makes the network symmetric, so it loses its ability to map non-linearity and behaves much like a linear model.
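A compact, self-contained sketch of that training loop in plain numpy (the layer sizes, learning rate, and epoch count are illustrative choices, not the exact values from the post) is shown below; note the small random initialization, used precisely to avoid the symmetric all-zeros situation described above. With only two hidden units the network can occasionally get stuck, in which case re-running with a different seed helps.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(0)
W1 = rng.normal(size=(2, 2))   # input -> hidden, random (not zero!)
b1 = np.zeros((1, 2))
W2 = rng.normal(size=(2, 1))   # hidden -> output
b2 = np.zeros((1, 1))
lr = 0.5

for epoch in range(10000):
    # forward pass
    h = sigmoid(X @ W1 + b1)       # hidden activations h1, h2
    out = sigmoid(h @ W2 + b2)     # network prediction

    # backward pass (squared-error loss, sigmoid derivatives)
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)

    # gradient-descent updates: change each weight in proportion to its gradient
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0, keepdims=True)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0, keepdims=True)

print(out.round(3))   # values close to [0, 1, 1, 0] indicate convergence
```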
