Neural Network 101. Chapter 2.
MS CS
Aug 31, 2024
We are going to look at a real example: property valuation.
We have the independent variables, i.e., one row of a database that describes the property.
In this example network we have an input layer and an output layer, but no hidden layer.
The Power Is in the Hidden Layer
Now let's understand how the hidden layer gives us this extra power.
One neuron in the hidden layer could be picking up three attributes, say area, number of bedrooms, and age.
Some other neuron could pick up just the age. Why?
Moreover, a neural network can pick up combinations and permutations of the four features.
These neurons, the whole hidden layer, allow the neural network to look for very specific things; in combination, that's where the power comes from.
It's like ants: a single ant can't build an anthill, but in groups of hundreds or thousands they can accomplish almost anything.
Each of these neurons by itself cannot predict the price, but together they have a superpower and can predict it.
They can do quite an accurate job if trained and set up properly.
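The idea above can be sketched in a few lines. This is a toy forward pass, not a trained model: the feature values and weights are made up for illustration, and the zeros in the weight matrix encode "this neuron ignores that feature," exactly as described for the area/bedrooms/age neuron and the age-only neuron.

```python
import numpy as np

# One property row: [area, bedrooms, distance_to_city, age]
# (illustrative values, not real data)
x = np.array([1200.0, 3.0, 5.0, 40.0])

# Hidden-layer weights: each row is one neuron. A zero means the
# neuron ignores that feature, e.g. neuron 0 combines area,
# bedrooms, and age; neuron 2 looks only at the age.
W_hidden = np.array([
    [0.4, 0.3, 0.0, -0.2],   # area + bedrooms + age
    [0.5, 0.0, -0.6, 0.0],   # area + distance_to_city
    [0.0, 0.0, 0.0, -0.8],   # age only
])

def relu(z):
    # Rectifier activation: negative signals are switched off
    return np.maximum(0.0, z)

hidden = relu(W_hidden @ x)          # each hidden neuron's activation
w_out = np.array([0.7, 0.5, 0.3])    # output neuron combines them
price = w_out @ hidden               # the predicted price
print(price)
```

Note how the age-only neuron here outputs zero for this particular row: it fires only for properties where its weighted input is positive, which is the "looking for very specific things" behavior described above.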
How do Neural Networks learn?
There are two fundamental ways of getting a program to do what you want: hard-code the rules yourself, or build a system that learns the rules from examples.
How do we create a network that learns on its own?
How do you distinguish between a dog and a cat?
How does the neural network work?
This is called a single layer feedforward neural network or perceptron.
Why y^ instead of y?
y stands for the actual value, the value in reality.
y^ is the predicted value, the output of the model.
The perceptron was first invented by Frank Rosenblatt in 1957.
The whole idea was to invent something that could learn by itself.
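A single-layer feedforward network of this kind can be sketched as a weighted sum followed by an activation. This is a minimal illustration, not Rosenblatt's exact formulation: the step threshold is in the spirit of the original perceptron, while the inputs and weights below are arbitrary.

```python
import numpy as np

def perceptron(x, w, b=0.0):
    """Single-layer feedforward network: a weighted sum of the
    inputs passed through a step activation."""
    z = np.dot(w, x) + b
    return 1.0 if z >= 0 else 0.0

# Hypothetical inputs and weights -- in practice the weights are learned.
x = np.array([1.0, 2.0])
w = np.array([0.5, -0.2])
print(perceptron(x, w))   # weighted sum is 0.1, so the unit fires: 1.0
```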
Let's see how the perceptron learns.
In order to learn, the network needs to compare its output value to the actual value we want it to produce.
Comparing y and y^, we have a difference: the error.
We will then calculate the:
Cost Function
The cost function tells us the error in our prediction, and our goal is to minimize it: the lower the cost function, the closer y^ is to y.
Cost function = 1/2 (y^ - y)^2
Once we have the cost, we feed this information back into the neural network.
It flows back and the weights get updated; updating the weights is all we can do.
The only things we have control over in this very simple neural network are the weights.
Our goal is to minimize the cost function, and all we can do is update the weights, tweak them a little bit.
Right now, throughout this experiment, we are dealing with only one row of data.
The same row goes through multiple iterations, and the weights keep getting updated, until the cost function is reduced to a minimum.
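The single-row loop above can be sketched as follows. Assumptions: the prediction is a plain weighted sum (no activation), the cost is the 1/2 (y^ - y)^2 from above, and the weight update uses the gradient of that cost — the row values, learning rate, and iteration count are all made up.

```python
import numpy as np

x = np.array([0.5, 1.5, 2.0])   # one row of features (made up)
y = 3.0                          # the actual value for that row
w = np.zeros(3)                  # initial weights
lr = 0.05                        # learning rate: tweak, don't jump

for _ in range(200):
    y_hat = w @ x                      # forward pass: predicted value
    cost = 0.5 * (y_hat - y) ** 2      # cost function from above
    grad = (y_hat - y) * x             # dC/dw for C = 1/2 (y_hat - y)^2
    w -= lr * grad                     # feed the error back: update weights

print(cost)   # the cost shrinks toward zero over the iterations
```

Each pass through the loop is one iteration on the same row: predict, measure the cost, nudge the weights, repeat.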
What happens with multiple rows of data?
We calculate y^ for every single row, and for every single row we also have the actual value y.
Based on all the differences between y^ and y, we can calculate the cost function,
which is the sum over all rows of 1/2 (y^ - y)^2.
Since we have the full cost function, we go back and update the weights w1, w2, w3, ..., wn.
We will iterate this process until the cost function is minimized, no matter how many rows we have.
The goal is to minimize the cost function.
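The multi-row version can be sketched the same way. This is an illustrative setup with synthetic data: the rows and the "true" weights are generated randomly, the cost is the sum of 1/2 (y^ - y)^2 over all rows, and one weight update uses the gradient accumulated across every row.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))          # 8 rows, 3 features (synthetic)
true_w = np.array([1.0, -2.0, 0.5])  # hidden target weights (made up)
y = X @ true_w                        # the actual value for every row

w = np.zeros(3)
lr = 0.02
for _ in range(5000):
    y_hat = X @ w                              # y^ for every single row
    cost = np.sum(0.5 * (y_hat - y) ** 2)      # sum of 1/2 (y^ - y)^2
    grad = X.T @ (y_hat - y)                   # gradient over all rows at once
    w -= lr * grad                             # one update of w1..wn

print(cost)   # driven toward its minimum, no matter how many rows
```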
This whole process of iterating and learning is called backpropagation.
The question is How?
How can we minimize the cost function?
For this simple neural network, a brute-force search might seem tempting.
The neural network in the example has 25 weights in total.
Suppose we want to try 1000 different values for each of those 25 weights.
Understanding 1000^25: 1000^25 = 10^75 is an enormously large number.
To give an idea:
Now, let's use the world's fastest supercomputer:
This mind-boggling amount of time shows why we can't just try every possible combination of weights in a neural network. It's why we need smarter methods, like gradient descent, to train neural networks efficiently.
In essence, as we add more dimensions (weights in this case), the problem becomes exponentially more complex - that's the curse of dimensionality in action!
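The back-of-envelope arithmetic can be checked directly. Assumptions: 25 weights with 1000 candidate values each, and a machine testing one full weight combination per floating-point operation at roughly 93 petaflops (about the measured performance of the Sunway TaihuLight supercomputer).

```python
# Brute-force search over every weight combination: the numbers.
combinations = 1000 ** 25            # 10**75 combinations in total
flops = 93e15                        # ~93 petaflops, one combo per op (assumed)
seconds = combinations / flops
years = seconds / (3600 * 24 * 365)
age_of_universe = 1.4e10             # years, approximate

print(f"{years:.1e} years")          # on the order of 3.4e+50 years
print(years / age_of_universe)       # vastly more than one universe lifetime
```

Even under these absurdly generous assumptions, the search takes around 10^50 years, which is why exhaustive search is hopeless and gradient descent is needed.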
Gradient Descent
How it works:
This method is crucial in machine learning as it provides an efficient way to optimize complex models with many parameters, like neural networks, without having to exhaustively search all possible combinations.
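A minimal sketch of the idea, assuming the simple one-dimensional cost C(w) = 1/2 (w - 3)^2 purely for illustration: instead of trying every value of w, we repeatedly step downhill along the slope of the cost.

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Generic 1-D gradient descent: repeatedly step against the slope."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)   # move a small step downhill
    return x

# Minimize C(w) = 1/2 (w - 3)^2, whose gradient is (w - 3).
w_min = gradient_descent(lambda w: w - 3.0, x0=0.0)
print(w_min)   # converges toward the minimum at w = 3.0
```

A hundred cheap steps find the minimum that brute force would have needed a thousand evaluations per dimension to locate, and the same loop scales to millions of weights.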