The Battle of Activation Functions. Neural Network 101. Chapter 3
Prashant Basnet
Sep 23, 2024
This chapter focuses on the mathematical foundations of neural network activation functions. We'll start with simple transformation rules on functions, then explore even, odd, sigmoidal & hyperbolic tangent functions.
Simple transformations on functions
These are basic rules for shifting, stretching, & compressing the graph of a function.
First, let's address the transformation rules:
y = f(x) + c shifts the graph vertically by c
y = f(x + c) shifts the graph horizontally by c
y = a · f(x) stretches the graph vertically when a > 1
y = f(ax) compresses the graph horizontally when a > 1
Let's see how things change when we apply one of these rules:
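As a rough sketch of the last rule (the one we'll use below), assuming NumPy and taking f(x) = e^(-x) as an example of my own choosing, here is the horizontal compression y = f(ax) with a = 2 in action:

```python
import numpy as np

# Example function of our own choosing: f(x) = e^(-x)
def f(x):
    return np.exp(-x)

x = np.array([0.0, 0.5, 1.0, 2.0])

# f(2x) is f(x) compressed horizontally by a factor of 2 (a = 2 > 1):
# every value of the original graph is reached at half the original x.
print("f(x) :", f(x))       # ≈ [1.000 0.607 0.368 0.135]
print("f(2x):", f(2 * x))   # ≈ [1.000 0.368 0.135 0.018]
```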
Now, let's look at the exponential probability density function:
p(x; λ) = 1_{x ≥ 0} · λe^(-λx), where 1_{x ≥ 0} is the indicator function: it equals 1 when x ≥ 0 and 0 otherwise.
Here, λ (lambda) is used in two places: as the factor multiplying the whole expression, & inside the exponent, where it controls how fast e^(-λx) shrinks as x grows.
What does it mean to "decay"?
From the transformation rule y = f(ax), where a > 1:
In our case, f(x) = e^(-x), and we're looking at f(λx) = e^(-λx).
When λ > 1, it acts as a horizontal compression factor: the graph of e^(-x) gets squeezed toward the y-axis.
The effect on the decay rate: the compressed curve reaches small values at smaller x, so the larger λ is, the faster e^(-λx) decays toward zero.
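To make this concrete, here is a minimal numeric sketch (λ values and sample points chosen by me, assuming NumPy) comparing λ = 1 with λ = 2:

```python
import numpy as np

# p(x; λ) = 1_{x >= 0} · λ e^(-λx)
def exp_pdf(x, lam):
    return np.where(x >= 0, lam * np.exp(-lam * x), 0.0)

x = np.array([0.0, 1.0, 2.0, 3.0])

print(exp_pdf(x, lam=1.0))  # ≈ [1.000 0.368 0.135 0.050]
print(exp_pdf(x, lam=2.0))  # ≈ [2.000 0.271 0.037 0.005]  <- drops toward 0 much faster
```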
Even Functions:
A function is even if & only if f(x) = f(-x) for all x in the domain of f. Geometrically, an even function is symmetric about the y-axis. For example, cos(x) is an even function.
Another example of an even function: if we take the same function and flip it about the y-axis, the graph is unchanged, precisely because f(x) = f(-x).
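A quick numerical sanity check of the even property for cos(x) (my own snippet, assuming NumPy):

```python
import numpy as np

x = np.linspace(-5, 5, 101)

# Even: f(x) == f(-x) for every x, so the graph is symmetric about the y-axis.
print(np.allclose(np.cos(x), np.cos(-x)))  # True
```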
Odd Functions:
A function f(x) is odd if & only if f(-x) = -f(x) for all x in the domain of f.
Original Function:
Now, let's flip it about the x-axis:
Finally, let's flip it about the y-axis:
Meaning we mirror the right part to the left & the left to the right. For an odd function, flipping about the x-axis and then about the y-axis brings back the original graph, since -f(-x) = f(x).
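Here is a small sketch of both properties, using f(x) = x^3 as an example of my own (not a function from the chapter's figures), assuming NumPy:

```python
import numpy as np

def f(x):
    return x ** 3  # an odd function

x = np.linspace(-5, 5, 101)

# Odd: f(-x) == -f(x) for all x.
print(np.allclose(f(-x), -f(x)))  # True

# Flip about the x-axis (negate the output), then about the y-axis (negate the
# input): for an odd function this two-step flip recovers the original graph.
print(np.allclose(-f(-x), f(x)))  # True
```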
Sigmoidal functions:
These are S-shaped curves used as activation functions in neural networks. They help introduce non-linearity, allowing networks to learn complex patterns.
1. Logistic Sigmoid Function:
The logistic sigmoid is defined as σ(x) = 1 / (1 + e^(-x)); its graph is an S-shaped curve that rises from 0 toward 1.
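A minimal implementation (my own sketch, assuming NumPy), showing that the outputs stay strictly between 0 & 1:

```python
import numpy as np

def sigmoid(x):
    # σ(x) = 1 / (1 + e^(-x))
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
print(sigmoid(x))  # ≈ [0.000 0.269 0.500 0.731 1.000]  -> always in (0, 1)
```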
2. Hyperbolic Tangent (tanh) Function:
This function is very similar to the logistic sigmoid, as its graph is also S-shaped, but it ranges from -1 to 1 instead of 0 to 1.
These graphs highlight the key differences between the two functions, particularly their ranges & centers. The sigmoid function is useful when you need outputs between 0 & 1, like probabilities, while tanh is often preferred in neural networks due to its zero-centered nature, which can help with faster convergence during training.
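The range difference is easy to check numerically; a small sketch of my own (assuming NumPy) comparing the two on the same symmetric inputs:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-4, 4, 9)

# Sigmoid outputs live in (0, 1) and are centered around 0.5;
# tanh outputs live in (-1, 1) and are centered around 0 ("zero-centered").
print(sigmoid(0.0), np.tanh(0.0))            # 0.5  0.0
print(sigmoid(x).mean(), np.tanh(x).mean())  # ≈ 0.5 vs ≈ 0.0
```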
What is convergence anyways?
When we talk about convergence in neural network training, we are referring to how quickly & efficiently the network learns to minimise its error or loss function. Faster convergence means the network reaches its optimal performance in fewer training iterations.
Also, why does this zero-centering matter?
In a neural network, the output of one layer becomes the input to the next layer.
When these values are centered around zero, the inputs to the next layer are a mix of positive & negative numbers, so the gradients on that layer's weights are not all forced to share the same sign (which is what happens when every input is positive, as with sigmoid outputs). This keeps the weight updates better balanced and tends to speed up convergence.
Relationship between logistic sigmoid & tanh:
If we plot the two functions along with their derivatives, notice how the tanh derivative is steeper near x = 0 compared to the sigmoid derivative.
This steeper gradient often translates to faster learning in the critical region around zero. While tanh can often lead to faster convergence, modern neural networks frequently use other activation functions like ReLU (Rectified Linear Unit) or its variants, which can offer even better performance in many scenarios.
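To put numbers on "steeper near zero", here is a small sketch of my own using the standard derivative identities σ'(x) = σ(x)(1 - σ(x)) and tanh'(x) = 1 - tanh(x)^2 (not stated in the chapter), assuming NumPy:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)          # σ'(x) = σ(x)(1 - σ(x)), peaks at 0.25

def tanh_grad(x):
    return 1.0 - np.tanh(x) ** 2  # tanh'(x) = 1 - tanh(x)^2, peaks at 1.0

for x in [0.0, 0.5, 1.0]:
    print(x, round(sigmoid_grad(x), 3), round(tanh_grad(x), 3))
# At x = 0 the tanh gradient is 1.0 versus 0.25 for the sigmoid: four times steeper.
```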
We can say that tanh is a rescaled & shifted version of the logistic sigmoid.
Let's break it down mathematically:
Logistic sigmoid function: σ(x) = 1 / (1 + e^(-x))
Hyperbolic tangent function: tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))
tanh(x/2) = 2 * σ(x) - 1
To see why: 2σ(x) - 1 = (1 - e^(-x)) / (1 + e^(-x)), and multiplying the top & bottom by e^(x/2) gives (e^(x/2) - e^(-x/2)) / (e^(x/2) + e^(-x/2)) = tanh(x/2). Equivalently, tanh(x) = 2σ(2x) - 1, which is exactly the rescaling & shifting mentioned above.
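A quick numerical check of this identity (my own sketch, assuming NumPy):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-6, 6, 1001)

# tanh(x/2) = 2·σ(x) - 1: tanh is the sigmoid rescaled to (-1, 1) and re-centered at 0.
print(np.allclose(np.tanh(x / 2), 2 * sigmoid(x) - 1))  # True
```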
In the next chapter, we will cover other activation functions in more depth.
#NeuralNetworks101 #ActivationFunctions #MachineLearning #DeepLearning #SigmoidFunction #TanhFunction #MathForAI #DataScience #ArtificialIntelligence #ComputerScience