Exploding Gradient Problem
What & How?
During backpropagation, the error is propagated from the final layer back to the initial layer to update the weights of a deep neural network, and along the way the derivatives of the n layers are multiplied together. When these derivatives are large, the gradient grows exponentially with depth. The resulting large weight updates make training very unstable. This is the Exploding Gradient Problem.
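A tiny numerical sketch of this effect (the per-layer derivative value here is an assumed illustration, not from the post): when each layer contributes a derivative larger than 1, the product over the layers blows up quickly.

```python
import numpy as np

per_layer_derivative = 1.5   # assumed value > 1 for every layer
for n_layers in (5, 10, 30, 50):
    # backprop multiplies one derivative per layer along the chain rule
    gradient_factor = per_layer_derivative ** n_layers
    print(f"{n_layers:2d} layers -> gradient factor ~ {gradient_factor:.3e}")
```

With 50 layers the factor is already around 6e8, so even a modest error signal turns into an enormous weight update.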
Solutions?
- Weight Initialization
- Reducing the number of layers
- Gradient Clipping
Weight Initialization?
Weights should be initialized carefully at the start of training so that the gradients stay in a reasonable range during backpropagation. This is typically done with a scaled random initialization rather than arbitrarily large random values.
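A minimal sketch of one common scheme, Xavier/Glorot initialization, assuming NumPy; the post only says "careful initialization", so the specific scheme here is an illustrative choice.

```python
import numpy as np

def init_layer(fan_in, fan_out, rng=np.random.default_rng(0)):
    # Scale the random weights by the layer sizes so that activations and
    # gradients keep roughly constant variance from layer to layer.
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

W = init_layer(256, 128)
print(W.std())  # small, size-dependent spread instead of arbitrarily large weights
```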
Reduce the number of layers?
By reducing the number of hidden layers, the chain of multiplied derivatives becomes shorter, which mitigates both the exploding and the vanishing gradient problems (at the cost of some model capacity).
Gradient Clipping?
Gradient Clipping is a technique that tackles the exploding gradient problem. The idea is very simple: if the norm of the gradient is larger than a threshold, rescale the gradient so that its norm equals that threshold. If ||g|| > c, then

g ← c · g / ||g||

where, g = gradient
c = hyperparameter (the clipping threshold)
||g|| = norm of gradient

Since g/||g|| is a unit vector, after rescaling the new g will have norm c. Note that if ||g|| ≤ c, then we don’t need to do anything.
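A minimal NumPy sketch of this rule (the array values are made up for illustration):

```python
import numpy as np

def clip_by_norm(g, c):
    # If the gradient norm exceeds the threshold c, rescale g to c * g / ||g||.
    norm = np.linalg.norm(g)
    if norm > c:
        g = c * g / norm   # after rescaling, ||g|| == c
    return g

g = np.array([30.0, 40.0])       # ||g|| = 50
print(clip_by_norm(g, c=5.0))    # -> [3. 4.], norm 5
```

Deep learning frameworks ship this as a built-in; for example, PyTorch provides torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=c), which applies the same rescaling over all parameter gradients before the optimizer step.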