<DeepLearning> A simple way to distinguish different optimizers in deep learning
The optimizers
Let's start by simply listing the optimizers you are likely to encounter in a project.
- SGDOptimizer
- MomentumOptimizer
- NesterovOptimizer
- AdagradOptimizer
- AdadeltaOptimizer
- RMSPropOptimizer
- AdamOptimizer
- NadamOptimizer
Four types of optimizers
We can categorize these optimizers into four types (a minimal code sketch of each mechanism follows the list):
- Type1: Base type: SGDOptimizer
- Type2: Add momentum to the gradient (a step based on the accumulated historical gradients): MomentumOptimizer, AdamOptimizer, NadamOptimizer
- Type3: Adapt the learning rate per variable (scaled down by the accumulated, or windowed, squared gradients of that variable, which track how strongly it has been updated): AdagradOptimizer, AdadeltaOptimizer, RMSPropOptimizer, AdamOptimizer, NadamOptimizer
- Type4: Change where the gradient is evaluated (the weights first take a look-ahead step along the historical gradient, and the new gradient is computed at that point): NesterovOptimizer, NadamOptimizer
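Here is a minimal NumPy sketch of the four mechanisms, just to make the distinction concrete. The function names, argument layout, and hyperparameter defaults are my own illustrative choices, not any framework's API.

```python
import numpy as np

def sgd_step(w, grad, lr=0.01):
    # Type1: plain SGD, step directly along the current gradient.
    return w - lr * grad

def momentum_step(w, grad, velocity, lr=0.01, beta=0.9):
    # Type2: accumulate historical gradients into a velocity (momentum) term.
    velocity = beta * velocity + grad
    return w - lr * velocity, velocity

def adagrad_step(w, grad, accum, lr=0.01, eps=1e-8):
    # Type3: per-variable learning rate, scaled down by accumulated squared gradients.
    accum = accum + grad ** 2
    return w - lr * grad / (np.sqrt(accum) + eps), accum

def nesterov_step(w, grad_fn, velocity, lr=0.01, beta=0.9):
    # Type4: look ahead along the historical velocity first,
    # then evaluate the gradient at that look-ahead point.
    grad = grad_fn(w - lr * beta * velocity)
    velocity = beta * velocity + grad
    return w - lr * velocity, velocity
```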
Let's write these as equations
- SGDOptimizer = T1
- MomentumOptimizer = T1 + T2
- NesterovOptimizer = T1 + T4
- AdagradOptimizer = T1 + T3
- AdadeltaOptimizer = T1 + T3 (with a decaying window on the squared gradients)
- RMSPropOptimizer = T1 + T3 (with a decaying window on the squared gradients)
- AdamOptimizer = T1 + T2 + T3 (a window on the squared gradients, plus a window on the gradient itself, which is the momentum term)
- NadamOptimizer = T1 + T2 + T3 + T4 (Adam with Nesterov-style momentum)
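To show what the "window" means in practice, here is a short sketch of RMSProp and Adam, again with hyperparameter values chosen only for illustration. RMSProp replaces Adagrad's ever-growing sum with an exponential moving average of the squared gradients; Adam additionally keeps a moving average of the gradient itself, which plays the role of momentum.

```python
import numpy as np

def rmsprop_step(w, grad, sq_avg, lr=0.001, rho=0.9, eps=1e-8):
    # T3 with a window: decaying average of squared gradients
    # instead of Adagrad's full accumulated sum.
    sq_avg = rho * sq_avg + (1 - rho) * grad ** 2
    return w - lr * grad / (np.sqrt(sq_avg) + eps), sq_avg

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # Adam: a window on the gradient itself (m, the momentum term)
    # and a window on the squared gradient (v, the adaptive learning rate term).
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)  # bias correction for early steps
    v_hat = v / (1 - beta2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v
```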