<DeepLearning> A simple way to distinguish different optimizers in DeepLearning

The optimizers

Let’s first list the optimizers you are likely to run into in a project.

  • SGDOptimizer
  • MomentumOptimizer
  • NesterovOptimizer
  • AdagradOptimizer
  • AdadeltaOptimizer
  • RMSPropOptimizer
  • AdamOptimizer
  • NadamOptimizer
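
For orientation, here is one way these names map onto present-day optimizer classes. The mapping is my own, not part of the original post, and assumes TensorFlow’s tf.keras.optimizers API; the learning rates shown are just common defaults, not recommendations.

```python
import tensorflow as tf

# Rough mapping from the names above to tf.keras optimizer classes
# (momentum and Nesterov are flags on SGD rather than separate classes).
optimizers = {
    "SGDOptimizer":      tf.keras.optimizers.SGD(learning_rate=0.01),
    "MomentumOptimizer": tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9),
    "NesterovOptimizer": tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9, nesterov=True),
    "AdagradOptimizer":  tf.keras.optimizers.Adagrad(learning_rate=0.01),
    "AdadeltaOptimizer": tf.keras.optimizers.Adadelta(learning_rate=1.0),
    "RMSPropOptimizer":  tf.keras.optimizers.RMSprop(learning_rate=0.001),
    "AdamOptimizer":     tf.keras.optimizers.Adam(learning_rate=0.001),
    "NadamOptimizer":    tf.keras.optimizers.Nadam(learning_rate=0.001),
}
```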

Four types of optimizers

We can categorize these optimizers into four types (a minimal sketch of each appears after this list):

  • Type 1, the base type: SGDOptimizer
  • Type 2, add momentum to the gradient (based on the historical gradients): MomentumOptimizer, AdamOptimizer, NadamOptimizer
  • Type 3, adapt the learning rate of each variable (scaled by that variable’s accumulated squared gradients, i.e. how quickly its gradient is changing): AdagradOptimizer, AdadeltaOptimizer, RMSPropOptimizer, NadamOptimizer
  • Type 4, change where the gradient is evaluated (the weights are first moved along the historical gradient, and only then is the new gradient computed): NesterovOptimizer, NadamOptimizer
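
To make the four ingredients concrete, here is a minimal NumPy sketch of one update step of each type on a single toy parameter. This is my own illustration, not code from the original post; grad(w) stands in for whatever gradient your model actually produces.

```python
import numpy as np

# Toy objective f(w) = 0.5 * w^2, so the gradient is simply w.
def grad(w):
    return w

w, lr, mu = 5.0, 0.1, 0.9   # weight, learning rate, momentum coefficient

# Type 1: plain SGD -- step against the current gradient.
w_sgd = w - lr * grad(w)

# Type 2: momentum -- accumulate a velocity from the historical gradients.
v = 0.0
v = mu * v + grad(w)
w_momentum = w - lr * v

# Type 3: adaptive learning rate (Adagrad-style) -- scale each variable's step
# by its accumulated squared gradients.
cache = 0.0
cache += grad(w) ** 2
w_adagrad = w - lr * grad(w) / (np.sqrt(cache) + 1e-8)

# Type 4: Nesterov -- first move along the historical velocity,
# then compute the new gradient at that look-ahead point.
v_nes = 0.0
lookahead = w - lr * mu * v_nes
v_nes = mu * v_nes + grad(lookahead)
w_nesterov = w - lr * v_nes

print(w_sgd, w_momentum, w_adagrad, w_nesterov)
```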

Let’s write each optimizer as an equation in these terms:

  • SGDOptimizer = T1
  • MomentumOptimizer = T1 + T2
  • NesterovOptimizer = T1 + T4
  • AdagradOptimizer = T1 + T3
  • AdadeltaOptimizer = T1 + T3 (with a window over the squared gradients)
  • RMSPropOptimizer = T1 + T3 (with a window over the squared gradients)
  • AdamOptimizer = T1 + T3 (with a window over the squared gradients, plus a window over the gradients themselves, i.e. momentum)
  • NadamOptimizer = T1 + T3 + T4
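
The “window” in these equations is, in practice, an exponential moving average: Adagrad keeps summing squared gradients forever, while RMSProp and Adam let old ones decay, and Adam additionally keeps a decayed average of the gradients themselves. Below is a minimal sketch of one Adam step in NumPy, my own illustration rather than the original post’s code; the hyperparameter names (beta1, beta2, eps) follow the common convention.

```python
import numpy as np

def adam_step(w, g, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: windows over both the gradients and the squared gradients."""
    m = beta1 * m + (1 - beta1) * g        # T2-like window over the gradients (momentum)
    v = beta2 * v + (1 - beta2) * g ** 2   # T3 window over the squared gradients
    m_hat = m / (1 - beta1 ** t)           # bias correction for the zero-initialized averages
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)  # T1-style step, scaled per variable
    return w, m, v

# Toy run on f(w) = 0.5 * w^2, whose gradient is simply w.
w, m, v = 5.0, 0.0, 0.0
for t in range(1, 101):
    w, m, v = adam_step(w, w, m, v, t, lr=0.1)
print(w)  # ends up near the minimum at 0
```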
