< Network > Understanding the activation style in residual block

The evolution of the Activation Style in Residual Blocks

The activation style differs from one residual structure to another.

  • Resnet_v1
    After shortcut: Activation
    After residual: No Activation
    After add: Activation

  • Resnet_v2
    After shortcut: No Activation
    After residual: No Activation
    After add: Activation (moved into the next block as pre-activation on its residual branch)

  • Mobilenet_v2
    After shortcut: No Activation
    After residual: No Activation
    After add: No Activation
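
To make the three styles concrete, here is a minimal NumPy sketch that takes the listing above literally and treats each slot as an on/off switch. The names `residual_block`, `toy_residual`, and `STYLES` are hypothetical, and the residual branch is a stand-in function rather than real convolution layers, so this only illustrates where the activations sit, not how any library implements these networks.

```python
import numpy as np


def relu(x):
    # Standard ReLU: negative values become zero
    return np.maximum(x, 0.0)


def residual_block(x, residual_fn, act_shortcut, act_residual, act_add):
    """Toy residual block with three switchable activation slots."""
    shortcut = relu(x) if act_shortcut else x
    residual = residual_fn(x)
    if act_residual:
        residual = relu(residual)
    out = shortcut + residual
    return relu(out) if act_add else out


def toy_residual(x):
    # Hypothetical stand-in for the layers of the residual branch
    return -2.0 * x


# Slots as listed in the post: (after shortcut, after residual, after add)
STYLES = {
    "resnet_v1": (True, False, True),
    "resnet_v2": (False, False, True),
    "mobilenet_v2": (False, False, False),
}

x = np.array([-1.0, 0.5, 2.0])
outputs = {
    name: residual_block(x, toy_residual, *flags)
    for name, flags in STYLES.items()
}

for name, out in outputs.items():
    print(name, out)
```

Only the `mobilenet_v2` setting lets negative values through to the output of the block; the other two settings clip them at one or more of the slots.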

The reasoning behind the change of Activation Style in Residual Blocks

The main novelty of the residual structure is to add the shortcut and the residual together.

In the original version of ResNet, an activation is placed both before and after the addition to provide a nonlinear transformation.

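However, the addition already transforms both the shortcut and the residual by mixing them into a new representation. Strictly speaking the addition itself is linear, but the residual branch contains its own nonlinear layers, so the block as a whole is still nonlinear. There is therefore little need for an extra activation immediately before and after the addition.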

On the other hand, if we apply the common ReLU activation after both the residual and the shortcut, more useful information could be lost, because ReLU is a truncation function: it sets every negative value to zero.
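
The truncation effect can be seen with a small NumPy sketch. The vectors `shortcut` and `residual` below are made-up values chosen only for illustration; the point is that negative entries are zeroed before they can contribute to the sum.

```python
import numpy as np


def relu(x):
    # Standard ReLU: negative values become zero
    return np.maximum(x, 0.0)


# Made-up example values for the two branches
shortcut = np.array([-3.0, -1.0, 2.0])
residual = np.array([-2.0, 4.0, -5.0])

# Plain addition keeps the signed information from both branches
linear_sum = shortcut + residual

# ReLU on each branch first discards every negative entry
activated_sum = relu(shortcut) + relu(residual)

# Number of entries whose value was replaced by zero
truncated = int(np.count_nonzero(shortcut < 0) + np.count_nonzero(residual < 0))

print(linear_sum)     # [-5.  3. -3.]
print(activated_sum)  # [0. 4. 2.]
print(truncated)      # 4
```

After the ReLU, the shortcut values -3 and -1 are both zero, so the layers that follow can no longer tell them apart.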

So I think it is wise to avoid using an activation immediately before and after the addition in the residual structure.

https://zhengtq.github.io/2018/12/14/newwork/

Author: Billy

Posted on: 2018-12-14

Updated on: 2021-03-13
