< Network > Understanding the activation style in residual block
The evolution of the Activation Style in Residual Blocks
The activation style differs from one residual structure to another.
Resnet_v1
After shortcut: Activation
After residual: No Activation
After add: Activation
Resnet_v2
After shortcut: No Activation
After residual: No Activation
After add: Activation (applied as the BN + ReLU pre-activation at the start of the next block)
Mobilenet_v2
After shortcut: No Activation
After residual: No Activation
After add: No Activation
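The three styles listed above can be sketched as one block with three switches. This is only a toy illustration in plain Python: `toy_residual` is a hypothetical stand-in for the real convolution and batch-norm layers, and the flag names are my own, chosen to mirror the three "After ..." entries.

```python
def relu(xs):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, x) for x in xs]


def toy_residual(xs):
    """Hypothetical stand-in for the conv/BN stack of the residual branch."""
    return [0.5 * x - 1.0 for x in xs]


def block(xs, act_shortcut, act_residual, act_add):
    """Residual block whose activation placement is set by three flags."""
    shortcut = relu(xs) if act_shortcut else xs
    residual = toy_residual(xs)
    if act_residual:
        residual = relu(residual)
    out = [s + r for s, r in zip(shortcut, residual)]
    return relu(out) if act_add else out


# Flags copied from the list above (assumed simplification of the real nets).
STYLES = {
    "Resnet_v1": dict(act_shortcut=True, act_residual=False, act_add=True),
    "Resnet_v2": dict(act_shortcut=False, act_residual=False, act_add=True),
    "Mobilenet_v2": dict(act_shortcut=False, act_residual=False, act_add=False),
}

x = [-2.0, 0.0, 4.0]
outputs = {name: block(x, **flags) for name, flags in STYLES.items()}
for name, out in outputs.items():
    print(name, out)
```

With this toy input, only the Mobilenet_v2 style lets negative values pass through the block unchanged; the other two styles clip them to zero at the final activation.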
The reasoning behind the change of Activation Style in Residual Blocks
The main novelty of the residual structure is to add the shortcut and the residual together.
In the original version of Resnet, activation is placed before and after the addition to provide the nonlinear transformation.
However, the addition only has to merge the shortcut and the residual, and the residual branch already contains its own nonlinear layers. So the block stays nonlinear even without activation directly before and after the addition.
On the other hand, if we use the common ReLU for activation after both the residual and the shortcut, more useful information could be lost, because ReLU is a truncation function: it sets every negative value to zero.
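A small numeric sketch of this truncation effect, with made-up values for the two branches (the numbers are assumptions chosen only for illustration):

```python
def relu(x):
    """ReLU for a single float: negative values become 0."""
    return max(0.0, x)


# Hypothetical branch outputs for three channels.
shortcut = [-3.0, -0.5, 2.0]
residual = [1.0, -1.0, -4.0]

# Plain addition keeps the sign and magnitude of both branches.
plain_sum = [s + r for s, r in zip(shortcut, residual)]

# ReLU on both branches before the addition drops every negative entry.
truncated_sum = [relu(s) + relu(r) for s, r in zip(shortcut, residual)]

print(plain_sum)
print(truncated_sum)
```

In the second channel both branches are negative, so after ReLU nothing of either signal is left in the sum.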
So I think it is wise to avoid using activation directly before and after the addition in the residual structure.