ELU-Networks - Fast and Accurate CNN Learning on ImageNet
We trained CNNs on the ImageNet dataset with a new activation function, the "exponential linear unit" (ELU), to speed up learning.
Like rectified linear units (ReLUs), leaky ReLUs (LReLUs), and parametrized ReLUs (PReLUs), ELUs avoid a vanishing gradient via the identity for positive values. However, ELUs have improved learning characteristics compared to these other activation functions. In contrast to ReLUs, ELUs take on negative values, which allows them to push mean unit activations closer to zero. Mean activations closer to zero speed up learning because they bring the gradient closer to the unit natural gradient. Like batch normalization, ELUs push the mean towards zero, but with a significantly smaller computational footprint. While other activation functions such as LReLUs and PReLUs also have negative values, they do not ensure a noise-robust deactivation state. ELUs saturate to a negative value for smaller inputs and thereby decrease the forward-propagated variation and information. ELUs therefore code the degree of presence of particular phenomena in the input, while they do not quantitatively model the degree of their absence. Consequently, dependencies between ELU units are much easier to model and distinct concepts are less likely to interfere.
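As a concrete illustration of the behaviour described above, the following is a minimal NumPy sketch of the ELU: the identity for positive inputs and alpha * (exp(x) - 1) for non-positive inputs, which saturates at -alpha. The function names and the default alpha = 1.0 are illustrative assumptions, not code from our implementation.

```python
import numpy as np

def elu(x, alpha=1.0):
    # Identity for x > 0; alpha * (exp(x) - 1) for x <= 0.
    # The negative branch saturates at -alpha, giving a noise-robust
    # deactivation state and mean activations closer to zero.
    neg = alpha * np.expm1(np.minimum(x, 0.0))  # expm1(x) = exp(x) - 1, numerically stable
    return np.where(x > 0, x, neg)

def elu_grad(x, alpha=1.0):
    # Derivative: 1 for x > 0 (no vanishing gradient on the positive side),
    # alpha * exp(x) for x <= 0 (vanishes as the unit saturates).
    return np.where(x > 0, 1.0, alpha * np.exp(np.minimum(x, 0.0)))

if __name__ == "__main__":
    x = np.linspace(-4.0, 4.0, 9)
    print(elu(x))       # negative inputs saturate towards -alpha, positives pass through
    print(elu_grad(x))  # gradient stays at 1 for all positive inputs
```

Because the negative branch is bounded below by -alpha rather than clipped to zero as with ReLUs, large negative inputs all map to roughly the same saturated value, which is what makes the deactivation state robust to noise.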
In the ImageNet challenge, ELU networks considerably speed up learning compared to a ReLU network with similar classification performance. We submitted an ELU network to ILSVRC 2015, achieving a 9.18% test classification error rate.