[Question] SELU weights and dropout #1

Open
pnmartinez opened this issue Apr 27, 2021 · 1 comment

Comments

@pnmartinez

Hi,

My name is Pablo Navarro. Your team and I have already exchanged a few emails about the wonderful paper you've written. Thanks again for the contribution.

Now that the code is released, I have a couple of questions about the implementation of the SELU activation function.

Weight init

For SELU, you force lecun_normal, which in turn is just a pass in the init_weights() function:

def init_weights(module, initialization):
    if type(module) == t.nn.Linear:
        if initialization == 'orthogonal':
            t.nn.init.orthogonal_(module.weight)
        elif initialization == 'he_uniform':
            t.nn.init.kaiming_uniform_(module.weight)
        elif initialization == 'he_normal':
            t.nn.init.kaiming_normal_(module.weight)
        elif initialization == 'glorot_uniform':
            t.nn.init.xavier_uniform_(module.weight)
        elif initialization == 'glorot_normal':
            t.nn.init.xavier_normal_(module.weight)
        elif initialization == 'lecun_normal':
            pass  # no explicit init; PyTorch's default nn.Linear initialization applies
        else:
            assert 1<0, f'Initialization {initialization} not found'

How come the weights end up initialized as lecun_normal simply by passing? On my machine, PyTorch's default nn.Linear initialization is uniform (Kaiming uniform), not normal.
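
For comparison, an explicit LeCun-normal init would look roughly like the sketch below (my own illustration, not code from this repo; the function name is made up). kaiming_normal_ with nonlinearity='linear' reduces to std = 1/sqrt(fan_in), which is exactly LeCun normal:

    import math
    import torch as t

    def lecun_normal_(module):
        """Illustrative LeCun-normal init for nn.Linear (not from the repo):
        weights ~ N(0, 1/fan_in), biases set to zero."""
        if type(module) == t.nn.Linear:
            fan_in = module.weight.shape[1]  # in_features
            t.nn.init.normal_(module.weight, mean=0.0, std=math.sqrt(1.0 / fan_in))
            # Equivalent: t.nn.init.kaiming_normal_(module.weight, mode='fan_in', nonlinearity='linear')
            if module.bias is not None:
                t.nn.init.zeros_(module.bias)

    # Usage: model.apply(lecun_normal_)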

DropOut on SELU

I believe that in order to make SELU useful, you need to use nn.AlphaDropout() instead of regular nn.Dropout() layers (PyTorch docs).

I can't find anything wrapping AlphaDropout() in your code. Can you point me in the right direction or explain the rationale behind it?
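
For reference, the kind of block I have in mind looks roughly like this (a hypothetical sketch of mine, not your architecture; layer sizes are arbitrary). nn.AlphaDropout is designed for SELU networks and preserves zero mean and unit variance, whereas nn.Dropout breaks the self-normalizing property:

    import torch as t

    # Hypothetical SELU block with AlphaDropout (not the repo's architecture)
    selu_block = t.nn.Sequential(
        t.nn.Linear(64, 64),
        t.nn.SELU(),
        t.nn.AlphaDropout(p=0.1),
    )

    x = t.randn(32, 64)
    y = selu_block(x)  # shape: (32, 64)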

Cheers and keep up the good work!

@pnmartinez pnmartinez changed the title [Question] SELU activation function weights and dropout [Question] SELU weights and dropout Apr 27, 2021
@kdgutier
Collaborator

Dropout and AlphaDropout on SELU

Thanks for the comments.
As you mentioned, the scaled exponential linear units paper (https://arxiv.org/abs/1706.02515) recommends on page 6 not to use regular dropout, since the extra variance hinders the convergence of the algorithm when using self-normalization.
We observed some convergence issues when exploring the hyperparameter space, although with optimal model configurations the training procedure was stable.

One thing to keep in mind is that the two best regularization techniques we found in our experiments are early stopping and, second, ensembling. Since ensembling boosts accuracy through the diversity and variance of the models, the interaction of AlphaDropout with the ensemble might be something interesting to explore. Still, we will try AlphaDropout regularization to test the SELU paper's recommendation in this regression setting.
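
As a rough illustration of what we mean by ensembling (a schematic sketch, not the exact code in this repository; the helper name is made up), the forecasts of independently trained models are combined, e.g. with a median:

    import torch as t

    def ensemble_forecast(models, x):
        """Schematic ensembling sketch (not the repository's implementation):
        combine forecasts of independently trained models via the median."""
        with t.no_grad():
            preds = t.stack([m(x) for m in models], dim=0)  # (n_models, batch, horizon)
        return preds.median(dim=0).values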
