
LSTM hidden layer computation #4

Open
christopher5106 opened this issue Sep 21, 2016 · 2 comments

@christopher5106

Hi,
I'm just wondering why you use this form in https://github.com/coreylynch/grid-lstm/blob/master/model/GridLSTM.lua#L31

local next_h = nn.CMulTable()({out_gate, nn.Tanh()(next_c)})

whereas in the paper it is:

local next_h = nn.Tanh()(nn.CMulTable()({out_gate, next_c}))
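For concreteness, the two orderings really do compute different values. A minimal numeric sketch in plain Python (the scalar values for `out_gate` and `next_c` are assumed for illustration; the real code operates elementwise on tensors):

```python
import math

out_gate = 0.9   # assumed output-gate activation (sigmoid output, in (0, 1))
next_c = 2.0     # assumed new cell-state value (unbounded)

# Ordering in GridLSTM.lua (and the standard LSTM): squash the cell
# state with tanh first, then multiply by the output gate.
h_code = out_gate * math.tanh(next_c)

# Ordering asked about: gate the cell state first, then squash the product.
h_other = math.tanh(out_gate * next_c)

print(h_code, h_other)  # the two values diverge as |next_c| grows
```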

Thank you for your response

@christopher5106
Author

I also have a question about weight sharing: for the time LSTM in your example, weights are not shared between layers (they are shared only across time, thanks to the clones), while for the depth LSTM, weights are shared across both layers and time. This makes a lot of sense, in fact.

But it surprised me at first read, because the "tied N-LSTM" is, by definition, sharing weights along all dimensions.

Either

  1. NOT cloning the weights of the depth LSTM across time, or
  2. also sharing the weights of the time LSTM across depth

would be more coherent... do you have any thoughts on this?
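To make the asymmetry explicit, here is a small Python sketch of the sharing pattern described above, using plain `object()` instances as stand-ins for LSTM weight tensors (the names `time_grid` / `depth_grid` and the layer/step counts are illustrative, not from the repo):

```python
n_layers, n_steps = 3, 5

# Time LSTM: one weight object per layer, reused (cloned) across time steps.
time_weights = [object() for _ in range(n_layers)]
time_grid = [[time_weights[l] for _ in range(n_steps)] for l in range(n_layers)]

# Depth LSTM (tied): a single weight object, reused across layers AND time.
depth_weight = object()
depth_grid = [[depth_weight for _ in range(n_steps)] for _ in range(n_layers)]

# Count the distinct parameter sets in each (layer x time) grid.
print(len({id(w) for row in time_grid for w in row}))   # one set per layer
print(len({id(w) for row in depth_grid for w in row}))  # a single tied set
```

Counting distinct objects in each grid shows the time LSTM carries one parameter set per layer, while the depth LSTM carries a single set for the whole grid.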

Thanks,

@ytoon

ytoon commented Dec 16, 2016

I think the paper uses the same weights for both the time and depth dimensions of the LSTM. You can refer to section 4.3 of the paper.
