Cells
RecurrentLayers.RANCell — Type
RANCell((input_size => hidden_size)::Pair;
    init_kernel = glorot_uniform,
    init_recurrent_kernel = glorot_uniform,
    bias = true)
The RANCell, introduced in this paper, is a recurrent cell that provides additional memory through the use of gates. See RAN for a layer that processes entire sequences.
Arguments
- input_size => hidden_size: input and inner dimension of the layer
- init_kernel: initializer for the input to hidden weights
- init_recurrent_kernel: initializer for the hidden to hidden weights
- bias: include a bias or not. Default is true
Equations
\[\begin{aligned}
\tilde{c}_t &= W_c x_t, \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i), \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f), \\
c_t &= i_t \odot \tilde{c}_t + f_t \odot c_{t-1}, \\
h_t &= g(c_t)
\end{aligned}\]
Forward
rancell(inp, (state, cstate))
rancell(inp)
Arguments
- inp: The input to the rancell. It should be a vector of size input_size or a matrix of size input_size x batch_size.
- (state, cstate): A tuple containing the hidden and cell states of the RANCell. They should be vectors of size hidden_size or matrices of size hidden_size x batch_size. If not provided, they are assumed to be vectors of zeros, initialized by Flux.initialstates.
Returns
- A tuple (output, state), where output = new_state is the new hidden state and state = (new_state, new_cstate) is the new hidden and cell state. They are tensors of size hidden_size or hidden_size x batch_size.
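Examples
A minimal usage sketch based on the forward signature above; the 4 => 8 dimensions and random input are illustrative, not part of the API.
using Flux, RecurrentLayers

rancell = RANCell(4 => 8)     # input_size => hidden_size
inp = rand(Float32, 4)        # single input vector

# First step: states default to zeros via Flux.initialstates.
output, (state, cstate) = rancell(inp)
# Subsequent steps carry the returned states forward.
output, (state, cstate) = rancell(inp, (state, cstate))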
RecurrentLayers.IndRNNCell — Type
IndRNNCell((input_size => hidden_size), σ=relu;
    init_kernel = glorot_uniform,
    init_recurrent_kernel = glorot_uniform,
    bias = true)
Independently recurrent cell. See IndRNN
for a layer that processes entire sequences.
Arguments
- input_size => hidden_size: input and inner dimension of the layer
- σ: activation function. Default is relu
- init_kernel: initializer for the input to hidden weights
- init_recurrent_kernel: initializer for the hidden to hidden weights
- bias: include a bias or not. Default is true
Equations
\[\mathbf{h}_{t} = \sigma(\mathbf{W} \mathbf{x}_t + \mathbf{u} \odot \mathbf{h}_{t-1} + \mathbf{b})\]
Forward
indrnncell(inp, state)
indrnncell(inp)
Arguments
- inp: The input to the indrnncell. It should be a vector of size input_size or a matrix of size input_size x batch_size.
- state: The hidden state of the IndRNNCell. It should be a vector of size hidden_size or a matrix of size hidden_size x batch_size. If not provided, it is assumed to be a vector of zeros, initialized by Flux.initialstates.
Returns
- A tuple (output, state), where both elements are given by the updated state new_state, a tensor of size hidden_size or hidden_size x batch_size.
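Examples
A brief sketch showing the optional activation argument and a batched input; the dimensions and the swapped-in tanh are illustrative.
using Flux, RecurrentLayers

indrnncell = IndRNNCell(4 => 8, tanh)   # override the default relu
inp = rand(Float32, 4, 16)              # batch of 16 input vectors

output, state = indrnncell(inp)         # state is an 8 x 16 matrix
output, state = indrnncell(inp, state)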
RecurrentLayers.LightRUCell — Type
LightRUCell((input_size => hidden_size);
    init_kernel = glorot_uniform,
    init_recurrent_kernel = glorot_uniform,
    bias = true)
Light recurrent unit. See LightRU
for a layer that processes entire sequences.
Arguments
- input_size => hidden_size: input and inner dimension of the layer
- init_kernel: initializer for the input to hidden weights
- init_recurrent_kernel: initializer for the hidden to hidden weights
- bias: include a bias or not. Default is true
Equations
\[\begin{aligned}
\tilde{h}_t &= \tanh(W_h x_t), \\
f_t &= \delta(W_f x_t + U_f h_{t-1} + b_f), \\
h_t &= (1 - f_t) \odot h_{t-1} + f_t \odot \tilde{h}_t.
\end{aligned}\]
Forward
lightrucell(inp, state)
lightrucell(inp)
Arguments
- inp: The input to the lightrucell. It should be a vector of size input_size or a matrix of size input_size x batch_size.
- state: The hidden state of the LightRUCell. It should be a vector of size hidden_size or a matrix of size hidden_size x batch_size. If not provided, it is assumed to be a vector of zeros, initialized by Flux.initialstates.
Returns
- A tuple (output, state), where both elements are given by the updated state new_state, a tensor of size hidden_size or hidden_size x batch_size.
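Examples
A minimal usage sketch grounded in the forward signature above; dimensions are illustrative.
using Flux, RecurrentLayers

lightrucell = LightRUCell(4 => 8)
inp = rand(Float32, 4)

output, state = lightrucell(inp)   # zero initial state; output == state here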
RecurrentLayers.LiGRUCell — Type
LiGRUCell((input_size => hidden_size);
    init_kernel = glorot_uniform,
    init_recurrent_kernel = glorot_uniform,
    bias = true)
Light gated recurrent unit. The implementation does not include the batch normalization as described in the original paper. See LiGRU
for a layer that processes entire sequences.
Arguments
- input_size => hidden_size: input and inner dimension of the layer
- init_kernel: initializer for the input to hidden weights
- init_recurrent_kernel: initializer for the hidden to hidden weights
- bias: include a bias or not. Default is true
Equations
\[\begin{aligned}
z_t &= \sigma(W_z x_t + U_z h_{t-1}), \\
\tilde{h}_t &= \text{ReLU}(W_h x_t + U_h h_{t-1}), \\
h_t &= z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t
\end{aligned}\]
Forward
ligrucell(inp, state)
ligrucell(inp)
Arguments
- inp: The input to the ligrucell. It should be a vector of size input_size or a matrix of size input_size x batch_size.
- state: The hidden state of the LiGRUCell. It should be a vector of size hidden_size or a matrix of size hidden_size x batch_size. If not provided, it is assumed to be a vector of zeros, initialized by Flux.initialstates.
Returns
- A tuple (output, state), where both elements are given by the updated state new_state, a tensor of size hidden_size or hidden_size x batch_size.
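Examples
A sketch of stepping the cell over a short sequence; the run_sequence helper and the dimensions are illustrative, not part of the package.
using Flux, RecurrentLayers

# Step a cell over a sequence of inputs, threading the state through.
function run_sequence(cell, sequence)
    output, state = cell(first(sequence))   # zero initial state
    for inp in sequence[2:end]
        output, state = cell(inp, state)
    end
    return output
end

ligrucell = LiGRUCell(4 => 8)
sequence = [rand(Float32, 4) for _ in 1:10]   # ten time steps
final_output = run_sequence(ligrucell, sequence)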
RecurrentLayers.MGUCell — Type
MGUCell((input_size => hidden_size);
    init_kernel = glorot_uniform,
    init_recurrent_kernel = glorot_uniform,
    bias = true)
Minimal gated unit. See MGU
for a layer that processes entire sequences.
Arguments
- input_size => hidden_size: input and inner dimension of the layer
- init_kernel: initializer for the input to hidden weights
- init_recurrent_kernel: initializer for the hidden to hidden weights
- bias: include a bias or not. Default is true
Equations
\[\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f), \\
\tilde{h}_t &= \tanh(W_h x_t + U_h (f_t \odot h_{t-1}) + b_h), \\
h_t &= (1 - f_t) \odot h_{t-1} + f_t \odot \tilde{h}_t
\end{aligned}\]
Forward
mgucell(inp, state)
mgucell(inp)
Arguments
- inp: The input to the mgucell. It should be a vector of size input_size or a matrix of size input_size x batch_size.
- state: The hidden state of the MGUCell. It should be a vector of size hidden_size or a matrix of size hidden_size x batch_size. If not provided, it is assumed to be a vector of zeros, initialized by Flux.initialstates.
Returns
- A tuple (output, state), where both elements are given by the updated state new_state, a tensor of size hidden_size or hidden_size x batch_size.
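Examples
A sketch showing the initializer and bias keywords; kaiming_normal is Flux's initializer, swapped in here purely for illustration.
using Flux, RecurrentLayers

mgucell = MGUCell(4 => 8; init_kernel = kaiming_normal, bias = false)
inp = rand(Float32, 4)

output, state = mgucell(inp)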
RecurrentLayers.NASCell — Type
NASCell((input_size => hidden_size);
    init_kernel = glorot_uniform,
    init_recurrent_kernel = glorot_uniform,
    bias = true)
Neural Architecture Search unit. See NAS
for a layer that processes entire sequences.
Arguments
- input_size => hidden_size: input and inner dimension of the layer
- init_kernel: initializer for the input to hidden weights
- init_recurrent_kernel: initializer for the hidden to hidden weights
- bias: include a bias or not. Default is true
Equations
\[\begin{aligned}
\text{First Layer Outputs:} & \\
o_1 &= \sigma(W_i^{(1)} x_t + W_h^{(1)} h_{t-1} + b^{(1)}), \\
o_2 &= \text{ReLU}(W_i^{(2)} x_t + W_h^{(2)} h_{t-1} + b^{(2)}), \\
o_3 &= \sigma(W_i^{(3)} x_t + W_h^{(3)} h_{t-1} + b^{(3)}), \\
o_4 &= \text{ReLU}(W_i^{(4)} x_t \cdot W_h^{(4)} h_{t-1}), \\
o_5 &= \tanh(W_i^{(5)} x_t + W_h^{(5)} h_{t-1} + b^{(5)}), \\
o_6 &= \sigma(W_i^{(6)} x_t + W_h^{(6)} h_{t-1} + b^{(6)}), \\
o_7 &= \tanh(W_i^{(7)} x_t + W_h^{(7)} h_{t-1} + b^{(7)}), \\
o_8 &= \sigma(W_i^{(8)} x_t + W_h^{(8)} h_{t-1} + b^{(8)}). \\
\text{Second Layer Computations:} & \\
l_1 &= \tanh(o_1 \cdot o_2), \\
l_2 &= \tanh(o_3 + o_4), \\
l_3 &= \tanh(o_5 \cdot o_6), \\
l_4 &= \sigma(o_7 + o_8). \\
\text{Inject Cell State:} & \\
l_1 &= \tanh(l_1 + c_{\text{state}}). \\
\text{Final Layer Computations:} & \\
c_{\text{new}} &= l_1 \cdot l_2, \\
l_5 &= \tanh(l_3 + l_4), \\
h_{\text{new}} &= \tanh(c_{\text{new}} \cdot l_5)
\end{aligned}\]
Forward
nascell(inp, (state, cstate))
nascell(inp)
Arguments
- inp: The input to the nascell. It should be a vector of size input_size or a matrix of size input_size x batch_size.
- (state, cstate): A tuple containing the hidden and cell states of the NASCell. They should be vectors of size hidden_size or matrices of size hidden_size x batch_size. If not provided, they are assumed to be vectors of zeros, initialized by Flux.initialstates.
Returns
- A tuple (output, state), where output = new_state is the new hidden state and state = (new_state, new_cstate) is the new hidden and cell state. They are tensors of size hidden_size or hidden_size x batch_size.
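Examples
A batched sketch; as with the other two-state cells, the returned state is a (new_state, new_cstate) tuple. Dimensions are illustrative.
using Flux, RecurrentLayers

nascell = NASCell(4 => 8)
inp = rand(Float32, 4, 16)   # batch of 16 input vectors

output, (state, cstate) = nascell(inp)
output, (state, cstate) = nascell(inp, (state, cstate))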
RecurrentLayers.RHNCell — Type
RHNCell((input_size => hidden_size), depth=3;
    couple_carry::Bool = true,
    cell_kwargs...)
Recurrent highway network. See RHNCellUnit for the unit component of this layer. See RHN for a layer that processes entire sequences.
Arguments
- input_size => hidden_size: input and inner dimension of the layer
- depth: depth of the recurrence. Default is 3
- couple_carry: couples the carry gate and the transform gate. Default is true
- init_kernel: initializer for the input to hidden weights
- bias: include a bias or not. Default is true
Equations
\[\begin{aligned}
s_{\ell}^{[t]} &= h_{\ell}^{[t]} \odot t_{\ell}^{[t]} + s_{\ell-1}^{[t]} \odot c_{\ell}^{[t]}, \\
\text{where} \\
h_{\ell}^{[t]} &= \tanh(W_h x^{[t]}\mathbb{I}_{\ell = 1} + U_{h_{\ell}} s_{\ell-1}^{[t]} + b_{h_{\ell}}), \\
t_{\ell}^{[t]} &= \sigma(W_t x^{[t]}\mathbb{I}_{\ell = 1} + U_{t_{\ell}} s_{\ell-1}^{[t]} + b_{t_{\ell}}), \\
c_{\ell}^{[t]} &= \sigma(W_c x^{[t]}\mathbb{I}_{\ell = 1} + U_{c_{\ell}} s_{\ell-1}^{[t]} + b_{c_{\ell}})
\end{aligned}\]
Forward
rhncell(inp, [state])
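Examples
Since the return value of the forward pass is not documented above, this sketch only shows construction and the call; the dimensions and the depth of 5 are illustrative.
using Flux, RecurrentLayers

rhncell = RHNCell(4 => 8, 5)   # recurrence depth of 5 instead of the default 3
inp = rand(Float32, 4)

rhncell(inp)                   # forward pass with a zero initial state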
RecurrentLayers.RHNCellUnit — Type
RHNCellUnit((input_size => hidden_size)::Pair;
    init_kernel = glorot_uniform,
    bias = true)
The unit component of RHNCell. See RHNCell for the full recurrent highway network layer.
RecurrentLayers.MUT1Cell — Type
MUT1Cell((input_size => hidden_size);
    init_kernel = glorot_uniform,
    init_recurrent_kernel = glorot_uniform,
    bias = true)
Mutated unit 1 cell. See MUT1
for a layer that processes entire sequences.
Arguments
- input_size => hidden_size: input and inner dimension of the layer
- init_kernel: initializer for the input to hidden weights
- init_recurrent_kernel: initializer for the hidden to hidden weights
- bias: include a bias or not. Default is true
Equations
\[\begin{aligned}
z &= \sigma(W_z x_t + b_z), \\
r &= \sigma(W_r x_t + U_r h_t + b_r), \\
h_{t+1} &= \tanh(U_h (r \odot h_t) + \tanh(W_h x_t) + b_h) \odot z \\
&\quad + h_t \odot (1 - z).
\end{aligned}\]
Forward
mutcell(inp, state)
mutcell(inp)
Arguments
- inp: The input to the mutcell. It should be a vector of size input_size or a matrix of size input_size x batch_size.
- state: The hidden state of the MUTCell. It should be a vector of size hidden_size or a matrix of size hidden_size x batch_size. If not provided, it is assumed to be a vector of zeros, initialized by Flux.initialstates.
Returns
- A tuple (output, state), where both elements are given by the updated state new_state, a tensor of size hidden_size or hidden_size x batch_size.
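Examples
A minimal usage sketch grounded in the forward signature above; dimensions are illustrative.
using Flux, RecurrentLayers

mutcell = MUT1Cell(4 => 8)
inp = rand(Float32, 4)

output, state = mutcell(inp)          # zero initial state
output, state = mutcell(inp, state)   # carry the state forward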
RecurrentLayers.MUT2Cell — Type
MUT2Cell((input_size => hidden_size);
    init_kernel = glorot_uniform,
    init_recurrent_kernel = glorot_uniform,
    bias = true)
Mutated unit 2 cell. See MUT2
for a layer that processes entire sequences.
Arguments
- input_size => hidden_size: input and inner dimension of the layer
- init_kernel: initializer for the input to hidden weights
- init_recurrent_kernel: initializer for the hidden to hidden weights
- bias: include a bias or not. Default is true
Equations
\[\begin{aligned}
z &= \sigma(W_z x_t + U_z h_t + b_z), \\
r &= \sigma(x_t + U_r h_t + b_r), \\
h_{t+1} &= \tanh(U_h (r \odot h_t) + W_h x_t + b_h) \odot z \\
&\quad + h_t \odot (1 - z).
\end{aligned}\]
Forward
mutcell(inp, state)
mutcell(inp)
Arguments
- inp: The input to the mutcell. It should be a vector of size input_size or a matrix of size input_size x batch_size.
- state: The hidden state of the MUTCell. It should be a vector of size hidden_size or a matrix of size hidden_size x batch_size. If not provided, it is assumed to be a vector of zeros, initialized by Flux.initialstates.
Returns
- A tuple (output, state), where both elements are given by the updated state new_state, a tensor of size hidden_size or hidden_size x batch_size.
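Examples
A batched sketch; dimensions are illustrative.
using Flux, RecurrentLayers

mutcell = MUT2Cell(4 => 8)
inp = rand(Float32, 4, 16)     # batch of 16 input vectors

output, state = mutcell(inp)   # both are 8 x 16 matrices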
RecurrentLayers.MUT3Cell — Type
MUT3Cell((input_size => hidden_size);
    init_kernel = glorot_uniform,
    init_recurrent_kernel = glorot_uniform,
    bias = true)
Mutated unit 3 cell. See MUT3
for a layer that processes entire sequences.
Arguments
- input_size => hidden_size: input and inner dimension of the layer
- init_kernel: initializer for the input to hidden weights
- init_recurrent_kernel: initializer for the hidden to hidden weights
- bias: include a bias or not. Default is true
Equations
\[\begin{aligned}
z &= \sigma(W_z x_t + U_z \tanh(h_t) + b_z), \\
r &= \sigma(W_r x_t + U_r h_t + b_r), \\
h_{t+1} &= \tanh(U_h (r \odot h_t) + W_h x_t + b_h) \odot z \\
&\quad + h_t \odot (1 - z).
\end{aligned}\]
Forward
mutcell(inp, state)
mutcell(inp)
Arguments
- inp: The input to the mutcell. It should be a vector of size input_size or a matrix of size input_size x batch_size.
- state: The hidden state of the MUTCell. It should be a vector of size hidden_size or a matrix of size hidden_size x batch_size. If not provided, it is assumed to be a vector of zeros, initialized by Flux.initialstates.
Returns
- A tuple (output, state), where both elements are given by the updated state new_state, a tensor of size hidden_size or hidden_size x batch_size.
.
RecurrentLayers.SCRNCell — Type
SCRNCell((input_size => hidden_size);
    init_kernel = glorot_uniform,
    init_recurrent_kernel = glorot_uniform,
    bias = true,
    alpha = 0.0)
Structurally constrained recurrent unit. See SCRN for a layer that processes entire sequences.
Arguments
- input_size => hidden_size: input and inner dimension of the layer
- init_kernel: initializer for the input to hidden weights
- init_recurrent_kernel: initializer for the hidden to hidden weights
- bias: include a bias or not. Default is true
- alpha: structural constraint. Default is 0.0
Equations
\[\begin{aligned}
s_t &= (1 - \alpha) W_s x_t + \alpha s_{t-1}, \\
h_t &= \sigma(W_h s_t + U_h h_{t-1} + b_h), \\
y_t &= f(U_y h_t + W_y s_t)
\end{aligned}\]
Forward
scrncell(inp, (state, cstate))
scrncell(inp)
Arguments
- inp: The input to the scrncell. It should be a vector of size input_size or a matrix of size input_size x batch_size.
- (state, cstate): A tuple containing the hidden and cell states of the SCRNCell. They should be vectors of size hidden_size or matrices of size hidden_size x batch_size. If not provided, they are assumed to be vectors of zeros, initialized by Flux.initialstates.
Returns
- A tuple (output, state), where output = new_state is the new hidden state and state = (new_state, new_cstate) is the new hidden and cell state. They are tensors of size hidden_size or hidden_size x batch_size.
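Examples
A sketch showing the alpha keyword; the value 0.5 and the dimensions are illustrative.
using Flux, RecurrentLayers

scrncell = SCRNCell(4 => 8; alpha = 0.5)   # nonzero structural constraint
inp = rand(Float32, 4)

output, (state, cstate) = scrncell(inp)
output, (state, cstate) = scrncell(inp, (state, cstate))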
RecurrentLayers.PeepholeLSTMCell — Type
PeepholeLSTMCell((input_size => hidden_size);
    init_kernel = glorot_uniform,
    init_recurrent_kernel = glorot_uniform,
    bias = true)
Peephole long short-term memory cell. See PeepholeLSTM
for a layer that processes entire sequences.
Arguments
- input_size => hidden_size: input and inner dimension of the layer
- init_kernel: initializer for the input to hidden weights
- init_recurrent_kernel: initializer for the hidden to hidden weights
- bias: include a bias or not. Default is true
Equations
\[\begin{aligned}
f_t &= \sigma_g(W_f x_t + U_f c_{t-1} + b_f), \\
i_t &= \sigma_g(W_i x_t + U_i c_{t-1} + b_i), \\
o_t &= \sigma_g(W_o x_t + U_o c_{t-1} + b_o), \\
c_t &= f_t \odot c_{t-1} + i_t \odot \sigma_c(W_c x_t + b_c), \\
h_t &= o_t \odot \sigma_h(c_t).
\end{aligned}\]
Forward
peepholelstmcell(inp, (state, cstate))
peepholelstmcell(inp)
Arguments
- inp: The input to the peepholelstmcell. It should be a vector of size input_size or a matrix of size input_size x batch_size.
- (state, cstate): A tuple containing the hidden and cell states of the PeepholeLSTMCell. They should be vectors of size hidden_size or matrices of size hidden_size x batch_size. If not provided, they are assumed to be vectors of zeros, initialized by Flux.initialstates.
Returns
- A tuple (output, state), where output = new_state is the new hidden state and state = (new_state, new_cstate) is the new hidden and cell state. They are tensors of size hidden_size or hidden_size x batch_size.
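Examples
A minimal sketch carrying the hidden and cell states across two steps; dimensions are illustrative.
using Flux, RecurrentLayers

peepholelstmcell = PeepholeLSTMCell(4 => 8)
inp = rand(Float32, 4)

output, (state, cstate) = peepholelstmcell(inp)
output, (state, cstate) = peepholelstmcell(inp, (state, cstate))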
RecurrentLayers.FastRNNCell — Type
FastRNNCell((input_size => hidden_size), [activation];
    init_kernel = glorot_uniform,
    init_recurrent_kernel = glorot_uniform,
    bias = true)
Fast recurrent neural network cell. See FastRNN
for a layer that processes entire sequences.
Arguments
- input_size => hidden_size: input and inner dimension of the layer
- activation: the activation function, defaults to tanh_fast
- init_kernel: initializer for the input to hidden weights
- init_recurrent_kernel: initializer for the hidden to hidden weights
- bias: include a bias or not. Default is true
Equations
\[\begin{aligned}
\tilde{h}_t &= \sigma(W_h x_t + U_h h_{t-1} + b), \\
h_t &= \alpha \tilde{h}_t + \beta h_{t-1}
\end{aligned}\]
Forward
fastrnncell(inp, state)
fastrnncell(inp)
Arguments
- inp: The input to the fastrnncell. It should be a vector of size input_size or a matrix of size input_size x batch_size.
- state: The hidden state of the FastRNN. It should be a vector of size hidden_size or a matrix of size hidden_size x batch_size. If not provided, it is assumed to be a vector of zeros, initialized by Flux.initialstates.
Returns
- A tuple (output, state), where both elements are given by the updated state new_state, a tensor of size hidden_size or hidden_size x batch_size.
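Examples
A sketch showing the optional activation argument; relu is swapped in purely for illustration, and dimensions are illustrative.
using Flux, RecurrentLayers

fastrnncell = FastRNNCell(4 => 8, relu)   # override the default tanh_fast
inp = rand(Float32, 4)

output, state = fastrnncell(inp)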
RecurrentLayers.FastGRNNCell — Type
FastGRNNCell((input_size => hidden_size), [activation];
    init_kernel = glorot_uniform,
    init_recurrent_kernel = glorot_uniform,
    bias = true)
Fast gated recurrent neural network cell. See FastGRNN
for a layer that processes entire sequences.
Arguments
- input_size => hidden_size: input and inner dimension of the layer
- activation: the activation function, defaults to tanh_fast
- init_kernel: initializer for the input to hidden weights
- init_recurrent_kernel: initializer for the hidden to hidden weights
- bias: include a bias or not. Default is true
Equations
\[\begin{aligned}
z_t &= \sigma(W_z x_t + U_z h_{t-1} + b_z), \\
\tilde{h}_t &= \tanh(W_h x_t + U_h h_{t-1} + b_h), \\
h_t &= \big((\zeta (1 - z_t) + \nu) \odot \tilde{h}_t\big) + z_t \odot h_{t-1}
\end{aligned}\]
Forward
fastgrnncell(inp, state)
fastgrnncell(inp)
Arguments
- inp: The input to the fastgrnncell. It should be a vector of size input_size or a matrix of size input_size x batch_size.
- state: The hidden state of the FastGRNN. It should be a vector of size hidden_size or a matrix of size hidden_size x batch_size. If not provided, it is assumed to be a vector of zeros, initialized by Flux.initialstates.
Returns
- A tuple (output, state), where both elements are given by the updated state new_state, a tensor of size hidden_size or hidden_size x batch_size.
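Examples
A batched sketch with the default activation; dimensions are illustrative.
using Flux, RecurrentLayers

fastgrnncell = FastGRNNCell(4 => 8)
inp = rand(Float32, 4, 16)   # batch of 16 input vectors

output, state = fastgrnncell(inp)
output, state = fastgrnncell(inp, state)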