OpenCV 3.2.0 Open Source Computer Vision

cv::dnn::LSTMLayer Class Reference

LSTM recurrent layer.
#include "all_layers.hpp"
Public Member Functions

void forward (std::vector< Blob *> &input, std::vector< Blob > &output)

virtual Blob getC () const =0
    Returns current \( c_{t-1} \) value (deep copy).

virtual Blob getH () const =0
    Returns current \( h_{t-1} \) value (deep copy).

int inputNameToIndex (String inputName)
    Returns index of input blob in the input array.

int outputNameToIndex (String outputName)
    Returns index of output blob in the output array.

virtual void setC (const Blob &C)=0
    Set \( c_{t-1} \) value that will be used in next forward() calls.

virtual void setH (const Blob &H)=0
    Set \( h_{t-1} \) value that will be used in next forward() calls.

virtual void setOutShape (const BlobShape &outTailShape=BlobShape::empty())=0
    Specifies the shape of the output blob, which will be [[T], N] + outTailShape.

virtual void setProduceCellOutput (bool produce=false)=0
    If this flag is set to true, the layer will produce \( c_t \) as its second output.

virtual void setUseTimstampsDim (bool use=true)=0
    Specifies whether the first dimension of the input blob is interpreted as the timestamp dimension or as the sample dimension.

virtual void setWeights (const Blob &Wh, const Blob &Wx, const Blob &b)=0
    Set trained weights for the LSTM layer.
Public Member Functions inherited from cv::dnn::Layer

Layer ()

Layer (const LayerParams &params)
    Initializes only name, type and blobs fields.

virtual ~Layer ()

virtual void allocate (const std::vector< Blob *> &input, std::vector< Blob > &output)=0
    Allocates internal buffers and output blobs with respect to the shape of the inputs.

void allocate (const std::vector< Blob > &inputs, std::vector< Blob > &outputs)

std::vector< Blob > allocate (const std::vector< Blob > &inputs)

void forward (const std::vector< Blob > &inputs, std::vector< Blob > &outputs)

void run (const std::vector< Blob > &inputs, std::vector< Blob > &outputs)
    Allocates the layer and computes its output.

void setParamsFrom (const LayerParams &params)
    Initializes only name, type and blobs fields.
Static Public Member Functions

static Ptr< LSTMLayer > create ()
Additional Inherited Members

Public Attributes inherited from cv::dnn::Layer

std::vector< Blob > blobs
    List of learned parameters; must be stored here so they can be read via Net::getParam().

String name
    Name of the layer instance; can be used for logging or other internal purposes.

String type
    Type name that was used to create the layer by the layer factory.
Detailed Description

LSTM recurrent layer.

Member Function Documentation

forward()

void cv::dnn::LSTMLayer::forward (std::vector< Blob *> &input, std::vector< Blob > &output)   [virtual]
In the common case this layer uses a single input with \(x_t\) values to compute the output(s) \(h_t\) (and \(c_t\)).

Parameters
    input    should contain packed values \(x_t\)
    output   contains computed outputs: \(h_t\) (and \(c_t\) if the setProduceCellOutput() flag was set to true)

If setUseTimstampsDim() is set to true, then input[0] should have at least two dimensions with shape [T, N, [data dims]], where T specifies the number of timestamps and N is the number of independent streams (i.e. \( x_{t_0 + t}^{stream} \) is stored inside input[0][t, stream, ...]).

If setUseTimstampsDim() is set to false, then input[0] should contain a single timestamp, and its shape should have the form [N, [data dims]] with at least one dimension (i.e. \( x_{t}^{stream} \) is stored inside input[0][stream, ...]).
Implements cv::dnn::Layer.
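For illustration, here is a minimal, untested sketch of driving the layer directly. It assumes the OpenCV 3.2 contrib dnn API shown on this page, including the shape-based Blob constructor; the sizes T, N, Nx, Nh and the helper runLstm are illustrative, and the weight blobs are uninitialized placeholders rather than trained parameters.

    #include <opencv2/dnn.hpp>
    #include <vector>

    using namespace cv;
    using namespace cv::dnn;

    void runLstm()
    {
        const int T = 5, N = 2, Nx = 8, Nh = 16;

        Ptr<LSTMLayer> lstm = LSTMLayer::create();

        // Shapes follow setWeights(): Wh is [4*Nh x Nh], Wx is [4*Nh x Nx], b is [4*Nh].
        // These blobs are placeholders; real use loads trained weights.
        lstm->setWeights(Blob(BlobShape(4 * Nh, Nh)),
                         Blob(BlobShape(4 * Nh, Nx)),
                         Blob(BlobShape(4 * Nh)));

        // Pack the x_t values as [T, N, Nx]; the timestamp dimension is on by default.
        Blob x(BlobShape(T, N, Nx));

        std::vector<Blob*> input(1, &x);
        std::vector<Blob> output;
        lstm->allocate(input, output);  // allocate buffers before the first forward()
        lstm->forward(input, output);   // output[0] holds h_t for all T timestamps
    }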
getC()

virtual Blob cv::dnn::LSTMLayer::getC () const   [pure virtual]

Returns current \( c_{t-1} \) value (deep copy).
getH()

virtual Blob cv::dnn::LSTMLayer::getH () const   [pure virtual]

Returns current \( h_{t-1} \) value (deep copy).
inputNameToIndex()

int cv::dnn::LSTMLayer::inputNameToIndex (String inputName)   [virtual]

Returns index of input blob in the input array.

Parameters
    inputName   label of input blob

Each layer input and output can be labeled, for easy identification, using the "<layer_name>[.output_name]" notation. This method maps the label of an input blob to its index in the input vector.

Reimplemented from cv::dnn::Layer.
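As a hedged example of this lookup, a snippet like the following would resolve a labeled input to its position in the input vector; the label "x" is purely hypothetical, since actual label names are defined by the concrete layer.

    #include <opencv2/dnn.hpp>

    int resolveInput(cv::Ptr<cv::dnn::LSTMLayer> lstm)
    {
        // "x" is a hypothetical label for illustration only.
        return lstm->inputNameToIndex("x");
    }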
outputNameToIndex()

int cv::dnn::LSTMLayer::outputNameToIndex (String outputName)   [virtual]

Returns index of output blob in the output array.

Reimplemented from cv::dnn::Layer.
setC()

virtual void cv::dnn::LSTMLayer::setC (const Blob &C)   [pure virtual]

Set \( c_{t-1} \) value that will be used in next forward() calls.

setH()

virtual void cv::dnn::LSTMLayer::setH (const Blob &H)   [pure virtual]

Set \( h_{t-1} \) value that will be used in next forward() calls.
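A hedged sketch of using these accessors to checkpoint and restore the recurrent state between forward() calls; saveState and restoreState are illustrative helpers, not part of the API.

    #include <opencv2/dnn.hpp>

    using cv::dnn::Blob;
    using cv::dnn::LSTMLayer;

    // getC()/getH() return deep copies, so the saved blobs are safe to hold.
    void saveState(cv::Ptr<LSTMLayer> lstm, Blob &h, Blob &c)
    {
        h = lstm->getH();
        c = lstm->getC();
    }

    // Subsequent forward() calls resume from the restored state.
    void restoreState(cv::Ptr<LSTMLayer> lstm, const Blob &h, const Blob &c)
    {
        lstm->setH(h);
        lstm->setC(c);
    }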
setOutShape()

virtual void cv::dnn::LSTMLayer::setOutShape (const BlobShape &outTailShape = BlobShape::empty())   [pure virtual]

Specifies the shape of the output blob, which will be [[T], N] + outTailShape.

If this parameter is empty or unset, then outTailShape = [Wh.size(0)] will be used, where Wh is the parameter from setWeights().
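A minimal hedged example; Nh is illustrative, and leaving the parameter unset yields the same [Wh.size(0)] tail by default.

    #include <opencv2/dnn.hpp>

    void fixOutputTail(cv::Ptr<cv::dnn::LSTMLayer> lstm, int Nh)
    {
        lstm->setOutShape(cv::dnn::BlobShape(Nh));  // output becomes [[T], N, Nh]
    }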
setProduceCellOutput()

virtual void cv::dnn::LSTMLayer::setProduceCellOutput (bool produce = false)   [pure virtual]

If this flag is set to true, then the layer will produce \( c_t \) as its second output.

The shape of the second output is the same as the first output.
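A hedged sketch of reading both outputs; it assumes the flag is set before allocate(), and the input/output vectors come from the caller.

    #include <opencv2/dnn.hpp>
    #include <vector>

    using namespace cv::dnn;

    void forwardWithCellState(cv::Ptr<LSTMLayer> lstm,
                              std::vector<Blob*> &input, std::vector<Blob> &output)
    {
        lstm->setProduceCellOutput(true);
        lstm->allocate(input, output);
        lstm->forward(input, output);
        // output[0] is h_t; output[1] is c_t with the same shape as output[0].
    }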
setUseTimstampsDim()

virtual void cv::dnn::LSTMLayer::setUseTimstampsDim (bool use = true)   [pure virtual]

Specifies whether the first dimension of the input blob is interpreted as the timestamp dimension or as the sample dimension.

If the flag is set to true, the shape of the input blob will be interpreted as [T, N, [data dims]], where T specifies the number of timestamps and N is the number of independent streams. In this case each forward() call will iterate through T timestamps and update the layer's state T times.

If the flag is set to false, the shape of the input blob will be interpreted as [N, [data dims]]. In this case each forward() call will make one iteration and produce one timestamp with shape [N, [out dims]].
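A hedged sketch of the two input layouts; T, N and Nx are illustrative sizes, and the shape-based Blob constructor from this API version is assumed.

    #include <opencv2/dnn.hpp>

    using namespace cv::dnn;

    void inputLayouts(cv::Ptr<LSTMLayer> lstm)
    {
        const int T = 10, N = 4, Nx = 8;

        lstm->setUseTimstampsDim(true);   // first dimension is time
        Blob seq(BlobShape(T, N, Nx));    // one forward() iterates over T steps

        lstm->setUseTimstampsDim(false);  // first dimension is the stream/sample
        Blob step(BlobShape(N, Nx));      // one forward() produces one timestamp
    }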
setWeights()

virtual void cv::dnn::LSTMLayer::setWeights (const Blob &Wh, const Blob &Wx, const Blob &b)   [pure virtual]

Set trained weights for the LSTM layer. LSTM behavior on each step is defined by the current input, the previous output, the previous cell state and the learned weights.

Let \(x_t\) be the current input, \(h_t\) the current output and \(c_t\) the current cell state. Then the current output and current cell state are computed as follows:

\begin{eqnarray*} h_t &=& o_t \odot \tanh(c_t), \\ c_t &=& f_t \odot c_{t-1} + i_t \odot g_t, \end{eqnarray*}

where \(\odot\) is per-element multiplication and \(i_t, f_t, o_t, g_t\) are internal gates computed from the learned weights.

The gates are computed as follows:

\begin{eqnarray*} i_t &=& \mathrm{sigmoid}(W_{xi} x_t + W_{hi} h_{t-1} + b_i), \\ f_t &=& \mathrm{sigmoid}(W_{xf} x_t + W_{hf} h_{t-1} + b_f), \\ o_t &=& \mathrm{sigmoid}(W_{xo} x_t + W_{ho} h_{t-1} + b_o), \\ g_t &=& \tanh(W_{xg} x_t + W_{hg} h_{t-1} + b_g), \end{eqnarray*}

where \(W_{x?}\), \(W_{h?}\) and \(b_{?}\) are learned weights represented as matrices: \(W_{x?} \in R^{N_h \times N_x}\), \(W_{h?} \in R^{N_h \times N_h}\), \(b_? \in R^{N_h}\).

For simplicity and performance we use \( W_x = [W_{xi}; W_{xf}; W_{xo}; W_{xg}] \) (i.e. \(W_x\) is the vertical concatenation of the \( W_{x?} \) matrices), \( W_x \in R^{4N_h \times N_x} \). The same holds for \( W_h = [W_{hi}; W_{hf}; W_{ho}; W_{hg}] \), \( W_h \in R^{4N_h \times N_h} \), and for \( b = [b_i; b_f; b_o; b_g] \), \( b \in R^{4N_h} \).

Parameters
    Wh   matrix defining how the previous output is transformed to the internal gates (i.e. \( W_h \) in the notation above)
    Wx   matrix defining how the current input is transformed to the internal gates (i.e. \( W_x \) in the notation above)
    b    bias vector (i.e. \( b \) in the notation above)
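A hedged, untested sketch of building weight blobs with the concatenated layout described above; random values stand in for trained parameters, the Blob(InputArray) constructor is assumed from this API version, and a column-vector Mat stands in for the 1-D bias blob.

    #include <opencv2/core.hpp>
    #include <opencv2/dnn.hpp>

    using namespace cv;
    using namespace cv::dnn;

    void initWeights(Ptr<LSTMLayer> lstm, int Nx, int Nh)
    {
        // Row blocks are stacked in the order [i; f; o; g].
        Mat Wh(4 * Nh, Nh, CV_32F), Wx(4 * Nh, Nx, CV_32F), b(4 * Nh, 1, CV_32F);
        randn(Wh, 0.0, 0.1);  // placeholder values, not trained weights
        randn(Wx, 0.0, 0.1);
        b.setTo(0);

        lstm->setWeights(Blob(Wh), Blob(Wx), Blob(b));
    }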