Video in NN
Introduction
A video is a 4D tensor of shape T x H x W x C: a sequence of T frames (images), each of size H x W with C channels.
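Video Classification
Input: a video of shape T x H x W x C. Output: scores for K classes, where the classes are now actions (verbs) rather than nouns (objects).
Problem: raw video is BIG. Uncompressed SD video (640 x 480 x 3, 1 byte per value, 30 fps) takes ~1.5 GB per minute.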
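The ~1.5 GB figure follows directly from the tensor shape. A minimal sketch, assuming uncompressed uint8 storage at 30 fps (the helper `raw_video_bytes` is hypothetical):

```python
import numpy as np

def raw_video_bytes(seconds, height, width, channels=3, fps=30):
    """Size of an uncompressed video stored as uint8 (1 byte per value)."""
    frames = int(seconds * fps)
    return frames * height * width * channels

# one second of SD video as a T x H x W x C tensor
one_second = np.zeros((30, 480, 640, 3), dtype=np.uint8)
per_minute = raw_video_bytes(60, height=480, width=640)

print(one_second.nbytes * 60 == per_minute)  # True
print(f"{per_minute / 2**30:.2f} GiB")       # 1.54 GiB
```

Each SD frame is 640 x 480 x 3 = 921,600 bytes, and a minute at 30 fps holds 1,800 of them.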
Solution: train on short clips with a low frame rate and low spatial resolution, e.g. T = 16, H = W = 112.
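Cutting such a clip out of a longer video can be sketched as below. This assumes a 30 fps source subsampled to 5 fps (so 16 frames span about 3.2 seconds) and uses a crude nearest-neighbour resize; `sample_clip` and its parameters are hypothetical, and real pipelines usually crop and interpolate instead.

```python
import numpy as np

def sample_clip(video, src_fps, clip_len=16, target_fps=5, size=112, start=0):
    """Cut a short, low-fps, low-resolution clip out of a (T, H, W, C) video."""
    stride = max(1, round(src_fps / target_fps))   # keep every `stride`-th frame
    idx = start + stride * np.arange(clip_len)
    if idx[-1] >= video.shape[0]:
        raise ValueError("video is too short for this clip")
    frames = video[idx]                            # (clip_len, H, W, C)
    # nearest-neighbour resize to size x size (a crude stand-in for real interpolation)
    _, h, w, _ = frames.shape
    rows = (np.arange(size) * h) // size
    cols = (np.arange(size) * w) // size
    return frames[:, rows][:, :, cols]             # (clip_len, size, size, C)

# hypothetical 4-second, 30 fps video whose frame t is filled with the value t
t = np.arange(120, dtype=np.uint8)[:, None, None, None]
video = np.broadcast_to(t, (120, 240, 320, 3))
clip = sample_clip(video, src_fps=30)
print(clip.shape, clip.nbytes)  # (16, 112, 112, 3) 602112
```

The clip is about 0.6 MB, versus roughly 1.5 GB for a minute of raw SD video.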
Short-Term Models
Single-Frame CNN
The simplest baseline: run a 2D CNN on each frame independently, then average the per-frame predictions over time to classify the whole video.
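The single-frame baseline can be sketched as follows. The real model would be a trained 2D CNN; here `frame_scores` is a hypothetical stand-in (global average pooling plus a random linear layer) so the example stays small and runnable.

```python
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def frame_scores(frame, weight, bias):
    """Hypothetical stand-in for a 2D CNN: global average pool over H and W, then a linear layer."""
    pooled = frame.mean(axis=(0, 1))  # (C,)
    return weight @ pooled + bias     # (K,) class scores

def single_frame_predict(video, weight, bias):
    """Classify each frame of a (T, H, W, C) video independently, then average over time."""
    probs = np.stack([softmax(frame_scores(f, weight, bias)) for f in video])  # (T, K)
    return probs.mean(axis=0)                                                  # (K,)

rng = np.random.default_rng(0)
T, H, W, C, K = 16, 112, 112, 3, 5
video = rng.random((T, H, W, C))
weight, bias = rng.standard_normal((K, C)), rng.standard_normal(K)

pred = single_frame_predict(video, weight, bias)
print(pred.argmax())  # index of the predicted action class
```

Because every frame is scored on its own, reversing or shuffling the frames leaves the prediction unchanged: this model sees appearance, not motion.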
Generative Models I
Supervised vs. unsupervised learning
Supervised learning: the data are labeled pairs $(x, y)$, and the goal is to learn a mapping from the input $x$ to the target variable $y$. Examples: classification, semantic segmentation, object detection, etc.
Unsupervised learning: the data are unlabeled (just $x$, no target variable), and the goal is to learn some underlying structure of the data, e.g. to model its distribution. Examples: clustering, density estimation, feature extraction/learning, dimensionality reduction, etc.
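Density estimation is the most direct form of "modeling the distribution of the data". A minimal sketch, assuming the simplest possible model (a 1D Gaussian fit by maximum likelihood) on synthetic unlabeled samples; the function names are hypothetical:

```python
import numpy as np

def fit_gaussian(x):
    """Maximum-likelihood fit of a 1D Gaussian: sample mean and (biased) sample variance."""
    mu = x.mean()
    var = ((x - mu) ** 2).mean()
    return mu, var

def log_density(x, mu, var):
    """log N(x; mu, var)"""
    return -0.5 * (np.log(2 * np.pi * var) + (x - mu) ** 2 / var)

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=0.5, size=10_000)  # only x, no labels y
mu, var = fit_gaussian(data)
print(mu, var)  # close to 2.0 and 0.25

# the fitted model can score new points: typical values get a higher density
print(log_density(2.0, mu, var) > log_density(4.0, mu, var))  # True
```

Deep generative models follow the same recipe of maximizing likelihood over unlabeled data, with a neural network in place of the Gaussian.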
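Discriminative, Generative, and Conditional Generative Models
Let $x$ be the input variable (e.g. an image) and $y$ the target variable (e.g. a label). A discriminative model learns $p(y \mid x)$, a generative model learns $p(x)$, and a conditional generative model learns $p(x \mid y)$. All three are connected by one equation, Bayes' rule:
$$ p(x \mid y) = \frac{p(y \mid x)}{p(y)} \, p(x) $$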
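The relationship can be checked numerically on a toy discrete problem. The joint table below is made up purely for illustration (3 possible inputs, 2 labels); each of the three model types is just a different normalization of it.

```python
import numpy as np

# hypothetical joint distribution p(x, y): rows are 3 inputs x, columns are 2 labels y
joint = np.array([[0.10, 0.20],
                  [0.25, 0.05],
                  [0.15, 0.25]])

p_x = joint.sum(axis=1)              # generative model: p(x)
p_y = joint.sum(axis=0)              # label prior: p(y)
p_y_given_x = joint / p_x[:, None]   # discriminative model: p(y | x), each row sums to 1
p_x_given_y = joint / p_y[None, :]   # conditional generative model: p(x | y), each column sums to 1

# Bayes' rule: p(x | y) = p(y | x) p(x) / p(y)
bayes = p_y_given_x * p_x[:, None] / p_y[None, :]
print(np.allclose(bayes, p_x_given_y))  # True
```

In the discriminative model the labels compete for probability mass (rows sum to 1); in the (conditional) generative models the inputs do (columns sum to 1).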
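Attention Mechanisms in Neural Networks
Introduction
What happens when a seq2seq model has to process very long sequences? A plain RNN encoder-decoder squeezes the whole input into a single fixed-size context vector, which becomes a bottleneck.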
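Attention Mechanisms
The core idea: at every decoder step, build a new context vector as a weighted sum of the encoder hidden states, where the weights (coefficients) are computed by the model itself and learned end to end.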
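One decoder step of this weighted sum over a sequence can be sketched as below. The notes do not fix the scoring function, so this assumes additive (Bahdanau-style) scoring $f_{att}(s, h) = v^\top \tanh(W_s s + W_h h)$; the parameters `W_s`, `W_h`, `v` are hypothetical and random here, whereas in a real model they are trained with the rest of the network.

```python
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def attention_step(s_prev, h, W_s, W_h, v):
    """One decoder step of additive attention over a sequence of encoder states.

    s_prev: (D_s,) previous decoder state s_{t-1}
    h:      (N, D_h) encoder hidden states h_1..h_N
    """
    # alignment scores e_{t,i} = v^T tanh(W_s s_{t-1} + W_h h_i), one per input position
    e = np.tanh(s_prev @ W_s.T + h @ W_h.T) @ v  # (N,)
    a = softmax(e)                               # attention weights, non-negative, sum to 1
    c = a @ h                                    # context vector: sum_i a_{t,i} h_i
    return c, a

rng = np.random.default_rng(0)
N, D_h, D_s, A = 6, 4, 3, 8
h = rng.standard_normal((N, D_h))
s_prev = rng.standard_normal(D_s)
W_s, W_h, v = rng.standard_normal((A, D_s)), rng.standard_normal((A, D_h)), rng.standard_normal(A)

c, a = attention_step(s_prev, h, W_s, W_h, v)
```

Since the weights depend on $s_{t-1}$, each decoder step can focus on a different part of the input instead of relying on one fixed context vector.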
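Mathematically, attention does not care whether the input is a sequence: it only needs a set of vectors to take a weighted sum over, so the same mechanism works on, e.g., a 2D grid of CNN features.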
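Given a grid of features $h_{i,j}$ and the previous decoder state $s_{t-1}$, we compute an alignment score for every position, normalize the scores with a softmax into attention weights $a_{t,i,j}$, and take the weighted sum as the context vector $c_t$: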
$$
\begin{aligned}
e_{t,i,j} &= f_{att}(s_{t-1}, h_{i,j}) \\
a_{t,:,:} &= \operatorname{softmax}(e_{t,:,:}) \\
c_t &= \sum_{i,j} a_{t,i,j} \, h_{i,j}
\end{aligned}
$$
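The grid version of the equations for $e$, $a$ and $c$ can be sketched the same way, again assuming additive scoring for $f_{att}$ with hypothetical random parameters. The softmax runs over all $(i, j)$ positions at once, and flattening the $H \times W$ grid into a sequence of $H \cdot W$ vectors gives the same context vector, which is exactly why the input need not be a sequence.

```python
import numpy as np

def softmax_all(x):
    """Softmax over every entry of x, whatever its shape."""
    z = np.exp(x - x.max())
    return z / z.sum()

def grid_attention(s_prev, h, W_s, W_h, v):
    """Additive attention over a (H, W, D) grid of features h_{i,j}."""
    e = np.tanh(s_prev @ W_s.T + h @ W_h.T) @ v  # e_{t,i,j}: (H, W)
    a = softmax_all(e)                           # a_{t,i,j}: sums to 1 over all (i, j)
    c = np.einsum("ij,ijd->d", a, h)             # c_t = sum_{i,j} a_{t,i,j} h_{i,j}
    return c, a

rng = np.random.default_rng(0)
H, W, D, D_s, A = 3, 4, 5, 2, 6
h = rng.standard_normal((H, W, D))
s_prev = rng.standard_normal(D_s)
W_s, W_h, v = rng.standard_normal((A, D_s)), rng.standard_normal((A, D)), rng.standard_normal(A)

c, a = grid_attention(s_prev, h, W_s, W_h, v)
# same features viewed as a 1 x (H*W) "sequence"
c_seq, _ = grid_attention(s_prev, h.reshape(1, H * W, D), W_s, W_h, v)
print(np.allclose(c, c_seq))  # True
```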