
L18-Video

Video in Neural Networks

Introduction: a video is a 4D tensor, T x H x W x C, i.e. a sequence of frames (images).

Video classification: the input is a T x H x W x C clip and the output is one of K classes, which here tend to be actions (verbs) rather than objects (nouns).

Problem: raw video is big, roughly 1.5 GB per minute for SD video (640 x 480 x 3).

Solution: train on short clips, e.g. T = 16 frames at H = W = 112, sampled at a low frame rate.

Short-term model: the Single-Frame CNN, which simply runs a 2D CNN on each frame independently and fuses the per-frame predictions (e.g. by averaging), as sketched below :)
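Below is a minimal sketch of that single-frame baseline, assuming a PyTorch-style 2D CNN backbone; `SingleFrameCNN`, `backbone`, and `feat_dim` are illustrative names, and the tensor layout is the usual channels-first (N, T, C, H, W) rather than the T x H x W x C notation above.

```python
import torch
import torch.nn as nn

class SingleFrameCNN(nn.Module):
    """Single-frame baseline: run a 2D CNN on every frame independently,
    then average the per-frame class scores over time."""

    def __init__(self, backbone: nn.Module, feat_dim: int, num_classes: int):
        super().__init__()
        self.backbone = backbone            # any 2D CNN: (N, C, H, W) -> (N, feat_dim)
        self.fc = nn.Linear(feat_dim, num_classes)

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        # clip: (N, T, C, H, W), e.g. a short clip with T=16 frames at 112x112
        N, T, C, H, W = clip.shape
        frames = clip.reshape(N * T, C, H, W)        # fold time into the batch dimension
        feats = self.backbone(frames)                # (N*T, feat_dim)
        scores = self.fc(feats).reshape(N, T, -1)    # per-frame class scores (N, T, K)
        return scores.mean(dim=1)                    # average over frames -> (N, K)
```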

L19-Generative Model I

Generative Model I

Supervised learning vs. unsupervised learning:

Supervised learning: labeled data with a target variable; examples include classification, semantic segmentation, object detection, etc.

Unsupervised learning: unlabeled data with no target variable; examples include clustering, density estimation, feature extraction/learning, dimensionality reduction, etc. Here we are trying to model the distribution of the data itself.

Discriminative, Generative, and Conditional Generative models: all three can be written in terms of probability distributions over the same variables, where x is the input variable and y is the target variable; the standard formulation is sketched below.
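For reference (these are the usual textbook formulations, consistent with the taxonomy above but not spelled out in these notes): a discriminative model learns $p(y \mid x)$, a generative model learns $p(x)$, and a conditional generative model learns $p(x \mid y)$. Bayes' rule ties the three together:

$$
p(x \mid y) = \frac{p(y \mid x)\, p(x)}{p(y)}
$$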

L13-Attention

Attention Mechanisms in Neural Networks

Introduction: what happens when seq-to-seq models have to process very long sequences? A single fixed-length context vector becomes a bottleneck.

Attention mechanisms: the core idea is to form a weighted sum of the hidden states, where the weights are computed by the model itself. Mathematically, we do not actually care whether the input is a sequence or not; the same machinery works over a grid of features.

Given a grid of hidden states $h_{i,j}$ and the previous decoder state $s_{t-1}$, we compute the alignment scores, attention weights, and context vector $c_t$ as follows:

$$
e_{t,i,j} = f_{att}(s_{t-1}, h_{i,j}) \\
a_{t,:,:} = \mathrm{softmax}(e_{t,:,:}) \\
c_t = \sum_{i,j} a_{t,i,j} \, h_{i,j}
$$
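A minimal code sketch of these equations, assuming $f_{att}$ is a plain dot product (the notes leave $f_{att}$ unspecified; it could equally be a small learned MLP); `attention_step` and the shapes used are illustrative:

```python
import torch
import torch.nn.functional as F

def attention_step(s_prev: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
    """One decoder step of attention over a grid of hidden states.

    s_prev: (D,)       previous decoder state s_{t-1}
    h:      (I, J, D)  grid of hidden states h_{i,j}
    returns c_t of shape (D,), the context vector for step t
    """
    # f_att: here simply a dot product between s_{t-1} and each h_{i,j}
    e = torch.einsum('d,ijd->ij', s_prev, h)               # alignment scores e_{t,i,j}
    a = F.softmax(e.reshape(-1), dim=0).reshape(e.shape)   # weights a_{t,i,j}, normalized over all positions
    c = torch.einsum('ij,ijd->d', a, h)                    # c_t = sum_{i,j} a_{t,i,j} h_{i,j}
    return c

# usage: a 7x7 grid of 512-dim features and a 512-dim decoder state
c_t = attention_step(torch.randn(512), torch.randn(7, 7, 512))
```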