L18-Video

HHZZ published on 2025-04-20 included in UMich-EECS-498

Video in NN

Introduction. A video is a 4D tensor, T x H x W x C: a sequence of frames (images).

Video classification. Input: T x H x W x C. Output: K classes, which are actions instead of nouns.

Problem: raw video is big, about 1.5 GB per minute for SD (640 x 480 x 3). Solution: train on short clips at a low frame rate, e.g. T = 16, H = W = 112.

Short-term models. Single-frame CNN: you can simply run a 2D CNN on each frame independently :)
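The size figures above can be checked with a little arithmetic. The sketch below assumes uncompressed 8-bit channels and 30 frames per second for the SD case; neither assumption is stated in the notes, and `raw_video_bytes` is a hypothetical helper name.

```python
def raw_video_bytes(t, h, w, c=3, bytes_per_value=1):
    """Size in bytes of an uncompressed T x H x W x C clip.

    bytes_per_value=1 assumes 8-bit channels (an assumption, not from the notes).
    """
    return t * h * w * c * bytes_per_value


# One minute of SD video, assuming 30 fps (an assumption, not from the notes).
fps = 30
sd_minute = raw_video_bytes(t=fps * 60, h=480, w=640)

# The short training clip from the notes: T = 16, H = W = 112.
clip = raw_video_bytes(t=16, h=112, w=112)

print(f"one SD minute: {sd_minute / 2**30:.2f} GiB")
print(f"one 16-frame clip: {clip / 2**10:.0f} KiB")
```

Under these assumptions the short clip is several thousand times smaller than a raw minute of SD video, which is why training on short, low-resolution clips is practical.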

L19-Generative Model I

HHZZ published on 2025-04-20 included in UMich-EECS-498

Generative Model I

Supervised learning vs unsupervised learning. Supervised learning uses labeled data with a labeled target variable: classification, semantic segmentation, object detection, etc. Unsupervised learning uses unlabeled data with no target variable: clustering, density estimation, feature extraction/learning, dimensionality reduction, etc. In the unsupervised case we are trying to learn a model of the distribution of the data.

Discriminative, generative, and conditional generative models. All three types of model can be related in a single equation, where x is the input variable and y is the target variable.
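One standard way to relate the three, assumed here since the text does not spell the equation out, is Bayes' rule. In this reading, the discriminative model is $p(y \mid x)$, the (unconditional) generative model is $p(x)$, and the conditional generative model is $p(x \mid y)$:

$$
p(x \mid y) = \frac{p(y \mid x)\, p(x)}{p(y)}
$$

Here $p(y)$ is the prior over labels. Under this identity, a conditional generative model could in principle be assembled from a discriminative model, a generative model, and a label prior.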

L13-Attention

HHZZ published on 2025-04-19 included in UMich-EECS-498

Attention Mechanisms in Neural Networks

Introduction. What if a Seq to Seq model has to process very long sequences?

Attention mechanisms. The core idea is to use a weighted sum whose coefficients are learned by the model itself. Mathematically, we do not actually care whether the input is a sequence or not. Given the hidden states $h_{i,j}$ and the previous decoder state $s_{t-1}$, we can calculate the alignment scores $e$, the attention weights $a$, and the context vector $c_t$ as follows:

$$
\begin{aligned}
e_{t,i,j} &= f_{att}(s_{t-1}, h_{i,j}) \\
a_{t,:,:} &= \mathrm{softmax}(e_{t,:,:}) \\
c_t &= \sum_{i,j} a_{t,i,j} \, h_{i,j}
\end{aligned}
$$
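The three equations can be traced in a few lines of plain Python. This is a minimal sketch, not the lecture's implementation: it assumes $f_{att}$ is a plain dot product (in practice it is often a small learned network), and the names `attend`, `s_prev`, and `h` are made up for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a flat list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(s_prev, h):
    """One attention step over a grid of hidden states.

    s_prev: previous decoder state s_{t-1}, a list of D floats.
    h:      grid of hidden states, h[i][j] is a list of D floats.
    Returns (weights, context): flattened a_{t,:,:} and the vector c_t.
    """
    cells = [vec for row in h for vec in row]  # flatten the (i, j) grid
    # e_{t,i,j} = f_att(s_{t-1}, h_{i,j}); a dot product is assumed here.
    e = [sum(a * b for a, b in zip(s_prev, vec)) for vec in cells]
    # a_{t,:,:} = softmax(e_{t,:,:}), normalized over all grid positions.
    a = softmax(e)
    # c_t = sum_{i,j} a_{t,i,j} * h_{i,j}
    dim = len(s_prev)
    c = [sum(w * vec[d] for w, vec in zip(a, cells)) for d in range(dim)]
    return a, c


# Toy 2 x 2 grid of 2-dimensional hidden states.
h = [[[1.0, 0.0], [0.0, 1.0]],
     [[1.0, 1.0], [0.0, 0.0]]]
s_prev = [2.0, 0.0]
weights, context = attend(s_prev, h)
print(weights, context)
```

Nothing in `attend` depends on the inputs being ordered in time: the grid is flattened and treated as a set of vectors, which matches the remark that the math does not care whether the input is a sequence.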