L16: Image Segmentation

HHZZ included in UMich-EECS-498

2025-04-22 505 words 3 minutes

Contents

Image Segmentation

Goal: to learn about what Fully Convolutional Networks (FCN) are and how they can be used for image segmentation. 🤔

Semantic Segmentation

a simple way is to slide the window, creating a slow RNN?
another way is to build a Equal size Fully Convolutional Network (but receptive field is limited and hard to generalize to higher resolutions)
U-Net like architecture 😉

but how to UPSAMPLE the feature maps?

Up-sampling

simple built-in function, targets at the average pooling
- bed of nails
- Nearest Neighbor
- Bilinear Interpolation
  - Bicubic Interpolation also like this onw
Max Unpooling, targets at the max pooling
- when max pooling is used, remember the position of the maximum value, and use it (those places) to upsample the feature map
Transpose Convolution, learnable weights

Ready for math? 😉

when 1D, size=3, stride=1, padding=1… $$ \vec{x}*\vec{a} = X \vec{a} $$

$$ \begin{bmatrix} x & y & z & 0 & 0 & 0 \\ 0 & x & y & z & 0 & 0 \\ 0 & 0 & x & y & z & 0 \\ 0 & 0 & 0 & x & y & z \\ \end{bmatrix} \begin{bmatrix} 0 \\ a \\ b \\ c \\ d \\ 0 \\ \end{bmatrix} = \begin{bmatrix} ay+bz \\ ax+by+cz \\ bx+cy+dz \\ cx+dy \\ \end{bmatrix} $$

$$ let \ *^T\ be \ transpose \ convolution \Rightarrow \vec{x}*^T \vec{a} = X^T \vec{a} $$

$$ \begin{bmatrix} x & 0 & 0 & 0 \\ y & x & 0 & 0 \\ z & y & x & 0 \\ 0 & z & y & x \\ 0 & 0 & z & y \\ 0 & 0 & 0 & z \\ \end{bmatrix} \begin{bmatrix} a \\ b \\ c \\ d \\ \end{bmatrix} = \begin{bmatrix} ax \\ ay+bx\\ az+by+cx \\ bz+cy+dx \\ cz+dy \\ dz \\ \end{bmatrix} $$

HOWEVER, when the stride is greater than 1, transposed conv can not be expressed as normal conv

when 1D, size=3, stride=2, padding=1… $$ \vec{x}*\vec{a} = X \vec{a} $$

$$ \begin{bmatrix} x & y & z & 0 & 0 & 0 \\ 0 & 0 & x & y & z & 0 \\ \end{bmatrix} \begin{bmatrix} 0 \\ a \\ b \\ c \\ d \\ 0 \\ \end{bmatrix} = \begin{bmatrix} ay+bz \\ bx+cy+dz \\ \end{bmatrix} $$

$$ let \ *^T\ be \ transpose \ convolution \Rightarrow \vec{x}*^T \vec{a} = X^T \vec{a} $$

$$ \begin{bmatrix} x & 0 \\ y & 0 \\ z & x \\ 0 & y \\ 0 & z \\ 0 & 0 \\ \end{bmatrix} \begin{bmatrix} a \\ b \\ \end{bmatrix} = \begin{bmatrix} ax \\ ay \\ az+bx \\ by \\ bz \\ 0 \\ \end{bmatrix} $$

HHZZ: there is another way, which is upsampling the feature maps by a buit-in function, and then use a convolutional layer to learn the weights for the upsampling.

Instance Segmentation

object detection + segmentation 😉

see Mask R-CNN paper

else

Panoptic Segmentation
end to end Instance Segmentation
key points segmentation & human pose estimation