Generative Model II

for some reason, it seems like we are back 😆

setup: assume that we have data $x_i$ from a distribution $p_{data}(x)$; all we want to do is sample from $p_{data}(x)$.

idea: introduce a latent variable $z$ with a simple prior $p(z)$; these $z$ can be interpolated.

sample $z$ from $p(z)$ and pass it into a Generator $x = G(z)$; then we say that $x$ is a sample from the Generator distribution $p_G$.

all we need to do is make $p_G = p_{data}$!

and then we include a Discriminator $D(x)$, which takes in $x$ and outputs the probability that $x$ is real (from $p_{data}$) rather than fake (from $p_G$).

Hopefully, training will converge to a point where $D(x)$ is no longer able to tell real data from fake data.
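To make the setup concrete, here is a minimal sketch (assuming PyTorch; the MLP architectures and layer sizes are made up for illustration, not from the lecture):

```python
import torch
import torch.nn as nn

latent_dim, data_dim = 64, 784   # illustrative sizes

# Generator G: maps z ~ p(z) to a sample x = G(z) from p_G
G = nn.Sequential(
    nn.Linear(latent_dim, 256), nn.ReLU(),
    nn.Linear(256, data_dim), nn.Tanh(),
)

# Discriminator D: maps x to the probability that x is real (i.e. from p_data)
D = nn.Sequential(
    nn.Linear(data_dim, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid(),
)

z = torch.randn(16, latent_dim)   # sample z from the simple prior p(z) = N(0, I)
x_fake = G(z)                     # x = G(z): a sample from p_G
p_real = D(x_fake)                # D's estimate of the probability that x is real
```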

minimax game:

$$
\begin{aligned}
\mathcal{L}_{GAN} &= \min_{G} \max_{D} \; E_{x \sim p_{data}}[\log D(x)] + E_{z \sim p(z)}[\log (1-D(G(z)))] \\
&= \min_{G} \max_{D} \; V(G,D)
\end{aligned}
$$
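For reference, a minibatch Monte Carlo estimate of $V(G, D)$ might look like this (a sketch assuming the `G` and `D` modules above; `x_real` is a batch of real data):

```python
import torch

def value_fn(D, G, x_real, latent_dim=64, eps=1e-8):
    """Minibatch estimate of V(G, D) = E_x[log D(x)] + E_z[log(1 - D(G(z)))]."""
    z = torch.randn(x_real.size(0), latent_dim)
    real_term = torch.log(D(x_real) + eps).mean()     # E_{x ~ p_data}[log D(x)]
    fake_term = torch.log(1 - D(G(z)) + eps).mean()   # E_{z ~ p(z)}[log(1 - D(G(z)))]
    return real_term + fake_term
```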

alternating gradient update:

For $t$ in $1, \dots, T$:

  • update $D$: $D = D + \alpha_D \frac{\partial V}{\partial D}$
  • update $G$: $G = G - \alpha_G \frac{\partial V}{\partial G}$

In practice, we want $G$ to minimize $-\log D(G(z))$ instead, in order to avoid the vanishing gradient problem at the beginning of training, when $D$ easily rejects the Generator's samples and $\log(1 - D(G(z)))$ saturates (see the sketch below).
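A minimal sketch of the alternating updates with the non-saturating generator loss $-\log D(G(z))$ (assuming the `G`, `D`, and `value_fn` defined above, plus a `data_loader` yielding batches of real data; the optimizer choice and learning rates are illustrative):

```python
import torch

opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)
opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)

for x_real in data_loader:                 # "for t in 1, ..., T"
    # Discriminator step: ascend V(G, D), i.e. descend -V(G, D)
    opt_D.zero_grad()
    loss_D = -value_fn(D, G, x_real)
    loss_D.backward()
    opt_D.step()

    # Generator step: minimize -log D(G(z)) (non-saturating loss)
    opt_G.zero_grad()
    z = torch.randn(x_real.size(0), 64)
    loss_G = -torch.log(D(G(z)) + 1e-8).mean()
    loss_G.backward()
    opt_G.step()
```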

minโกGmaxโกDExโˆผpdata[logD(x)]+Ezโˆผp(z)[log(1โˆ’D(G(z)))]=minโกG(2ร—JSD(pdata,pG)โˆ’log4) \min_{G} \max_{D} E_{x \sim p_{data}}[log D(x)] + E_{z \sim p(z)}[log (1-D(G(z)))] \\ = \min_G(2 \times JSD(p_{data}, p_G) - log4)

Proof: (ready for the math? 😉)

$$
\begin{aligned}
\mathcal{L}_{GAN} &= \min_G \max_D \; E_{x \sim p_{data}}[\log D(x)] + E_{z \sim p(z)}[\log (1-D(G(z)))] \\
&= \min_G \max_D \; E_{x \sim p_{data}}[\log D(x)] + E_{x \sim p_G}[\log (1-D(x))] \\
&= \min_G \max_D \int_X \big( p_{data}(x)\log D(x) + p_G(x)\log(1-D(x)) \big)\, dx \\
&= \min_G \int_X \max_D \big( p_{data}(x)\log D(x) + p_G(x)\log(1-D(x)) \big)\, dx
\end{aligned}
$$

With $a = p_{data}(x)$ and $b = p_G(x)$, the integrand has the form $f(y) = a\log y + b\log(1-y)$, which is maximized pointwise:

$$
f'(y) = \frac{a}{y} - \frac{b}{1-y} = 0 \;\Rightarrow\; y = \frac{a}{a+b}
\;\Rightarrow\; D_G^{*}(x) = \frac{p_{data}(x)}{p_{data}(x)+p_G(x)}
$$

=minโกGโˆซX(pdata(x)logDGโˆ—(x)+pG(x)log(1โˆ’DGโˆ—(x)))dx=minโกGโˆซX(pdata(x)logpdata(x)pdata(x)+pG(x)+pG(x)logpG(x)pdata(x)+pG(x))dx=minโกG(Exโˆผpdata[logpdata(x)pdata(x)+pG(x))]+ExโˆผpG[logpG(x)pdata(x)+pG(x))])=minโกG(Exโˆผpdata[log2ร—pdata(x)pdata(x)+pG(x))]+ExโˆผpG[log2ร—pG(x)pdata(x)+pG(x))]โˆ’log4)=minโกG(KL(pdata,pdata+pG2)+KL(pG,pdata+pG2)โˆ’log4)=minโกG(2ร—JSD(pdata,pG)โˆ’log4)โ‡’minโกG(JSD(pdata,pG)) \begin{aligned} \small &= \min_G \int_X (p_{data}(x)logD_{G}^{*}(x)+p_G(x)log(1-D_{G}^{*}(x)))dx \\ &= \min_G \int_X (p_{data}(x)log\frac{p_{data}(x)}{p_{data}(x)+p_G(x)}+p_G(x) log\frac{p_G(x)}{p_{data}(x)+p_G(x)})dx \\ &= \min_G (E_{x \sim p_{data}}[log\frac{p_{data}(x)}{p_{data}(x)+p_G(x))}] + E_{x \sim p_G}[log\frac{p_G(x)}{p_{data}(x)+p_G(x))}]) \\ &= \min_G (E_{x \sim p_{data}}[log\frac{2 \times p_{data}(x)}{p_{data}(x)+p_G(x))}] + E_{x \sim p_G}[log\frac{2 \times p_G(x)}{p_{data}(x)+p_G(x))}] - log4) \\ &= \min_G (KL(p_{data}, \frac{p_{data}+p_G}{2})+KL(p_G, \frac{p_{data}+p_G}{2}) - log4) \\ &= \min_G (2 \times JSD(p_{data}, p_G) - log4) \Rightarrow \min_G (JSD(p_{data}, p_G)) \end{aligned}

![figure](/l20-generative-model-ii/image/index/1745217375747.png)

Conditional GANs: the input $x$ and the condition $y$ are given to both $G$ and $D$.
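One common way to wire in the condition is to concatenate (an encoding of) $y$ to the inputs of both networks; a minimal sketch under that assumption (names and sizes are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

latent_dim, data_dim, num_classes = 64, 784, 10   # illustrative sizes

# Conditional Generator: G(z, y) -> x
cond_G = nn.Sequential(nn.Linear(latent_dim + num_classes, 256), nn.ReLU(),
                       nn.Linear(256, data_dim), nn.Tanh())

# Conditional Discriminator: D(x, y) -> probability that x is a real sample for condition y
cond_D = nn.Sequential(nn.Linear(data_dim + num_classes, 256), nn.LeakyReLU(0.2),
                       nn.Linear(256, 1), nn.Sigmoid())

y = F.one_hot(torch.randint(0, num_classes, (16,)), num_classes).float()
z = torch.randn(16, latent_dim)
x_fake = cond_G(torch.cat([z, y], dim=1))
p_real = cond_D(torch.cat([x_fake, y], dim=1))
```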

Conditional BatchNormalization
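Conditional batch normalization normalizes as usual but lets the scale and shift depend on the condition; a minimal sketch, assuming PyTorch and a class-label condition:

```python
import torch
import torch.nn as nn

class ConditionalBatchNorm2d(nn.Module):
    """BatchNorm2d whose scale (gamma) and shift (beta) are looked up from a class label y."""
    def __init__(self, num_features, num_classes):
        super().__init__()
        self.bn = nn.BatchNorm2d(num_features, affine=False)  # normalize without learned affine
        self.gamma = nn.Embedding(num_classes, num_features)  # per-class scale
        self.beta = nn.Embedding(num_classes, num_features)   # per-class shift
        nn.init.ones_(self.gamma.weight)
        nn.init.zeros_(self.beta.weight)

    def forward(self, x, y):
        out = self.bn(x)
        g = self.gamma(y).view(-1, out.size(1), 1, 1)
        b = self.beta(y).view(-1, out.size(1), 1, 1)
        return g * out + b

# usage: cbn = ConditionalBatchNorm2d(128, num_classes=10); out = cbn(features, labels)
```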

some tasks:

  • GAN of Video
  • Text to image synthesis
  • Image to image translation
  • Image to image super-resolution
  • Label Map to Image synthesis / style transfer

even trajectory prediction!