
Comparative Analysis of Generative Adversarial Networks (GAN) and Diffusion Probabilistic Models (DPM)

This presentation explores the concepts and advancements in Generative Adversarial Networks (GANs) and Diffusion Probabilistic Models (DPMs). GANs leverage a generator–discriminator framework to generate realistic data, while DPMs pair a forward diffusion process that gradually adds noise with a learned reverse process that denoises. The discussion covers key research papers, optimization techniques, and training procedures for both models, as well as the limitations faced by GANs. Understanding the strengths and weaknesses of GANs and DPMs is crucial for advancing the field of image synthesis and generation.



Presentation Transcript


  1. GAN vs DPM Presenter : 盧德晏 Advisor : Prof. 丁建均 Date : 04/25, 2023

  2. Outline • Generative Adversarial Network (GAN) • Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio, "Generative adversarial nets," Proceedings of NIPS, 2014, pp. 2672–2680. • Yang Wang, "A mathematical introduction to generative adversarial nets," arXiv:2009.00169 (2020) • Diffusion Probabilistic Models (DPM) • Jonathan Ho, Ajay Jain, and Pieter Abbeel, "Denoising diffusion probabilistic models," arXiv preprint arXiv:2006.11239 (2020) • Alex Nichol & Prafulla Dhariwal, "Improved denoising diffusion probabilistic models," arXiv preprint arXiv:2102.09672 (2021) • Prafulla Dhariwal & Alex Nichol, "Diffusion models beat GANs on image synthesis," arXiv preprint arXiv:2105.05233 (2021) • Rombach & Blattmann, et al., "High-Resolution Image Synthesis with Latent Diffusion Models," CVPR, 2022.

  3. Generative Adversarial Network (GAN)

  4. Generative Adversarial Network (GAN) • Generative Adversarial Network (GAN): [architecture diagram: a generator $G$ maps a latent vector $z$ to faked data $G(z)$; a discriminator $D$ scores real data $x$ against faked data $G(z)$]

  5. Generative Adversarial Network
  • Notation:
  • $x \in \mathbb{R}^n$ : real data
  • $z \in \mathbb{R}^m$ : latent vector
  • $G(z) \in \mathbb{R}^n$ : faked data
  • $D(x) \in \mathbb{R}$ : discriminator evaluation of real data
  • $D(G(z)) \in \mathbb{R}$ : discriminator evaluation of faked data
  • $\mathrm{Error}(a, b) \in \mathbb{R}$ : error between $a$ and $b$
  • 1 : true ; 0 : false
  • Discriminator:
  • Loss function : $L_D = \mathrm{Error}(D(x), 1) + \mathrm{Error}(D(G(z)), 0)$
  • Generator:
  • Loss function : $L_G = \mathrm{Error}(D(G(z)), 1)$

  6. Generative Adversarial Network
  • Cross entropy:
  • $p(x), q(x)$ are probability distributions
  • $H(p, q) = E_{x \sim p(x)}[-\log q(x)]$
  • Binary cross entropy:
  • $y, \hat{y} \in [0, 1]$
  • Label : $y$ ; Estimation : $\hat{y}$
  • $H(y, \hat{y}) = -\left( y \log \hat{y} + (1 - y) \log(1 - \hat{y}) \right)$
  • Loss function of discriminator:
  • $L_D = H(1, D(x)) + H(0, D(G(z)))$
  • $L_D = -\sum_{x \in \chi,\, z \in \zeta} \left[ \log D(x) + \log(1 - D(G(z))) \right]$
  • Loss function of generator:
  • $L_G = H(1, D(G(z)))$
  • $L_G = -\sum_{z \in \zeta} \log D(G(z))$
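
To make slides 5–6 concrete, here is a minimal PyTorch sketch of the two BCE losses. The networks `D` (ending in a sigmoid, outputting shape `(batch, 1)`) and `G`, the batch `x_real`, and `latent_dim` are illustrative assumptions rather than anything specified in the slides.

```python
import torch
import torch.nn.functional as F

def gan_losses(D, G, x_real, latent_dim=100):
    """BCE GAN losses L_D and L_G from slides 5-6.

    Assumes D(x) outputs a probability in (0, 1) of shape (batch, 1),
    and G(z) maps latent vectors to data.
    """
    batch = x_real.size(0)
    z = torch.randn(batch, latent_dim)          # z ~ N(0, I)
    x_fake = G(z)

    ones = torch.ones(batch, 1)                 # label 1 = real/true
    zeros = torch.zeros(batch, 1)               # label 0 = fake/false

    # L_D = H(1, D(x)) + H(0, D(G(z))); detach so D's step leaves G untouched
    loss_d = F.binary_cross_entropy(D(x_real), ones) \
           + F.binary_cross_entropy(D(x_fake.detach()), zeros)

    # L_G = H(1, D(G(z))) = -log D(G(z))
    loss_g = F.binary_cross_entropy(D(x_fake), ones)
    return loss_d, loss_g
```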

  7. Generative Adversarial Network
  • Another representation for optimization:
  • Discriminator: $\min_D L_D \;=\; \max_D \left[ \log D(x) + \log(1 - D(G(z))) \right]$
  • Generator: $\min_G \log(1 - D(G(z)))$
  • Combine generator with discriminator:
  • $\min_G \max_D \left[ \log D(x) + \log(1 - D(G(z))) \right]$
  • Define a value function:
  • $V(D, G) = E_{x \sim p_{data}}[\log D(x)] + E_{z \sim p_z}[\log(1 - D(G(z)))]$
  • $\min_G \max_D V(D, G)$
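
In practice the min–max objective is optimized by alternating gradient steps on $D$ and $G$. Below is a hedged sketch of that loop, reusing `gan_losses` from the previous snippet; the Adam optimizers, learning rate, and `(x_real, label)` batch format are assumptions, not from the slides.

```python
import torch

def train_gan(D, G, data_loader, latent_dim=100, epochs=10, lr=2e-4):
    """Alternating optimization of min_G max_D V(D, G)."""
    opt_d = torch.optim.Adam(D.parameters(), lr=lr)
    opt_g = torch.optim.Adam(G.parameters(), lr=lr)

    for _ in range(epochs):
        for x_real, _ in data_loader:
            # 1) Ascend V in D by minimizing L_D (G is held fixed via detach)
            loss_d, _ = gan_losses(D, G, x_real, latent_dim)
            opt_d.zero_grad(); loss_d.backward(); opt_d.step()

            # 2) Descend V in G by minimizing L_G (fresh fakes; D held fixed)
            _, loss_g = gan_losses(D, G, x_real, latent_dim)
            opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```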

  8. Generative Adversarial Network
  • Training discriminator: $\max_D V(D, G)$
  (1) $V(D, G) = E_{x \sim p_{data}}[\log D(x)] + E_{z \sim p_z}[\log(1 - D(G(z)))]$
  (2) $= E_{x \sim p_{data}}[\log D(x)] + E_{x \sim p_g}[\log(1 - D(x))]$
  (3) $= \int_x \left[ p_{data}(x) \log D(x) + p_g(x) \log(1 - D(x)) \right] dx$
  By partial derivative: $\dfrac{\partial V(D, G)}{\partial D} = \dfrac{p_{data}(x)}{D(x)} - \dfrac{p_g(x)}{1 - D(x)} = 0$
  (4) $D^*(x) = \dfrac{p_{data}(x)}{p_{data}(x) + p_g(x)}$
  (5) $D^*(x) = 1$ for $x \in$ real data ; $D^*(x) = 0$ for $x \in$ faked data

  9. Generative Adversarial Network
  • Log-sum inequality : $\sum_{i=1}^{n} a_i \log \dfrac{a_i}{b_i} \ge \left( \sum_{i=1}^{n} a_i \right) \log \dfrac{\sum_{i=1}^{n} a_i}{\sum_{i=1}^{n} b_i}$
  • Kullback–Leibler divergence $D_{KL}(p \| q)$
  • Definition : $D_{KL}(p \| q) = \sum_{x \in \chi} p(x) \log \dfrac{p(x)}{q(x)}$
  • Non-negativity : $D_{KL}(p \| q) \ge 0$
  • Convexity : $D_{KL}(\lambda p_1 + (1-\lambda) p_2 \,\|\, \lambda q_1 + (1-\lambda) q_2) \le \lambda D_{KL}(p_1 \| q_1) + (1-\lambda) D_{KL}(p_2 \| q_2)$, $0 \le \lambda \le 1$
  • Non-symmetry : $D_{KL}(p \| q) \ne D_{KL}(q \| p)$
  • Jensen's inequality : if $f$ is a concave function, then $f(E[x]) \ge E[f(x)]$
  • Jensen–Shannon divergence $D_{JS}(p \| q)$
  • Definition : $D_{JS}(p \| q) = \frac{1}{2} D_{KL}(p \| m) + \frac{1}{2} D_{KL}(q \| m)$, where $m = \dfrac{p + q}{2}$
  • Bound : $0 \le D_{JS}(p \| q) \le \log_b 2$, where $b$ is the base of the logarithm
  • Convex function
  • Symmetry : $D_{JS}(p \| q) = D_{JS}(q \| p)$
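
A small NumPy check of these definitions on discrete distributions (the example distributions are made up):

```python
import numpy as np

def kl(p, q):
    """D_KL(p || q) = sum_x p(x) log(p(x)/q(x)); terms with p(x) = 0 contribute 0."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

def js(p, q):
    """D_JS(p || q) = 0.5 D_KL(p || m) + 0.5 D_KL(q || m), with m = (p + q)/2."""
    m = (np.asarray(p, float) + np.asarray(q, float)) / 2
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p = [0.6, 0.3, 0.1]
q = [0.1, 0.4, 0.5]
print(kl(p, q), kl(q, p))   # non-symmetry: the two values differ
print(js(p, q), js(q, p))   # symmetry, and both bounded by log 2 ~ 0.693
```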

  10. Generative Adversarial Network
  • Training generator: $\min_G V(D^*, G)$
  (1) $V(D^*, G) = E_{x \sim p_{data}}[\log D^*(x)] + E_{x \sim p_g}[\log(1 - D^*(x))]$
  (2) $= E_{x \sim p_{data}}\!\left[ \log \dfrac{p_{data}}{p_{data} + p_g} \right] + E_{x \sim p_g}\!\left[ \log \dfrac{p_g}{p_{data} + p_g} \right]$
  (3) $= -\log 4 + E_{x \sim p_{data}}\!\left[ \log p_{data} - \log \dfrac{p_{data} + p_g}{2} \right] + E_{x \sim p_g}\!\left[ \log p_g - \log \dfrac{p_{data} + p_g}{2} \right]$
  (4) $= -\log 4 + D_{KL}\!\left( p_{data} \,\Big\|\, \dfrac{p_{data} + p_g}{2} \right) + D_{KL}\!\left( p_g \,\Big\|\, \dfrac{p_{data} + p_g}{2} \right)$
  (5) $= -\log 4 + 2\, D_{JS}(p_{data} \| p_g) \ge -\log 4$
  • Bound : $0 \le D_{JS}(p \| q) \le \log_b 2$, where $b$ is the base of the logarithm

  11. Generative Adversarial Network
  • Some limitations of GANs :
  • Vanishing gradients, due to which generator training might fail
  • Mode collapse, where the generator repeatedly creates the same output
  • Failure to converge, due to which the discriminator's feedback becomes less meaningful to the generator, hurting its quality
  • Why do the gradients of a GAN vanish during training?
  • Jensen–Shannon divergence : $D_{JS}(p \| q) = \frac{1}{2} D_{KL}(p \| m) + \frac{1}{2} D_{KL}(q \| m)$, where $m = \dfrac{p + q}{2}$
  • $D_{JS}(p \| q) = \dfrac{1}{2} \sum_x p(x) \log \dfrac{2\, p(x)}{p(x) + q(x)} + \dfrac{1}{2} \sum_x q(x) \log \dfrac{2\, q(x)}{p(x) + q(x)}$
  • Example: suppose the supports are disjoint, with $p(x) \approx 0$ for $x \ge 5$ and $q(x) \approx 0$ for $x < 5$:
  • When $x \ge 5$ ($p(x) \approx 0$) : the $q$ term contributes $\log \frac{2q}{q} = \log 2$
  • When $x < 5$ ($q(x) \approx 0$) : the $p$ term contributes $\log 2$
  • $\Rightarrow \forall$ such non-overlapping $p, q$ : $D_{JS}(p \| q) = \log 2 = \text{const.}$
  • $\Rightarrow \nabla D_{JS}(p \| q) = 0 \;\Rightarrow$ vanishing gradient
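
A numeric illustration of this argument, reusing `js` from the earlier snippet: for two distributions with disjoint supports, $D_{JS}$ stays at $\log 2$ no matter how far apart they are, so moving $p_g$ closer gives the generator no gradient signal. The bin layout below is invented purely for illustration.

```python
import numpy as np

def shifted_pair(d, n=20):
    """p_data uniform on bins 0..4; p_g uniform on bins d..d+4 (disjoint for d >= 5)."""
    p = np.zeros(n); p[:5] = 0.2
    q = np.zeros(n); q[d:d + 5] = 0.2
    return p, q

for d in (5, 8, 12):
    p, q = shifted_pair(d)
    print(d, js(p, q))  # always log 2 ~ 0.693: constant in d, so d(JS)/dd = 0
```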

  12. Diffusion Probabilistic Models (DPM)

  13. Diffusion Probabilistic Models
  • Diffusion Probabilistic Models (DPM): [diagram: forward diffusion adds noise step by step, $x_0 \to x_1 \to x_2 \to \cdots \to x_T$; reverse diffusion denoises step by step, $x_T \to \cdots \to x_2 \to x_1 \to x_0$]

  14. Diffusion Probabilistic Models
  • Forward diffusion : add noise to the image
  • Distribution of the noised images : $q(x_t | x_{t-1}) = \mathcal{N}(x_t;\, \sqrt{1 - \beta_t}\, x_{t-1},\, \beta_t I)$, with mean $\mu_t = \sqrt{1 - \beta_t}\, x_{t-1}$ and variance $\Sigma_t = \beta_t I$
  • Notations:
  • $t$ : time step (from 0 to $T$)
  • $x_0$ : a data sample from the real data distribution $q(x)$ (i.e. $x_0 \sim q(x)$)
  • $\beta_t$ : variance schedule ($0 \le \beta_t \le 1$, with $\beta_0$ a small number and $\beta_T$ a large number)
  • $I$ : identity matrix
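
One forward-diffusion step, written via the reparameterization $x_t = \sqrt{1 - \beta_t}\, x_{t-1} + \sqrt{\beta_t}\, \epsilon$. The linear schedule from $10^{-4}$ to $0.02$ over $T = 1000$ steps follows the DDPM paper's choice; the slide itself only requires $\beta_t$ to grow from small to large.

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)  # variance schedule beta_t, small -> large

def forward_step(x_prev, t):
    """Sample x_t ~ q(x_t | x_{t-1}) = N(sqrt(1 - beta_t) x_{t-1}, beta_t I)."""
    eps = torch.randn_like(x_prev)     # eps ~ N(0, I)
    return torch.sqrt(1 - betas[t]) * x_prev + torch.sqrt(betas[t]) * eps
```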

  15. Diffusion Probabilistic Models
  • Closed-form formula :
  • $x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon$
  • $\epsilon \sim \mathcal{N}(0, I)$
  • $\alpha_t = 1 - \beta_t$
  • $\bar{\alpha}_t = \prod_{i=1}^{t} \alpha_i$
  • Reparameterization trick :
  • If $x \sim \mathcal{N}(\mu, \sigma^2 I)$, then $x = \mu + \sigma \epsilon$, where $\epsilon \sim \mathcal{N}(0, I)$
  • Mean vector : $E[x] = E[\mu + \sigma \epsilon] = \mu + \sigma E[\epsilon] = \mu$
  • Variance matrix : $\mathrm{Var}(x) = \mathrm{Var}(\mu + \sigma \epsilon) = \sigma^2 I$
  • If $\epsilon = (\epsilon_1, \epsilon_2, \ldots, \epsilon_n)$ and i.i.d., then $\mathrm{Cov}(\epsilon_i, \epsilon_j) = 0$, $\forall i \ne j$
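
A quick empirical check of the reparameterization trick: samples constructed as $\mu + \sigma \epsilon$ have exactly the stated mean and variance (the numbers are arbitrary):

```python
import torch

mu, sigma = 3.0, 0.5
eps = torch.randn(100_000)   # eps ~ N(0, 1)
x = mu + sigma * eps         # x ~ N(mu, sigma^2) by the reparameterization trick

print(x.mean())              # ~ 3.0  = mu
print(x.var())               # ~ 0.25 = sigma^2
```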

  16. Diffusion Probabilistic Models
  • $q(x_t | x_{t-1}) = \mathcal{N}(x_t;\, \sqrt{1 - \beta_t}\, x_{t-1},\, \beta_t I)$
  (1) $x_t = \sqrt{1 - \beta_t}\, x_{t-1} + \sqrt{\beta_t}\, \epsilon_{t-1} = \sqrt{\alpha_t}\, x_{t-1} + \sqrt{1 - \alpha_t}\, \epsilon_{t-1}$
  (2) $= \sqrt{\alpha_t} \left( \sqrt{\alpha_{t-1}}\, x_{t-2} + \sqrt{1 - \alpha_{t-1}}\, \epsilon_{t-2} \right) + \sqrt{1 - \alpha_t}\, \epsilon_{t-1}$
  (3) $= \sqrt{\alpha_t \alpha_{t-1}}\, x_{t-2} + \sqrt{1 - \alpha_t \alpha_{t-1}}\, \bar{\epsilon}_{t-2}$ (the two independent Gaussian terms merge into one)
  (4) $= \cdots = \sqrt{\alpha_t \alpha_{t-1} \cdots \alpha_1}\, x_0 + \sqrt{1 - \alpha_t \alpha_{t-1} \cdots \alpha_1}\, \epsilon$
  (5) $= \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon$
  • All the $\epsilon$ are i.i.d. (independent and identically distributed) standard normal random variables
  • $\epsilon_0, \epsilon_1, \ldots, \epsilon_{t-1} \sim \mathcal{N}(0, I)$ ; $\bar{\epsilon}_0, \bar{\epsilon}_1, \ldots, \bar{\epsilon}_{t-1} \sim \mathcal{N}(0, I)$ ; $\epsilon \sim \mathcal{N}(0, I)$
  • $\alpha_t = 1 - \beta_t$ ; $\bar{\alpha}_t = \prod_{i=1}^{t} \alpha_i$

  17. Diffusion Probabilistic Models
  • $x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon$
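
The closed form lets us jump to any timestep $t$ in one shot instead of iterating `forward_step` $t$ times. This sketch builds on the `betas` defined earlier; it also returns the drawn noise, which the training sketch after slide 26 reuses.

```python
import torch

alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)  # alpha_bar_t = prod_{i<=t} alpha_i

def q_sample(x0, t):
    """Sample x_t = sqrt(alpha_bar_t) x_0 + sqrt(1 - alpha_bar_t) eps in one shot."""
    eps = torch.randn_like(x0)
    x_t = torch.sqrt(alpha_bars[t]) * x0 + torch.sqrt(1 - alpha_bars[t]) * eps
    return x_t, eps
```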

  18. Diffusion Probabilistic Models
  • Reverse diffusion : remove noise from the image
  • Target distribution : $q(x_{t-1} | x_t) = \mathcal{N}(x_{t-1};\, \tilde{\mu}_t(x_t, x_0),\, \tilde{\beta}_t I)$
  • Approximated distribution : $p_\theta(x_{t-1} | x_t) = \mathcal{N}(x_{t-1};\, \mu_\theta(x_t, t),\, \Sigma_\theta(x_t, t))$
  • $\theta$ : learnable parameters of a neural network

  19. Diffusion Probabilistic Models
  • Loss function : negative log-likelihood
  • $L = -\log p_\theta(x_0)$
  • $p_\theta(x_0)$ depends on $x_1, x_2, \ldots, x_T$; therefore, it is intractable
  • Instead of optimizing the intractable loss function itself, we can optimize the Variational Lower Bound (VLB).
  • Non-negativity : $D_{KL}(p \| q) \ge 0$

  20. Diffusion Probabilistic Models
  • Proof of the VLB of the loss function :
  • Loss function : $L = -\log p_\theta(x_0)$, which is intractable since $p_\theta(x_0)$ depends on $x_1, x_2, \ldots, x_T$
  • By the non-negativity of the KL divergence, $D_{KL}(p \| q) \ge 0$ :
  • $L = -\log p_\theta(x_0) \le -\log p_\theta(x_0) + D_{KL}\!\left( q(x_{1:T} | x_0) \,\|\, p_\theta(x_{1:T} | x_0) \right) = E_q\!\left[ \log \dfrac{q(x_{1:T} | x_0)}{p_\theta(x_{0:T})} \right] = L_{VLB}$

  21. Diffusion Probabilistic Models
  • Instead of optimizing the intractable loss function itself, we can optimize the Variational Lower Bound (VLB). The same bound also follows from Jensen's inequality:
  • $-\log p_\theta(x_0) = -\log \int p_\theta(x_{0:T})\, dx_{1:T} = -\log E_{q(x_{1:T} | x_0)}\!\left[ \dfrac{p_\theta(x_{0:T})}{q(x_{1:T} | x_0)} \right] \le E_q\!\left[ \log \dfrac{q(x_{1:T} | x_0)}{p_\theta(x_{0:T})} \right] = L_{VLB}$

  22. Diffusion Probabilistic Models
  • Loss function:
  • $L_{VLB} = D_{KL}\!\left( q(x_T | x_0) \,\|\, p_\theta(x_T) \right) + \sum_{t=2}^{T} D_{KL}\!\left( q(x_{t-1} | x_t, x_0) \,\|\, p_\theta(x_{t-1} | x_t) \right) - \log p_\theta(x_0 | x_1) = L_T + \sum_{t=2}^{T} L_{t-1} + L_0$
  • Constant term : $L_T = D_{KL}\!\left( q(x_T | x_0) \,\|\, p_\theta(x_T) \right)$
  • Since $q$ has no learnable parameters and $p_\theta(x_T)$ is just a Gaussian noise probability, this term is constant during training and thus can be ignored.
  • Stepwise denoising term : $L_{t-1} = D_{KL}\!\left( q(x_{t-1} | x_t, x_0) \,\|\, p_\theta(x_{t-1} | x_t) \right)$, $\forall t = 2, 3, \ldots, T$
  • This term compares the target denoising step $q$ and the approximated denoising step $p_\theta$.
  • Reconstruction term : $L_0 = -\log p_\theta(x_0 | x_1)$
  • This is the reconstruction loss of the last denoising step, and it can be ignored during training for the following reasons:
  • It can be approximated using the same neural network as in $L_{t-1}$.
  • Ignoring it makes the sample quality better and makes the implementation simpler.

  23. Diffusion Probabilistic Models
  • Stepwise denoising term : $L_{t-1} = D_{KL}\!\left( q(x_{t-1} | x_t, x_0) \,\|\, p_\theta(x_{t-1} | x_t) \right)$
  • Target distribution : $q(x_{t-1} | x_t, x_0) = \mathcal{N}(x_{t-1};\, \tilde{\mu}_t(x_t, x_0),\, \tilde{\beta}_t I)$, $\forall t = 2, 3, \ldots, T$
  • Using Bayes' rule, $q(x_{t-1} | x_t, x_0) = q(x_t | x_{t-1}, x_0)\, \dfrac{q(x_{t-1} | x_0)}{q(x_t | x_0)}$ ; expanding the Gaussian densities $\exp\!\left( -\dfrac{(x - \mu)^2}{2\sigma^2} \right)$ and completing the square gives:
  • Variance : $\tilde{\beta}_t = \dfrac{1 - \bar{\alpha}_{t-1}}{1 - \bar{\alpha}_t}\, \beta_t$
  • Mean : $\tilde{\mu}_t(x_t, x_0) = \dfrac{\sqrt{\alpha_t}\,(1 - \bar{\alpha}_{t-1})}{1 - \bar{\alpha}_t}\, x_t + \dfrac{\sqrt{\bar{\alpha}_{t-1}}\, \beta_t}{1 - \bar{\alpha}_t}\, x_0$
  • With $x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon$ : $x_0 = \dfrac{1}{\sqrt{\bar{\alpha}_t}} \left( x_t - \sqrt{1 - \bar{\alpha}_t}\, \epsilon \right)$
  • Recall : $\alpha_t = 1 - \beta_t$ ; $\bar{\alpha}_t = \prod_{i=1}^{t} \alpha_i$

  24. Diffusion Probabilistic Models
  • Apply $x_0 = \dfrac{1}{\sqrt{\bar{\alpha}_t}} \left( x_t - \sqrt{1 - \bar{\alpha}_t}\, \epsilon \right)$ into the mean $\tilde{\mu}_t$. We obtain:
  • Mean : $\tilde{\mu}_t = \dfrac{1}{\sqrt{\alpha_t}} \left( x_t - \dfrac{1 - \alpha_t}{\sqrt{1 - \bar{\alpha}_t}}\, \epsilon \right)$
  • Setting the learned mean : $\mu_\theta(x_t, t) = \dfrac{1}{\sqrt{\alpha_t}} \left( x_t - \dfrac{1 - \alpha_t}{\sqrt{1 - \bar{\alpha}_t}}\, \epsilon_\theta(x_t, t) \right)$
  • and the learned variance : $\Sigma_\theta(x_t, t) = \tilde{\beta}_t I = \dfrac{1 - \bar{\alpha}_{t-1}}{1 - \bar{\alpha}_t}\, \beta_t I$
  • Objective : approximate $p_\theta(x_{t-1} | x_t)$ as closely as possible to the target $q(x_{t-1} | x_t)$
  • Target distribution : $q(x_{t-1} | x_t) = \mathcal{N}(x_{t-1};\, \tilde{\mu}_t(x_t, x_0),\, \tilde{\beta}_t I)$
  • Approximated distribution : $p_\theta(x_{t-1} | x_t) = \mathcal{N}(x_{t-1};\, \mu_\theta(x_t, t),\, \Sigma_\theta(x_t, t))$
  • $\theta$ : learnable parameters of a neural network
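
Putting slide 24's parameterization into code: one reverse denoising step $x_{t-1} \sim p_\theta(x_{t-1} | x_t)$. Here `model` is an assumed noise-prediction network $\epsilon_\theta(x_t, t)$ (the slides do not fix an architecture; DDPM uses a U-Net), and `alphas`, `alpha_bars`, `betas` come from the earlier snippets.

```python
import torch

@torch.no_grad()
def p_sample(model, x_t, t):
    """One reverse step: x_{t-1} = mu_theta(x_t, t) + sqrt(beta_tilde_t) z."""
    eps_hat = model(x_t, t)                    # predicted noise eps_theta(x_t, t)
    alpha_t, abar_t = alphas[t], alpha_bars[t]

    # mu_theta = (1/sqrt(alpha_t)) (x_t - (1 - alpha_t)/sqrt(1 - abar_t) * eps_hat)
    mean = (x_t - (1 - alpha_t) / torch.sqrt(1 - abar_t) * eps_hat) / torch.sqrt(alpha_t)

    if t == 0:
        return mean                            # final step: output the mean directly
    abar_prev = alpha_bars[t - 1]
    beta_tilde = (1 - abar_prev) / (1 - abar_t) * betas[t]  # Sigma_theta = beta_tilde I
    return mean + torch.sqrt(beta_tilde) * torch.randn_like(x_t)
```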

  25. Diffusion Probabilistic Models
  • The comparison between the target mean and the approximated mean can be done using a mean squared error (MSE):
  • $L_{t-1} = E_q\!\left[ \dfrac{1}{2\sigma_t^2} \left\| \tilde{\mu}_t(x_t, x_0) - \mu_\theta(x_t, t) \right\|^2 \right]$
  • Substituting the two means and $x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon$ :
  • $L_{t-1} = E_{x_0, \epsilon}\!\left[ \dfrac{(1 - \alpha_t)^2}{2\sigma_t^2\, \alpha_t (1 - \bar{\alpha}_t)} \left\| \epsilon - \epsilon_\theta(x_t, t) \right\|^2 \right]$
  • Target distribution : $q(x_{t-1} | x_t) = \mathcal{N}(x_{t-1};\, \tilde{\mu}_t(x_t, x_0),\, \tilde{\beta}_t I)$
  • Approximated distribution : $p_\theta(x_{t-1} | x_t) = \mathcal{N}(x_{t-1};\, \mu_\theta(x_t, t),\, \Sigma_\theta(x_t, t))$
  • Learned mean : $\mu_\theta(x_t, t) = \dfrac{1}{\sqrt{\alpha_t}} \left( x_t - \dfrac{1 - \alpha_t}{\sqrt{1 - \bar{\alpha}_t}}\, \epsilon_\theta(x_t, t) \right)$ ; learned variance : $\Sigma_\theta(x_t, t) = \tilde{\beta}_t I = \dfrac{1 - \bar{\alpha}_{t-1}}{1 - \bar{\alpha}_t}\, \beta_t I$

  26. Diffusion Probabilistic Models
  • Simplified loss function $L_{simple}$ : drop the per-step weighting and train directly on the noise-prediction MSE
  • $L_{simple}(\theta) = E_{t, x_0, \epsilon}\!\left[ \left\| \epsilon - \epsilon_\theta(x_t, t) \right\|^2 \right] = E_{t, x_0, \epsilon}\!\left[ \left\| \epsilon - \epsilon_\theta\!\left( \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon,\; t \right) \right\|^2 \right]$ (a training-step sketch follows)
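
A minimal training step for $L_{simple}$, reusing `q_sample` and `T` from the earlier snippets; the `model(x_t, t)` signature and the optimizer are assumptions, consistent with the sampling sketch above.

```python
import torch
import torch.nn.functional as F

def train_step(model, optimizer, x0):
    """One gradient step on L_simple = E || eps - eps_theta(x_t, t) ||^2."""
    t = torch.randint(0, T, (1,)).item()   # sample a timestep uniformly
    x_t, eps = q_sample(x0, t)             # noise x0 to step t, keep the true eps
    loss = F.mse_loss(model(x_t, t), eps)  # predict the noise, compare with MSE

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```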

  27. Diffusion Probabilistic Models
  • DPM applications : image synthesis, audio synthesis, denoising, and higher-resolution image generation, but at a higher computational cost
  • FID (Fréchet Inception Distance) : considers spatial features when scoring generated images
  • $\mathrm{FID}(x, g) = \left\| \mu_x - \mu_g \right\|_2^2 + \mathrm{Tr}\!\left( \Sigma_x + \Sigma_g - 2 \left( \Sigma_x \Sigma_g \right)^{0.5} \right)$
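
The FID formula implemented directly on feature statistics. In practice, $\mu$ and $\Sigma$ are the mean and covariance of Inception-v3 activations over real and generated images; this sketch takes them as given.

```python
import numpy as np
from scipy.linalg import sqrtm

def fid(mu_x, sigma_x, mu_g, sigma_g):
    """FID = ||mu_x - mu_g||^2 + Tr(Sigma_x + Sigma_g - 2 (Sigma_x Sigma_g)^{1/2})."""
    diff = mu_x - mu_g
    covmean = sqrtm(sigma_x @ sigma_g)     # matrix square root of Sigma_x Sigma_g
    if np.iscomplexobj(covmean):           # discard tiny imaginary parts from sqrtm
        covmean = covmean.real
    return diff @ diff + np.trace(sigma_x + sigma_g - 2 * covmean)
```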

  28. Thanks for Listening
