Diffusion models, also called diffusion probabilistic models, are generative models: they generate data similar to the data they were trained on. A diffusion model consists of three components: a forward process, which destroys the training images by gradually adding Gaussian noise; a reverse process, which learns to recover the data by reversing the noising; and a sampling procedure, which uses the learned reverse process to generate new images from noise.
We start with a clean image and gradually add Gaussian noise to it from t=1 to t=T. The variance schedule B (usually written β) sets how much noise is added at each step. This is called the forward process, or diffusion process.
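The forward process can be sketched in a few lines of NumPy. This is only an illustration under some assumptions: the linear schedule from 1e-4 to 0.02 over T=1000 steps follows the values used in the DDPM paper (2006.11239), the 8x8 array is a toy stand-in for an image, and the function names are made up for this example.

```python
import numpy as np

def linear_beta_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Variance schedule: how much noise is added at each of the T steps."""
    return np.linspace(beta_start, beta_end, T)

def forward_step(x_prev, beta_t, rng):
    """One noising step: x_t = sqrt(1 - beta_t) * x_{t-1} + sqrt(beta_t) * noise."""
    noise = rng.standard_normal(x_prev.shape)
    return np.sqrt(1.0 - beta_t) * x_prev + np.sqrt(beta_t) * noise

def forward_process(x0, betas, rng):
    """Apply all T noising steps and return the list [x_0, x_1, ..., x_T]."""
    xs = [x0]
    for beta_t in betas:
        xs.append(forward_step(xs[-1], beta_t, rng))
    return xs

rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(8, 8))   # toy "image" scaled to [-1, 1]
betas = linear_beta_schedule(T=1000)
trajectory = forward_process(x0, betas, rng)
```

By the last entry of `trajectory`, almost nothing of the original image is left and the pixels look like plain Gaussian noise.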
The reverse process is what we want the model to learn.
During training, we pick an image, add an amount of noise set by B, and feed the result to the model, which tries to undo the noise. We then do the same with an image that has had more noise added, and ask the model to undo that as well. Once trained, the model can remove noise step by step until we arrive at what the network thinks the original image was.
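One training step might look roughly like the sketch below. It follows the DDPM formulation, where the noisy image at a randomly chosen step t is produced in a single jump instead of by looping through every earlier step, and the model is scored on how well it predicts the noise that was added. The `predict_noise` function is a hypothetical placeholder for the real network (it always guesses zero), so this only shows how the loss is computed, not how the weights are updated.

```python
import numpy as np

def predict_noise(x_t, t):
    """Hypothetical stand-in for the trainable network; it always guesses zero noise."""
    return np.zeros_like(x_t)

def training_loss(x0, betas, rng, model=predict_noise):
    """Noise prediction loss for one image at one randomly chosen timestep."""
    alpha_bars = np.cumprod(1.0 - betas)       # fraction of signal left after each step
    t = int(rng.integers(0, len(betas)))       # random timestep (0-based index)
    noise = rng.standard_normal(x0.shape)      # the noise the model must recover
    x_t = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise
    predicted = model(x_t, t)
    return float(np.mean((noise - predicted) ** 2))  # mean squared error

rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(8, 8))       # toy "image" scaled to [-1, 1]
betas = np.linspace(1e-4, 0.02, 1000)          # assumed linear schedule
loss = training_loss(x0, betas, rng)
```

In a real setup, a deep learning framework would compute the gradient of this loss and update the network so that its noise predictions improve over many images and timesteps.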
So how does the “undo it” step work? In diffusion models, we use a neural network (a U-Net in the DDPM paper): we feed it the noisy image x_t at a given step t, and it predicts the noise that was added. Subtracting the prediction from x_t gives an estimate of the image as it was at t=0. We then add back Gaussian noise, slightly less than before, feed the result to the network again, and repeat this T times.
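Here is a minimal sketch of that loop, using the update rule from the DDPM paper, which folds "subtract the predicted noise, then add a smaller amount back" into a single formula per step. Since no trained network is available here, the example uses a toy assumption: if every training image were the same `target`, the ideal noise prediction can be written in closed form, and that `oracle` function plays the role of the network. All names are illustrative.

```python
import numpy as np

def sample(predict_noise, shape, betas, rng):
    """DDPM-style sampling: start from pure noise and denoise for T steps."""
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    x = rng.standard_normal(shape)                 # x_T: pure Gaussian noise
    for t in range(len(betas) - 1, -1, -1):        # t = T-1, ..., 0 (0-based)
        eps = predict_noise(x, t)                  # the network's guess of the noise in x
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            # Re-add a small amount of fresh noise before the next step.
            x = mean + np.sqrt(betas[t]) * rng.standard_normal(shape)
        else:
            x = mean                               # final step: no noise added
    return x

betas = np.linspace(1e-4, 0.02, 1000)              # assumed linear schedule
alpha_bars = np.cumprod(1.0 - betas)
target = np.full((4, 4), 0.5)                      # the only "training image" in this toy setup

def oracle(x_t, t):
    """Ideal noise prediction when all the data is `target` (toy assumption)."""
    return (x_t - np.sqrt(alpha_bars[t]) * target) / np.sqrt(1.0 - alpha_bars[t])

result = sample(oracle, target.shape, betas, np.random.default_rng(0))
```

With this oracle the loop ends at `target`, which is the behaviour we hope a trained network approximates on real data.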
What about text in diffusion models?
There are many ways to represent text within a diffusion model, but one of the most widely used is CLIP (Contrastive Language-Image Pre-Training) by OpenAI.
How does CLIP work?
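CLIP is a neural network that learns which text goes with which image by mapping both into a shared embedding space. It combines a Vision Transformer to encode the image with a Transformer to encode the text, and is trained so that matching text and image pairs end up close together.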
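The "contrastive" part can be illustrated with a small NumPy sketch. The random vectors below are stand-ins for the outputs of the image and text encoders (no real encoder is used), and the temperature of 0.07 is an assumption for this example. The loss is low when row i of the image embeddings is most similar to row i of the text embeddings, and high when the pairing is wrong.

```python
import numpy as np

def normalize(v):
    """Scale each row to unit length so dot products become cosine similarities."""
    return v / np.linalg.norm(v, axis=1, keepdims=True)

def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric cross-entropy over an N x N similarity matrix.

    Row i of image_emb and row i of text_emb are assumed to be a matching pair.
    """
    logits = normalize(image_emb) @ normalize(text_emb).T / temperature
    idx = np.arange(logits.shape[0])
    loss_i2t = -log_softmax(logits, axis=1)[idx, idx].mean()  # each image picks its text
    loss_t2i = -log_softmax(logits, axis=0)[idx, idx].mean()  # each text picks its image
    return float((loss_i2t + loss_t2i) / 2.0)

rng = np.random.default_rng(0)
image_emb = rng.standard_normal((4, 16))     # stand-in for image encoder outputs
aligned_text = image_emb + 0.01 * rng.standard_normal((4, 16))
shuffled_text = aligned_text[::-1]           # deliberately wrong pairing

loss_aligned = contrastive_loss(image_emb, aligned_text)
loss_shuffled = contrastive_loss(image_emb, shuffled_text)
```

Training pushes the encoders toward the `loss_aligned` situation, which is what places a caption and its image near each other in the shared space.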
Then?
The embedded text is fed to the network along with the noisy image.
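How exactly the text embedding enters the network depends on the model. One common mechanism, used in latent diffusion models such as Stable Diffusion, is cross-attention: every position in the noisy image's feature map looks at the text tokens and pulls in the information it finds relevant. The sketch below is a single attention head with random weights and made-up sizes, just to show the data flow.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(image_feats, text_emb, w_q, w_k, w_v):
    """Each image position gathers information from the text tokens."""
    q = image_feats @ w_q      # queries come from the noisy image features
    k = text_emb @ w_k         # keys come from the text embedding
    v = text_emb @ w_v         # values come from the text embedding
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)
    return weights @ v, weights

rng = np.random.default_rng(0)
image_feats = rng.standard_normal((64, 32))  # 64 positions (8x8 feature map), 32 channels
text_emb = rng.standard_normal((5, 16))      # 5 text tokens, 16-dim embeddings
w_q = rng.standard_normal((32, 24))          # random projections; learned in a real model
w_k = rng.standard_normal((16, 24))
w_v = rng.standard_normal((16, 32))

conditioned, weights = cross_attention(image_feats, text_emb, w_q, w_k, w_v)
```

Each row of `weights` says how much one image position attends to each text token, and `conditioned` has the same shape as `image_feats`, so it can be mixed back into the image features.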
And that is how diffusion models work.
This only scratches the surface of diffusion models. To dive deeper, I recommend these resources:
- Denoising Diffusion Probabilistic Models (arXiv:2006.11239)
- https://www.assemblyai.com/blog/diffusion-models-for-machine-learning-introduction
- Diffusion model — Wikipedia
- https://www.youtube.com/watch?v=1CIpzeNxIhU&pp=ygUeZGlmZnVzaW9uIG1vZGVscyBjb21wdXRlcnBoaWxl
- https://www.youtube.com/watch?v=-lz30by8-sU&pp=ygUeZGlmZnVzaW9uIG1vZGVscyBjb21wdXRlcnBoaWxl