Introduction to Image Generation
Note: The notes below are from my learning of https://www.cloudskillsboost.google/course_templates/541 (Introduction to Image Generation).
While many approaches to image generation have been implemented, the following model families have proven the most promising over time:
- Variational autoencoders (VAEs)
- Encode images to a compressed representation and then decode them back to the original size, learning the distribution of the data in the process.
- Generative adversarial networks (GANs)
- Pit two neural networks against each other.
- One network, the generator, creates images.
- The other network, the discriminator, predicts whether an image is real or fake.
- Over time, the discriminator gets better and better at distinguishing real from fake, and the generator gets better and better at creating realistic-looking fakes.
- Autoregressive models
- Generate images by treating an image as a sequence of pixels.
Unconditioned diffusion models, which take no additional input or instruction, can be trained on images of a specific thing, such as faces, and will learn to generate new images of that thing.
Conditioned diffusion models, like text-to-image models, let us generate:
- An image from a text prompt
- e.g. Batman with a cat face
- Image inpainting
- e.g. Remove the apple from the image
- Text-guided image-to-image, where we can remove or add things, i.e., edit the image itself.
- e.g. Horse in space with glowing headbands
How do diffusion models work?
The idea is to systematically and slowly destroy structure in a data distribution through an iterative forward diffusion process. Really, this is going to be adding noise iteratively to an image. We then learn a reverse diffusion process that restores structure in the data, yielding a highly flexible and tractable generative model of the data.
In other words, we can add noise to an image iteratively, and we can then train a model that learns how to de-noise an image, thus generating novel images.
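The iterative noising idea can be sketched in a few lines. This is a minimal toy, assuming Gaussian noise and a simple linear schedule; the function and variable names are illustrative, not from any specific library.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_noise(image, t, num_steps=1000):
    """Blend an image toward pure Gaussian noise as t goes from 0 to num_steps."""
    alpha = 1.0 - t / num_steps                # fraction of the original signal remaining
    noise = rng.standard_normal(image.shape)   # fresh Gaussian noise
    noisy = np.sqrt(alpha) * image + np.sqrt(1.0 - alpha) * noise
    return noisy, noise                        # `noise` is what the model learns to predict

image = rng.random((8, 8))                     # stand-in for a real image
noisy, noise = add_noise(image, t=500)         # halfway through the forward process
```

Note that at t = num_steps the signal fraction is zero, so the output is pure noise, which is exactly the end state the forward process aims for.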
Process:
- We start with a large dataset of images.
- For each image, we add some noise in an iterative way.
- Since we add more noise over and over, we need to decide how many times to perform that operation.
- By the end, we should reach pure noise.
- At that point, all structure in the initial image is completely gone.
- The challenging part is going from a noisy image to a slightly less noisy image.
- This is called the "reverse diffusion process".
- Note that for every step where we add noise, we also learn the reverse diffusion process: we train a machine learning model that takes the noisy image as input and predicts the noise that was added to it.
- After seeing enough examples, this model gets very good at removing noise from images.
- How do we generate images with it?
- We start with pure noise and send it through the trained model.
- We then take the output, the predicted noise, and subtract it from the input.
- Repeating this over and over again yields a generated image.
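The generation steps above can be sketched as a loop. This is a toy illustration, assuming a hypothetical `predict_noise` placeholder standing in for the trained de-noising network; a real implementation would also rescale and re-inject noise at each step per a noise schedule.

```python
import numpy as np

rng = np.random.default_rng(0)

def predict_noise(x, t):
    # Placeholder for the trained model: a real network would predict
    # the noise present in x at step t. Here we fake it with a fraction of x.
    return 0.1 * x

def sample(shape, num_steps=50):
    x = rng.standard_normal(shape)      # start from pure Gaussian noise
    for t in reversed(range(num_steps)):
        eps = predict_noise(x, t)       # predicted noise at this step
        x = x - eps                     # subtract it, as the notes describe
    return x                            # after enough steps, a generated image

image = sample((8, 8))
```

The loop runs the steps in reverse (from the noisiest step down to zero), mirroring how the forward process was learned one step at a time.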