Introduction to Image Generation
Note: The below is from my learning from https://www.cloudskillsboost.google/course_templates/541 (Introduction to Image Generation).
While many approaches have been implemented for image generation, the following model families have looked more promising over time:
- Variational auto encoders
- Encode images to a compressed size and then decode back to the original size while learning the distribution of the data itself.
- Generative adversarial models (GANs)
- Pit two neural networks against each other.
- One neural network, the generator creates images.
- The other neural network, the discriminator predicts, if the image is real or fake.
- Over time, the discriminator gets better and better at distinguishing between real and fake, and the generator gets better and better at creating real looking fakes.
- Auto regressive models
- Generate images by treating an image as a sequence of pixels
- An image from a text prompt
- e.g. Batman with a cat face
- Image-in painting
- e.g. Remove the apple from the image
- Text guided image to image where we can remove or add things, we can edit the image itself.
- e.g. Horse in space with glowing headbands
- We start with a large dataset of images.
- For an image we add some noise in an iterative way.
- So if we do this over and over, iteratively adding more noise, we need to think about how many times do we perform that operation.
- By the end of it, we should reach a phase of pure noise.
- By this point, all structure in the initial image is completely gone
- The challenging part is how do we go from a noisy image to a relatively less noisy image?
- This process is called the "reverse diffusion process"
- Do note, that every step that we add noise, we also learn the reverse diffusion process. That is, we train a machine learning model that takes in as input the noisy image and predicts the noise that's been added to it.
- Over time, after seeing enough examples, this model gets very very good at removing noise from images.
- How do we generate images with it?
- We can just start with pure, absolute noise and send that noise through our model that is trained.
- We then take the output, the predicted noise and subtract it from the initial noise.
- And if we do that over and over and over again, we end up with a generated image.