Introduction to Image Generation
Note: The notes below are from my learning of https://www.cloudskillsboost.google/course_templates/541 (Introduction to Image Generation).
While many approaches to image generation have been implemented, the following model families have proven the most promising over time:
- Variational autoencoders (VAEs)
- Encode images to a compressed representation and then decode them back to the original size, learning the distribution of the data in the process.
- Generative adversarial networks (GANs)
- Pit two neural networks against each other.
- One network, the generator, creates images.
- The other network, the discriminator, predicts whether an image is real or fake.
- Over time, the discriminator gets better and better at distinguishing real from fake, and the generator gets better and better at creating realistic-looking fakes.
- Autoregressive models
- Generate images by treating an image as a sequence of pixels.
Unconditioned diffusion models, which take no additional input or instruction, can be trained on images of a specific thing, such as faces, and will learn to generate new images of that thing.
Conditioned diffusion models, like text-to-image models, let us generate:
- An image from a text prompt
- e.g. Batman with a cat face
- Image inpainting
- e.g. Remove the apple from the image
- Text-guided image-to-image, where we can remove or add things, i.e., edit the image itself.
- e.g. Horse in space with glowing headbands
How do diffusion models work?
The idea is to systematically and slowly destroy structure in a data distribution through an iterative forward diffusion process. Really, this is going to be adding noise iteratively to an image. We then learn a reverse diffusion process that restores structure in the data, yielding a highly flexible and tractable generative model of the data.
In other words, we can add noise to an image iteratively, and we can then train a model that learns how to de-noise an image, thus generating novel images.
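The iterative noising idea can be sketched in a few lines. This is a minimal toy, assuming Gaussian noise and a simple linear schedule; the function and variable names are illustrative, not from any specific library.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_noise(image, t, num_steps=1000):
    """Blend an image toward pure Gaussian noise as t goes from 0 to num_steps."""
    alpha = 1.0 - t / num_steps                # fraction of the original signal remaining
    noise = rng.standard_normal(image.shape)   # fresh Gaussian noise
    noisy = np.sqrt(alpha) * image + np.sqrt(1.0 - alpha) * noise
    return noisy, noise                        # `noise` is what the model learns to predict

image = rng.random((8, 8))                     # stand-in for a real image
noisy, noise = add_noise(image, t=500)         # halfway through the forward process
```

Note that at t = num_steps the signal fraction is zero, so the output is pure noise, which is exactly the end state the forward process aims for.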
Process:
- We start with a large dataset of images.
- For each image, we add some noise in an iterative way.
- Since we add more noise over and over, we need to decide how many times to perform that operation.
- By the end, we should reach pure noise.
- At that point, all structure in the initial image is completely gone.
- The challenging part is going from a noisy image to a slightly less noisy image.
- This is called the "reverse diffusion process".
- Note that for every step where we add noise, we also learn the reverse diffusion process: we train a machine learning model that takes the noisy image as input and predicts the noise that was added to it.
- After seeing enough examples, this model gets very good at removing noise from images.
- How do we generate images with it?
- We start with pure noise and send it through the trained model.
- We then take the output, the predicted noise, and subtract it from the input.
- Repeating this over and over again yields a generated image.
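The generation steps above can be sketched as a loop. This is a toy illustration, assuming a hypothetical `predict_noise` placeholder standing in for the trained de-noising network; a real implementation would also rescale and re-inject noise at each step per a noise schedule.

```python
import numpy as np

rng = np.random.default_rng(0)

def predict_noise(x, t):
    # Placeholder for the trained model: a real network would predict
    # the noise present in x at step t. Here we fake it with a fraction of x.
    return 0.1 * x

def sample(shape, num_steps=50):
    x = rng.standard_normal(shape)      # start from pure Gaussian noise
    for t in reversed(range(num_steps)):
        eps = predict_noise(x, t)       # predicted noise at this step
        x = x - eps                     # subtract it, as the notes describe
    return x                            # after enough steps, a generated image

image = sample((8, 8))
```

The loop runs the steps in reverse (from the noisiest step down to zero), mirroring how the forward process was learned one step at a time.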