Tuesday, July 4, 2023

Introduction to Image Generation

 


Note: The below is from my learning from https://www.cloudskillsboost.google/course_templates/541 (Introduction to Image Generation).

While many approaches have been implemented for image generation, the following model families have proven most promising over time:

  • Variational autoencoders (VAEs)
    • Encode images to a compressed size and then decode back to the original size, learning the distribution of the data itself along the way.
  • Generative adversarial networks (GANs)
    • Pit two neural networks against each other.
    • One network, the generator, creates images.
    • The other network, the discriminator, predicts whether an image is real or fake.
    • Over time, the discriminator gets better and better at distinguishing real from fake, and the generator gets better and better at producing real-looking fakes.
  • Autoregressive models
    • Generate images by treating an image as a sequence of pixels.
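The adversarial setup above can be sketched in a few lines. The following is a toy illustration, not a real image GAN: the "generator" just shifts random noise by a learnable offset, the "discriminator" is a logistic regression on 1-D samples, and all numbers are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Real" data: samples from a 1-D Gaussian centred at 4.0 (a stand-in for images).
REAL_MEAN = 4.0

# Generator: shifts standard-normal noise by a learnable offset g_mu.
g_mu = 0.0

# Discriminator: logistic regression D(x) = sigmoid(w*x + b).
w, b = 0.1, 0.0

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.05
for step in range(2000):
    real = rng.normal(REAL_MEAN, 1.0, 64)
    fake = g_mu + rng.normal(0.0, 1.0, 64)

    # Discriminator update: push D(real) toward 1 and D(fake) toward 0.
    d_real = sigmoid(w * real + b)
    d_fake = sigmoid(w * fake + b)
    # Gradients of the binary cross-entropy loss with respect to w and b.
    w -= lr * (np.mean((d_real - 1.0) * real) + np.mean(d_fake * fake))
    b -= lr * (np.mean(d_real - 1.0) + np.mean(d_fake))

    # Generator update: push D(fake) toward 1, i.e. fool the discriminator.
    d_fake = sigmoid(w * fake + b)
    g_mu -= lr * np.mean(d_fake - 1.0) * w

print(f"learned generator offset: {g_mu:.2f} (real data mean is {REAL_MEAN})")
```

As the two updates alternate, the generator's offset drifts toward the real data's mean, which is the 1-D analogue of the generator learning to produce real-looking images.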
Let's discuss diffusion models.

Unconditioned diffusion models, which take no additional input or instruction, can be trained on images of a specific thing, such as faces, and will learn to generate new images of that thing.

Conditioned diffusion models, like text-to-image, let us generate:
  • An image from a text prompt
    • e.g. Batman with a cat face
  • Image inpainting
    • e.g. Remove the apple from the image
  • Text-guided image-to-image, where we can remove or add things, editing the image itself.
    • e.g. Horse in space with glowing headbands

How do Diffusion models work?

The idea is to systematically and slowly destroy structure in a data distribution through an iterative forward diffusion process. Really, this is going to be adding noise iteratively to an image. We then learn a reverse diffusion process that restores structure in the data, yielding a highly flexible and tractable generative model of the data. 

In other words, we can add noise to an image iteratively, and we can then train a model that learns how to de-noise an image, thus generating novel images.

Process:
  1. We start with a large dataset of images.
  2. For each image, we add noise in an iterative way.
  3. If we do this over and over, iteratively adding more noise, we must decide how many times to perform that operation.
  4. By the end of it, we should reach a state of pure noise.
  5. By this point, all structure in the initial image is completely gone.
  6. The challenging part is going from a noisy image to a slightly less noisy image.
  7. This process is called the "reverse diffusion process".
  8. Note that for every step where we add noise, we also learn the reverse diffusion process: we train a machine learning model that takes the noisy image as input and predicts the noise that was added to it.
  9. Over time, after seeing enough examples, this model gets very good at removing noise from images.
  10. How do we generate images with it?
    1. We start with pure noise and send it through our trained model.
    2. We take the output, the predicted noise, and subtract it from the input.
    3. Repeating this over and over again, we end up with a generated image.
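The forward process described above has a convenient closed form in the DDPM formulation: x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps. The sketch below is illustrative, not a full diffusion model: the "image" is a tiny gradient, and instead of a trained network we use the true noise as a stand-in for the model's prediction, to show that predicting the noise is enough to recover the image.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "image": an 8x8 gradient, so structure is easy to see and to recover.
x0 = np.linspace(0.0, 1.0, 64).reshape(8, 8)

# Linear variance schedule beta_1..beta_T (values follow the DDPM paper).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)

# Forward process in closed form:
#   x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps,  eps ~ N(0, I)
eps = rng.standard_normal(x0.shape)
t = T - 1  # the final, purely-noisy step
x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

# By t = T almost no signal remains: the image is essentially pure noise.
print("signal weight at t=T:", np.sqrt(alpha_bar[t]))

# A trained model would predict eps from x_t; here we use the true eps as a
# stand-in, to show that knowing the noise lets us recover the original image.
x0_recovered = (x_t - np.sqrt(1.0 - alpha_bar[t]) * eps) / np.sqrt(alpha_bar[t])
print("max reconstruction error:", np.abs(x0_recovered - x0).max())
```

In real sampling, the model's noise prediction replaces the true eps, and the subtraction is applied one small step at a time rather than in one jump.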




Generative AI Studio

What is Generative AI?

It is a type of artificial intelligence that generates content for you.

Note: The below is from my learning from https://www.cloudskillsboost.google/course_templates/556 (Introduction to Generative AI Studio).

What type of content?

Any type of content, like text, images, etc.

How does it generate this content?

It learns from the vast amount of existing content already available. The process of learning from this existing content is called training, and it produces a foundational model. An example of a foundational model is an LLM (large language model). The foundational model can then be used to generate content and perform tasks such as content extraction.

One can add new datasets to the above foundational model for a specific task, thus creating a new model.

How can I create a new model from a foundational model? Is it easy?

Using a Google Cloud tool called Vertex AI. Vertex AI is an end-to-end ML development platform on Google Cloud that helps you build, deploy, and manage machine learning models.

What is Generative AI Studio?

Generative AI Studio allows a user to quickly prototype and customize generative AI models with no code or low code. Generative AI Studio supports language, vision, and speech.

Language - Tune Language models

Vision - Generate images based on prompts

Speech - Generate text from speech or vice versa.

Best practices for prompt design

What is a prompt? 

A prompt is your text input that you pass to the model

Best practices for prompt design:

  • Be concise
  • Be specific and well-defined
  • Ask one task at a time
  • Ask to classify instead of generating (e.g. "is X better to learn?" instead of "what is better to learn?")
  • Include examples (Adding examples tends to yield better results)
There are a few model parameters one can experiment with to try to improve the quality of responses:
  1. Temperature
  2. Top P 
  3. Top K
Temperature is a number used to tune the degree of randomness.
Low temperature means the model selects words that are highly probable, giving more predictable responses.
High temperature implies more random, unexpected, and some may say "creative" responses.

Top K lets the model randomly return a word from the K most probable next words. For example, top K = 2 means you get a random word from the two most probable words.

Top P lets the model randomly return a word from the smallest set of words whose cumulative probability adds up to P.
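These three knobs can be sketched with a toy vocabulary. The words and logits below are made up for illustration; the mechanics (divide logits by temperature, keep the top K words or the smallest set whose cumulative probability reaches P, then renormalise and sample) follow the definitions above.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab = ["cat", "dog", "car", "tree", "moon"]
logits = np.array([3.0, 2.5, 1.0, 0.5, 0.1])  # toy next-word scores

def softmax(z):
    z = z - z.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def sample(logits, temperature=1.0, top_k=None, top_p=None):
    # Temperature: low T sharpens the distribution, high T flattens it.
    probs = softmax(logits / temperature)
    order = np.argsort(probs)[::-1]  # indices, most probable first
    if top_k is not None:
        keep = order[:top_k]  # only the K most probable words
    elif top_p is not None:
        cum = np.cumsum(probs[order])
        cutoff = np.searchsorted(cum, top_p) + 1  # smallest set reaching mass P
        keep = order[:cutoff]
    else:
        keep = order
    p = probs[keep] / probs[keep].sum()  # renormalise over surviving words
    return keep[rng.choice(len(keep), p=p)]

print(vocab[sample(logits, temperature=0.2)])           # near-greedy: usually "cat"
print(vocab[sample(logits, temperature=1.0, top_k=2)])  # always "cat" or "dog"
print(vocab[sample(logits, temperature=1.0, top_p=0.9)])
```

With these logits, top P = 0.9 keeps only "cat", "dog", and "car", since those three already account for over 90% of the probability mass.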

Conversations

Before we try to create conversations, we need to specify the conversation context.
Context instructs how the model should respond. 
We can specify words that the conversation can or cannot use. The same goes for topics to focus on or avoid.

Tune a Language Model

Prompt design allows for fast experimentation and customization.
However, we have to understand that changes in prompt wording can impact the model's output significantly. Hence, we look to tune the model.