Introduction to Conditional GANs (cGANs) & Controllable Generation
In our previous articles on WGANs and DCGANs, we discussed several techniques to improve the GAN training process.
Even with these improvements, however, we still don't have much control over what the GAN actually generates. This is referred to as unconditional generation.
In this article, we'll discuss how to tell the model which specific items you'd like it to generate, otherwise known as conditional generation.
We'll also look at controllable generation, which has the same result of controlling the GAN's output, although this is accomplished by adapting the inputs to the model without changing the model's weights.
This article is based on notes from Week 4 from the first course in this Generative Adversarial Networks (GANs) Specialization and is organized as follows:
- Intuition of Conditional Generation
- Conditional Generation: Inputs
- Introduction to Controllable Generation
- Vector Algebra in Z-Space
- Challenges with Controllable Generation
- Classifier Gradients
Intuition of Conditional Generation
In this section we'll discuss the differences and intuition of conditional vs. unconditional generation.
Unconditional Generation
With unconditional generation, you get outputs from a random class.
For example, you can think of unconditional generation as a gumball machine, where the coin you put in is the noise vector that's fed into the GAN and the random gumball color you get is the output from the generator.
With unconditional GANs the training dataset does not need to be labelled.
Conditional Generation
Conditional generation, on the other hand, allows you to request an example from a particular class.
Intuitively, this is more like a vending machine where you add a coin, enter a code for the item, and get what you ask for.
With a conditional GAN, you get a random example from the class you specify.
With conditional generation, you have to train the GAN with labeled datasets.
Conditional Generation: Inputs
In order to produce examples from a chosen class, we need to have a labelled dataset and pass the class information to both the generator and discriminator during training.
In this section, we'll discuss how you can pass this conditional information as an input.
Recall that with traditional GANs, we have a noise vector as input.
For conditional GANs, we also need a vector to tell the generator which class the examples should come from.
Typically this is done with a one-hot vector, meaning there are zeros in every position except for the position of the class we want to generate.
We still need to include the noise vector to produce randomness, except now it's producing diversity from within the class we choose.
The input to a conditional GAN is a concatenated vector of both noise and the one-hot class vector.
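The concatenation described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the course's implementation: the helper names `one_hot` and `generator_input`, the noise dimension of 64, and the 10 classes are all assumptions chosen for the example.

```python
import numpy as np

def one_hot(labels, n_classes):
    """Return a (batch, n_classes) array with a 1 at each label's index."""
    labels = np.asarray(labels)
    out = np.zeros((labels.shape[0], n_classes), dtype=np.float32)
    out[np.arange(labels.shape[0]), labels] = 1.0
    return out

def generator_input(noise, labels, n_classes):
    """Concatenate noise vectors and one-hot class vectors along the feature axis."""
    return np.concatenate([noise, one_hot(labels, n_classes)], axis=1)

rng = np.random.default_rng(0)
z_dim, n_classes = 64, 10            # illustrative sizes (assumed, not from the article)
noise = rng.standard_normal((4, z_dim)).astype(np.float32)
labels = [3, 3, 7, 0]                # the class requested for each sample
gen_in = generator_input(noise, labels, n_classes)
print(gen_in.shape)                  # batch of 4, each row has z_dim + n_classes entries
```

The first two rows request the same class but carry different noise, which is what produces different examples from within that class.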
In addition, the discriminator also needs to be given the class information.
Now, the discriminator will be determining if the examples are either real or fake representations of that particular class.
More specifically, the input of the discriminator works as follows:
- The image is fed in as three channels (RGB), or one channel if it's grayscale
- The one-hot class information is fed in as additional channels, one per class: the channel for the chosen class is filled with ones, and every other class channel is filled with zeros.
In contrast to the one-hot vector in the generator's input, these are typically much larger matrices, since each class channel has the same height and width as the image.
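As a rough sketch of the discriminator input, the class channels can be built and stacked onto the image channels like this. The function name `discriminator_input`, the channel-first `(batch, channels, height, width)` layout, and the 28x28 image size are assumptions made for the example.

```python
import numpy as np

def discriminator_input(images, labels, n_classes):
    """Append one constant channel per class to a (batch, C, H, W) image batch."""
    images = np.asarray(images, dtype=np.float32)
    labels = np.asarray(labels)
    batch, _, height, width = images.shape
    class_maps = np.zeros((batch, n_classes, height, width), dtype=np.float32)
    class_maps[np.arange(batch), labels] = 1.0   # chosen class channel is all ones
    return np.concatenate([images, class_maps], axis=1)

images = np.random.default_rng(0).random((2, 3, 28, 28))   # two random "RGB images"
disc_in = discriminator_input(images, labels=[1, 4], n_classes=10)
print(disc_in.shape)   # 3 image channels + 10 class channels per sample
```

Each class channel matches the image's height and width so that it can be concatenated along the channel axis.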
Introduction to Controllable Generation
Controllable generation is another way to control the output of a GAN, applied after it has been trained.
Whereas conditional generation uses labels during training, controllable generation focuses on controlling the features that you want in the output examples.
This can be done by adjusting the input noise vector $Z$ that is fed into the generator after it has been trained.
Below is a summary of controllable vs. conditional generation:
Controllable Generation
- You're able to generate examples with the features that you specify
- The training dataset doesn't need to be labelled
- It works by adjusting the input noise vector $Z$ that's fed into the generator
Conditional Generation
- You're able to generate examples with the classes you specify
- The training dataset needs to be labeled
- It works by adjusting the class vector fed into the generator
Next, we'll look at exactly how we can adjust the noise vector $Z$.
Vector Algebra in Z-Space
In this section, we'll look at how to manipulate the noise vector $Z$ in order to achieve controllable generation.
Controllable generation is somewhat similar to interpolation.
With interpolation, you get intermediate examples between two generated observations.
These intermediate examples between the two targets are produced by manipulating the inputs in $Z$-space, which is the same idea behind controllable generation.
In order to get intermediate values between two images, for example, you can make an interpolation between their two input vectors $v_1$ and $v_2$ in the $Z$-space.
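A minimal sketch of this interpolation, assuming simple linear interpolation between the two noise vectors (the helper name `interpolate` and the 64-dimensional noise are illustrative choices, not from the course):

```python
import numpy as np

def interpolate(v1, v2, n_steps):
    """Return n_steps points evenly spaced on the line from v1 to v2, inclusive."""
    alphas = np.linspace(0.0, 1.0, n_steps)[:, None]   # shape (n_steps, 1)
    return (1.0 - alphas) * v1 + alphas * v2

rng = np.random.default_rng(0)
v1, v2 = rng.standard_normal(64), rng.standard_normal(64)
path = interpolate(v1, v2, n_steps=5)
print(path.shape)
# Each row of `path` would be passed to the trained generator
# to produce one intermediate image between g(v1) and g(v2).
```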
Controllable generation also uses changes in $Z$-space and makes use of how adjustments to the noise vector are reflected in the output from the generator.
Differences in the generated features, for example different hair colors, correspond to movement in particular directions in $Z$-space.
With controllable generation, the goal is to find the directions in $Z$-space that produce the feature changes you want.
Later in this article, we'll look at how you can find these directions $d$ in $Z$-space to control the output of the GAN.
If we have an original image output of $g(v_1)$, we can get a new controlled output with $g(v_1 + d)$.
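To make the notation concrete, here is a toy sketch of $g(v_1 + d)$. The generator `g` below is a hypothetical stand-in (a fixed random linear map followed by `tanh`), not a trained GAN, and the direction `d` and the `strength` scalar are assumptions added for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 4))       # hypothetical frozen generator weights

def g(z):
    """Toy stand-in for a trained generator: 4-d noise -> 8 output values."""
    return np.tanh(W @ z)

v1 = rng.standard_normal(4)
d = np.array([1.0, 0.0, 0.0, 0.0])    # hypothetical feature direction in Z-space
strength = 0.5                         # assumed scalar: how far to move along d

original = g(v1)
controlled = g(v1 + strength * d)      # only the input changes; W is untouched
print(np.abs(controlled - original).max())
```

Only the input vector changes between the two calls, which is the defining property of controllable generation.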
Challenges with Controllable Generation
Controllable generation allows you to choose the features and output of a GAN, although it does have several challenges.
Two of these challenges are feature correlation and entanglement in $Z$-space.
Feature Correlation
If certain features in a dataset are highly correlated, it becomes difficult to control one of them without also changing the others.
For example, let's say you have a dataset of face images and want to add facial hair to an image of a woman. It's likely that you'll end up modifying other features as well, since facial hair is highly correlated with male faces in the dataset.
This isn't desirable because you want to be able to find the directions where you can change just one feature of the image.
Z-Space Entanglement
Another challenge with controllable generation is referred to as entanglement in Z-space.
When $Z$-space is entangled, movement in a single direction affects multiple features in the output simultaneously.
Even if these features aren't correlated, an entangled $Z$-space means that an attempt to change one feature also modifies others in the output.
Entanglement commonly happens when the number of dimensions in $Z$-space isn't large enough.
Again, this isn't desirable as it makes it much more difficult to control the GAN's output.
Classifier Gradients
As mentioned, controllable generation works by moving the noise vector in the direction in $Z$-space that corresponds to a desired feature.
In this section, we'll look at one technique you can use to find this direction using the gradients of a trained classifier.
For example, if we want to add sunglasses to an image of a face, we can use a trained classifier that identifies whether a person has that feature.
To do this, we take a batch of noise vectors $Z$ and pass it through the generator.
We then pass the generated images through a classifier, in this case a sunglasses classifier, which tells us whether each output has that feature.
We then use this information to modify the $Z$ vectors, without modifying the weights of the generator at all.
To do so, we update the $Z$ vectors using the gradient of a cost that penalizes the model for images classified as not having sunglasses. We then repeat this process until the images are classified as having the desired feature.
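The loop described above can be sketched with a deliberately tiny toy model. Everything here is an assumption for illustration: the "generator" is a fixed linear map, the "classifier" is a logistic regression with random weights, the cost is $-\log p$ where $p$ is the classifier's probability for the feature, and the gradient is written out by hand. With a real GAN and classifier, the gradient with respect to $Z$ would normally come from an automatic differentiation framework instead.

```python
import numpy as np

rng = np.random.default_rng(0)
z_dim, x_dim = 16, 32
G = rng.standard_normal((x_dim, z_dim)) / np.sqrt(z_dim)   # frozen toy generator weights
w = rng.standard_normal(x_dim) / np.sqrt(x_dim)            # frozen toy classifier weights

def generate(z):
    """Toy linear generator: (batch, z_dim) -> (batch, x_dim)."""
    return z @ G.T

def classify(x):
    """Toy logistic classifier: probability that each sample has the feature."""
    return 1.0 / (1.0 + np.exp(-(x @ w)))

def steer(z, lr=0.5, target=0.9, max_steps=500):
    """Update only z by gradient descent on the cost -log(classify(generate(z)))."""
    z = z.copy()
    for _ in range(max_steps):
        p = classify(generate(z))                  # shape (batch,)
        if np.all(p >= target):
            break
        # Analytic gradient of -log(p) with respect to z for this toy model.
        grad = -(1.0 - p)[:, None] * (G.T @ w)[None, :]
        z -= lr * grad                             # generator and classifier stay fixed
    return z

z0 = rng.standard_normal((8, z_dim))
z1 = steer(z0)
print(classify(generate(z0)).round(2))   # feature probabilities before steering
print(classify(generate(z1)).round(2))   # feature probabilities after steering
```

The weights `G` and `w` are never updated; only the noise vectors move, and the loop stops once every sample in the batch is classified as having the feature or the step limit is reached.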
The downside with this method is that we need a pre-trained classifier that can detect the desired feature, which may not always be readily available.
In summary, classifiers can be used to find directions in $Z$-space that correspond to features. To find these directions, the updates are applied to the noise vector rather than to the generator's weights.
Summary: Conditional GANs and Controllable Generation
To summarize, conditional GANs allow you to specify the class you want the model to output.
Controllable generation allows you to specify the features generated.
With conditional generation, the training dataset needs to be labelled, whereas with controllable generation it does not.
Controllable generation works by adjusting the input noise vector $Z$.
Conditional generation, on the other hand, works by adjusting the class vector that is fed into the generator.