GPT-3 Fine Tuning: Key Concepts & Use Cases

In this guide, we'll discuss what GPT-3 fine-tuning is, key concepts, how to prepare a fine-tuning dataset, and more. Specifically, we'll discuss:

  • What is GPT-3 Fine Tuning?
  • Zero-Shot Learning
  • Few shot learning
  • What's the difference between embedding and fine-tuning GPT-3?
  • Steps to fine tune GPT-3
  • Preparing a dataset for fine-tuning
  • GPT-3 fine-tuning use cases

What is GPT-3 Fine Tuning?

As you may know by now, the base GPT-3 model was trained on a very large corpus of internet text. The GPT-3 paper reports that the Common Crawl portion alone started from 45TB of compressed plaintext before filtering.

When you ask GPT-3 a general question, the base model can often come up with a plausible and relatively intelligent response. For many use cases, the base model will do just fine, but there are a number of areas where it will fail.

For example, with factual question answering, OpenAI highlights that the base model often provides incorrect answers:

Base GPT-3 models do a good job at answering questions when the answer is contained within the paragraph, however if the answer isn't contained, the base models tend to try their best to answer anyway, often leading to confabulated answers.

Before we get into GPT-3 fine-tuning in more detail, let's first review several key concepts:

  • Fine-tuning
  • Few shot learning
  • One shot learning
  • Zero-shot learning

Fine Tuning

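Fine-tuning is the process of continuing to train a pre-trained model (e.g. base GPT-3) on a new task or dataset so that its weights are updated for that task. Some fine-tuning approaches re-train only the last layers of a model while keeping the earlier layers fixed, but that is one variant rather than the definition.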

As OpenAI writes in their paper introducing GPT-3 called "Language Models are Few-Shot Learners":

Fine-Tuning (FT) has been the most common approach in recent years, and involves updating the weights of a pre-trained model by training on a supervised dataset specific to the desired task. Typically thousands to hundreds of thousands of labeled examples are used. The main advantage of fine-tuning is strong performance on many benchmarks.

Few-Shot Learning

Few-shot learning refers to providing a model with a small number of examples at inference time (i.e. included in your prompt). GPT-3 is then expected to generalize to new, unseen examples. Unlike fine-tuning, the model's weights are not updated; the examples only condition the model for that request.

As the paper writes:

Few-Shot (FS) is the term we will use in this work to refer to the setting where the model is given a few demonstrations of the task at inference time as conditioning, but no weight updates are allowed.

One-Shot Learning

One-shot learning is similar to few-shot learning, except only one example is provided to the base model at the time of inference.

One-Shot (1S) is the same as few-shot except that only one demonstration is allowed, in addition to a natural language description of the task

Zero-Shot Learning

As you can guess, zero-shot learning is a technique where a model is given a task without any examples in the prompt; only a natural language instruction is provided.

Zero-Shot (0S) is the same as one-shot except that no demonstrations are allowed, and the model is only given a natural language instruction describing the task.
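
The three prompting settings differ only in how many demonstrations are placed in the prompt. The sketch below illustrates this with a hypothetical helper, `build_prompt`, and a made-up sentiment task; neither comes from the paper, and the prompt layout is just one reasonable choice.

```python
def build_prompt(instruction, demonstrations, query):
    """Build a prompt with zero, one, or several in-context demonstrations.

    No weights are updated in any of these settings; the demonstrations
    only condition the model at inference time.
    """
    lines = [instruction, ""]
    for text, label in demonstrations:
        lines.append(f"Tweet: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    lines.append(f"Tweet: {query}")
    lines.append("Sentiment:")
    return "\n".join(lines)


# Hypothetical task and examples, for illustration only.
instruction = "Classify each tweet as positive or negative."
demos = [
    ("Overjoyed with the new iPhone!", "positive"),
    ("My flight got cancelled again.", "negative"),
]
query = "The concert last night was fantastic."

zero_shot = build_prompt(instruction, [], query)        # no demonstrations
one_shot = build_prompt(instruction, demos[:1], query)  # one demonstration
few_shot = build_prompt(instruction, demos, query)      # a few demonstrations

print(few_shot)
```

Note that the prompt grows with every demonstration added, which matters once token limits and per-token costs come into play.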

For certain tasks you can improve the base model performance by simply providing a few examples in your prompt, but if you really want to see significant model improvements, fine-tuning with hundreds or thousands of training examples is the way to go.

As the paper highlights:

Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-training on a large corpus of text followed by fine-tuning on a specific task.

Another reason that fine-tuning is so powerful is that OpenAI models have token limits, which means we can't just include all the context we want in a prompt.

As OpenAI writes, tokens can be thought of as pieces of words, and before the OpenAI API processes a prompt, the input is broken down into tokens:

Depending on the model used, requests can use up to 4097 tokens shared between prompt and completion.

Another issue with few-shot learning is that we pay for every example we include in the prompt, since each one increases the token count of every request. With fine-tuning, the examples are learned once during training instead of being sent each time, so we can improve model performance and also decrease overall API costs.
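
To make the cost argument concrete, here is a rough back-of-the-envelope sketch. All token counts are hypothetical and chosen only for illustration. It counts prompt tokens only, ignoring the one-time training cost and any per-token price difference between base and fine-tuned models.

```python
def total_prompt_tokens(instruction_tokens, example_tokens, n_examples,
                        query_tokens, n_requests):
    """Total prompt tokens sent across n_requests calls."""
    per_request = instruction_tokens + n_examples * example_tokens + query_tokens
    return per_request * n_requests


# Hypothetical token counts, chosen only for illustration.
N_REQUESTS = 10_000

# Few-shot: every request carries the instruction and 10 examples.
few_shot = total_prompt_tokens(
    instruction_tokens=20, example_tokens=30, n_examples=10,
    query_tokens=30, n_requests=N_REQUESTS)

# Fine-tuned: every request carries only the new input.
fine_tuned = total_prompt_tokens(
    instruction_tokens=0, example_tokens=0, n_examples=0,
    query_tokens=30, n_requests=N_REQUESTS)

print(f"few-shot prompt tokens:   {few_shot:,}")
print(f"fine-tuned prompt tokens: {fine_tuned:,}")
print(f"reduction factor:         {few_shot / fine_tuned:.1f}x")
```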

As OpenAI highlights in its documentation, the key benefits of fine-tuning include:

  1. Higher quality results than prompt design
  2. Ability to train on more examples than can fit in a prompt
  3. Token savings due to shorter prompts
  4. Lower latency requests

In short, what we're doing with fine-tuning is uploading a set of examples that includes both prompts and completions in order to enhance the GPT-3 base model for a specific use case.

What's the difference between embedding and fine-tuning GPT-3?

If you've read our previous articles on GPT-3, you know that we've used the Embeddings API to give GPT-3 access to an additional body of knowledge that wasn't in its training data (which only extends to 2021), such as earnings call transcripts, IPO prospectuses, recent crypto events, and so on.

You may be wondering what the difference is between embeddings and fine-tuning.

Embeddings

Using the Embeddings API and fine-tuning are both ways to adapt GPT-3 to your own data, but they serve different purposes and work differently: embeddings leave the model's weights unchanged, while fine-tuning updates them.

In the field of natural language processing, embeddings are a way to represent words, phrases, or documents as numerical vectors that capture their meaning and context.

With embeddings computed for an additional body of knowledge, we can find the passages most relevant to a question, include them in the prompt as context, and have GPT-3 respond based on that input.

In short, if you have a large body of text that you want GPT-3 to draw on, for example a textbook, legal documents, or any other additional body of knowledge, the Embeddings API is the way to go.
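
As a rough illustration of the embeddings approach, the sketch below picks the most relevant passage by cosine similarity and places it in the prompt. The three-dimensional vectors are toy stand-ins for real embeddings, and the documents, query, and helper names are all hypothetical; in practice the vectors would come from an embeddings model.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_relevant(query_vector, documents):
    """Return the text of the document whose vector is closest to the query."""
    best = max(documents,
               key=lambda d: cosine_similarity(query_vector, d["vector"]))
    return best["text"]


# Toy vectors standing in for real embeddings (hypothetical values).
documents = [
    {"text": "Q3 revenue grew 12% year over year.", "vector": [0.9, 0.1, 0.0]},
    {"text": "The company opened a new office in Berlin.", "vector": [0.1, 0.8, 0.3]},
]
query = "How did revenue change in Q3?"
query_vector = [0.8, 0.2, 0.1]  # would come from embedding the query

# Retrieve the closest passage and put it in the prompt as context.
context = most_relevant(query_vector, documents)
prompt = (
    "Answer using only the context below.\n\n"
    f"Context: {context}\n\n"
    f"Question: {query}\nAnswer:"
)
print(prompt)
```

The model itself is unchanged here; the new knowledge reaches it only through the prompt.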

Fine-tuning

Conversely, if you don't care as much about specific facts for your use case, and instead want to, for example, train GPT-3 to write tweets in your own personal style, fine-tuning is the way to go.

With fine-tuning, you're training GPT-3 on a specific structure, pattern, or style of language based on the set of examples you provide. In other words, fine tuning can be thought of as retraining the base GPT-3 model on a new set of patterns, rules or templates.

After you've fine-tuned a new GPT-3 model, you can then call it with the Completions API and it will use this new set of structural learnings to provide a response.

In some cases, you can also combine these two: first, use the Embeddings API to retrieve relevant passages from an additional knowledge base, and then use a fine-tuned model to respond in a certain way.

Now that we've covered a few of the key concepts to fine-tune GPT-3, let's look at the steps to do so.

Steps to Fine-Tune GPT-3

At a high level, the steps we need to take to fine-tune GPT-3 include:

  • Prepare and upload training data in JSONL format
  • Train a new fine-tuned model
  • Save and use your fine-tuned model for responses

Let's dive into more details about the first step: preparing a fine-tuning dataset.

Preparing a dataset for GPT-3 fine-tuning

To prepare a dataset for fine-tuning, we'll use a JSONL file, where each line is a JSON object containing a prompt and its completion. We can also use OpenAI's CLI data preparation tool to convert CSV, TSV, XLSX, or JSON files into JSONL in the following format:

{"prompt": "<prompt text>", "completion": "<ideal generated text>"}
{"prompt": "<prompt text>", "completion": "<ideal generated text>"}
{"prompt": "<prompt text>", "completion": "<ideal generated text>"}
...

As we can see, each line provides a single prompt and the associated ideal completion text. As OpenAI writes:

While prompts for base models often consist of multiple examples ("few-shot learning"), for fine-tuning, each training example generally consists of a single input example and its associated output, without the need to give detailed instructions or include multiple examples in the same prompt.
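
If your examples already live in Python, a JSONL file in this shape can be written with the standard `json` module. The sketch below is a minimal illustration; the helper names `write_jsonl` and `read_jsonl`, the file name, and the example pairs are all hypothetical.

```python
import json
import os
import tempfile


def write_jsonl(pairs, path):
    """Write (prompt, completion) pairs as one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for prompt, completion in pairs:
            record = {"prompt": prompt, "completion": completion}
            f.write(json.dumps(record) + "\n")


def read_jsonl(path):
    """Read a JSONL file back, checking that each line has the expected keys."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            record = json.loads(line)
            if set(record) != {"prompt", "completion"}:
                raise ValueError(
                    f"line {line_number}: unexpected keys {sorted(record)}")
            records.append(record)
    return records


# Hypothetical examples; real training data should be reviewed by a human.
pairs = [
    ("Subject: Refund request ->", " billing"),
    ("Subject: App crashes on launch ->", " technical"),
]
path = os.path.join(tempfile.mkdtemp(), "train.jsonl")
write_jsonl(pairs, path)
records = read_jsonl(path)
print(len(records), "examples written to", path)
```

Reading the file back is a cheap sanity check that every line parses and carries exactly the two expected keys before you upload it.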

In order for fine-tuning to work properly, a sufficient training set typically consists of at least a couple hundred examples, and:

...increasing the number of samples is one of the best ways to improve performance.

Training data can also be created from real-world data such as emails, sales copy, or whatever else you want to fine-tune on. It's important to note that these examples should be reviewed by a human operator and checked for accuracy, structure, and so on.

One thing you may be thinking is that it will take a long time to create a training file with 200+ examples. To speed up this process, you can use GPT-3 itself to generate fine-tuning data and then have a human review it for accuracy, but we'll save that for another article.

Creating a Fine Tuned Model

After preparing your fine-tuning dataset, in order to create and save a fine-tuned model, the steps we need to take include:

  • Install OpenAI and set your API key
# install openai
pip install --upgrade openai
export OPENAI_API_KEY="<OPENAI_API_KEY>"

Next, we can use the CLI data preparation tool if we need to convert our dataset to JSONL:

openai tools fine_tunes.prepare_data -f <LOCAL_FILE>

Next, create a fine-tuned model as follows, where BASE_MODEL can be any of ada, babbage, curie, or davinci:

openai api fine_tunes.create -t <TRAIN_FILE_ID_OR_PATH> -m <BASE_MODEL>

The command above does a few things:

  • First, it uploads your training file using the file API
  • It then creates a fine-tune job
  • Lastly, it streams events until the job is done. This often takes minutes, but can take several hours if there are many jobs in the queue or your dataset is large

When the job is complete, you'll see the name of your fine-tuned model in your command line.

Using a fine-tuned model

After you've created your fine-tuned model and have its name, you can then simply specify this model name as a parameter to the Completions API:

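import openai  # legacy (pre-1.0) interface of the openai package

FINE_TUNED_MODEL = "<FINE_TUNED_MODEL>"  # name printed when the job completes
YOUR_PROMPT = "<YOUR_PROMPT>"  # formatted like your training prompts
response = openai.Completion.create(
    model=FINE_TUNED_MODEL,
    prompt=YOUR_PROMPT)
print(response["choices"][0]["text"])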

You will also see your fine-tuned models in the Playground, so you can test them without having to write code.

GPT-3: Fine-tuning Use Cases

There are a number of cases where fine-tuning can be useful. A few of the examples provided in this OpenAI course include:

  • Fine-tuning for classification: e.g. classifying the sentiment of a tweet
  • Fine-tuning to summarize: e.g. summarizing and writing copy based on Wikipedia or product pages
  • Fine-tuning to expand: e.g. writing sales copy based on certain properties of an item
  • Fine-tuning to extract: e.g. pulling named entities from emails or any other text

Let's look at a few examples of these.

Use Case: analyzing Tweet sentiment

One of the easiest tasks we can do with fine-tuning is classification. Let's look at how we can classify tweets based on sentiment.

For this example, we'll want to use at least 100 examples per category, so classifying positive and negative sentiment will need 200+ examples. In this case, our fine-tuning dataset will include the tweet as the prompt, and the completion will be the associated sentiment.

For this simple example, the base Davinci GPT-3 model can likely already do this on its own without fine-tuning, but if we want to process a large volume of tweets, the token cost would be quite high.

Instead, we could train the Ada model so we can perform the task at a lower cost with our fine-tuned model.

To format our data, we'll append --> to each tweet and use the sentiment label as the completion, as follows.

{"prompt": "<tweet text> -->", "completion": " <positive/negative>"}

For example:

{"prompt":"Overjoyed with the new iPhone! -->", "completion":" positive"}
{"prompt":"@lakers disappoint for a third straight night https://t.co/38EFe43 -->", "completion":" negative"}

It's important to note we should always start a completion with a space, since OpenAI's tokenization treats most words as tokens with a preceding space.

One of the reasons to add the --> and the leading space in each completion is that together they define a consistent pattern the model can learn. In production, we can format new inputs the same way to generate similar output.
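
A minimal sketch of this formatting is shown below, using the same helper for training records and for production prompts so the two always match. The function names and the labeled tweets are hypothetical, and a real dataset would need far more examples per class.

```python
import json
from collections import Counter

SEPARATOR = " -->"  # marks the end of the prompt, as described above


def format_prompt(tweet):
    """Format a tweet the same way for training and for inference."""
    return tweet.strip() + SEPARATOR


def format_example(tweet, sentiment):
    """Build one training record; the completion starts with a space."""
    return {"prompt": format_prompt(tweet), "completion": " " + sentiment}


# Hypothetical labeled tweets, for illustration only.
labeled_tweets = [
    ("Overjoyed with the new iPhone!", "positive"),
    ("Worst customer service I have ever had.", "negative"),
    ("Loving the weather today", "positive"),
]
records = [format_example(tweet, label) for tweet, label in labeled_tweets]

# Count examples per class to check the dataset is reasonably balanced.
label_counts = Counter(r["completion"].strip() for r in records)

for record in records:
    print(json.dumps(record))
print(dict(label_counts))
```

At inference time, calling `format_prompt` on a new tweet produces a prompt in exactly the shape the model saw during training.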

Use Case: Company & slogan matching

Another example of classification fine-tuning is determining if a company and its slogan match. Here's how we'd prepare the training file for a use case like that:

{"prompt": "Company: <company>\nProduct: <product>\nSlogan: <slogan>\n\nCorrect:", "completion": " <yes/no>"}

Whatever the use case, we need to make sure the prompts match the format that new input data will arrive in, which may require some additional data preprocessing.
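
One way to keep training and production prompts aligned is to render both through a single function. The sketch below assumes the company, product, and slogan layout described above; the function names and the company and slogans are hypothetical.

```python
import json


def format_slogan_prompt(company, product, slogan):
    """Render the fields in one fixed layout, ending with the 'Correct:' cue."""
    return (
        f"Company: {company}\n"
        f"Product: {product}\n"
        f"Slogan: {slogan}\n\n"
        "Correct:"
    )


def to_training_record(company, product, slogan, matches):
    """Training record: same prompt layout plus a ' yes' or ' no' completion."""
    return {
        "prompt": format_slogan_prompt(company, product, slogan),
        "completion": " yes" if matches else " no",
    }


# Hypothetical company and slogans, for illustration only.
record = to_training_record(
    "Acme Coffee", "espresso beans", "Wake up to Acme", True)
inference_prompt = format_slogan_prompt(
    "Acme Coffee", "espresso beans", "Built for speed")

print(json.dumps(record))
print(inference_prompt)
```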

Use Case: Conditional Generation

Conditional generation refers to the case where the output needs to be generated based on the input or prompt.

Conditional generation can be used to take input and perform the following tasks:

  • Paraphrase
  • Summarize
  • Extract entities
  • Write creatively
  • Answer simple questions

For example, we could write a full sales letter based on a product description with the following format:

{"prompt": "<product name>\n<product description>\n\n###\n\n", "completion": " <full sales copy> END"}

Here we're giving examples of the type of sales copy we want from the trained model. The ### marks the end of the prompt, and we add END at the end of each completion so we can use it as a stop sequence to keep GPT-3 from continuing with extra paragraphs of text.
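
The sketch below shows how such an example might be assembled, assuming the prompt holds the product name and description followed by the ### separator. The helper names and product text are hypothetical. `truncate_at_stop` only imitates locally what a stop setting does on the API side.

```python
import json

PROMPT_SEPARATOR = "\n\n###\n\n"  # marks where the prompt ends
STOP_SEQUENCE = " END"            # marks where the completion should end


def format_generation_example(name, description, sales_copy):
    """Build one training record for conditional generation."""
    return {
        "prompt": f"{name}\n{description}{PROMPT_SEPARATOR}",
        "completion": f" {sales_copy}{STOP_SEQUENCE}",
    }


def truncate_at_stop(text, stop=STOP_SEQUENCE):
    """Cut generated text at the first stop sequence, if one is present."""
    index = text.find(stop)
    return text if index == -1 else text[:index]


# Hypothetical product and copy, for illustration only.
record = format_generation_example(
    "QuickBoil Kettle",
    "A 1.7 litre electric kettle with auto shut-off.",
    "Meet the kettle that is ready before you are.",
)
print(json.dumps(record))

# Simulated raw model output that runs past the intended end.
raw_output = " Meet the kettle that is ready before you are. END More text"
print(truncate_at_stop(raw_output))
```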

You can find several other examples of fine-tuning use cases in the OpenAI docs here.

Summary: GPT-3 Fine Tuning

In summary, GPT-3 fine-tuning is the process of continuing to train a pre-trained model (e.g. base GPT-3) on a new task or dataset so that its weights are updated for that task.

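Zero-shot learning, few-shot learning, one-shot learning, and fine-tuning are all techniques used in natural language processing, but they differ in how many examples the model is given and in whether its weights are updated: the first three supply zero, a few, or one example in the prompt, while fine-tuning trains on a labeled dataset.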

For certain tasks, providing a few examples in your prompt can improve the base model performance, but fine-tuning with hundreds or thousands of training examples is the way to go for significant model improvements.

That said, if you want GPT-3 to draw on a large body of additional knowledge, for example a textbook or legal documents, the Embeddings API is well-suited for these tasks.

Overall, fine-tuning is a powerful way to improve performance and decrease costs: you piggyback off the massive amount of data GPT-3 is already trained on, and with a few hundred training examples it can quickly adapt to a new task.