Getting Started with OpenAI's Assistants API

OpenAI has once again changed the AI industry with the releases at their DevDay on November 6th. While there are many new features to try out, in my opinion the Assistants API was the biggest release:

An Assistant has instructions and can leverage models, tools, and knowledge to respond to user queries.
The Assistants API currently supports three types of tools: Code Interpreter, Retrieval, and Function calling.

As Sam Altman highlighted at DevDay, building these agentic features was possible before, but it often required significant engineering and third-party libraries, and frankly wasn't always that reliable. Now, by combining Code Interpreter, Retrieval, and Function calling, we can build AI agents directly with the GPT API.

In this guide, we'll look at how to get started with this new capability based on the Assistants API documentation, including:

  • Overview of the Assistants API
  • Assistant 1: Code Interpreter
  • Assistant 2: Knowledge Retrieval
You can access the premium version of this tutorial with a video walkthrough and code below.
Getting Started with the Assistants API: MLQ Academy
In this video tutorial, we’ll walk through how to get started with OpenAI’s Assistants API.

Overview of the Assistants API

Before we jump into the code, let's first look at a high-level overview of building on the Assistants API, as there are several new components.

First, let's start with the steps and definitions to create an Assistant:

  1. Defining an Assistant: An Assistant is a purpose-built AI that uses models, instructions, and tools.
  2. Creating a Thread: A Thread is a conversation flow initiated by a user, to which messages can be added, creating an interactive session.
  3. Adding Messages: Messages contain the text input by the user and can include text, files, and images.
  4. Running the Assistant: Finally, we run the Assistant to process the Thread, call certain tools if necessary, and generate the appropriate response.

Assistant 1: Code Interpreter

Now that we have an overview of the steps & definitions, let's build a simple Assistant that uses Code Interpreter.

Before we build the Assistant, in order to use these new features we import OpenAI slightly differently than before:

# The v1 OpenAI Python SDK reads OPENAI_API_KEY from your environment by default
from openai import OpenAI
client = OpenAI()

Step 1: Creating an Assistant

In this example, we'll build a machine learning tutor Assistant with the following instruction:

You are an assistant that helps with machine learning coding problems. Write, run, and explain code to answer questions.

You can also see we've got the code_interpreter tool enabled:

assistant = client.beta.assistants.create(
    name="ML Code Helper",
    instructions="You are an assistant that helps with machine learning coding problems. Write, run, and explain code to answer questions.",
    tools=[{"type": "code_interpreter"}],
    model="gpt-4-1106-preview"
)

Step 2: Create a Thread

Next, let's create a Thread for the Assistant as follows:

thread = client.beta.threads.create()

The nice thing about Threads is that they don't have a size limit, meaning you can pass as many Messages as you want.

If you recall, with the previous GPT-4 API, creating "conversations" was accomplished by appending user and assistant responses onto each other. This not only incurred significantly higher API costs, but you also quickly ran out of context window space after a few exchanges...but now:

The API will ensure that requests to the model fit within the maximum context window, using relevant optimization techniques such as truncation.

If we print out the thread, we can see it's empty right now, so let's add messages to it.
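
For example, printing the Thread just shows an empty object with an id (the exact values below are illustrative):

print(thread)
# Thread(id='thread_abc123', created_at=1699000000, metadata={}, object='thread')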

Step 3: Adding Messages to a Thread

We can now add Messages to our Thread, in this case I'll ask a relatively common question that a student might ask about machine learning:

When I try to calculate the cost for my linear regression, I get a 'ValueError: operands could not be broadcast together with shapes (100,) (100,1)'. Here's the part where it fails: cost = (1/(2*m)) * np.sum(np.square(y_pred - y))
# User is asking for help with their Python code for a linear regression cost function
message = client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="When I try to calculate the cost for my linear regression, I get a 'ValueError: operands could not be broadcast together with shapes (100,) (100,1)'. Here's the part where it fails: `cost = (1/(2*m)) * np.sum(np.square(y_pred - y))`. Can you help me figure out why this is happening?"
)

We can now see a new ThreadMessage object with the user's question.
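
Printing the message shows something along these lines (the ids and exact fields below are illustrative):

print(message)
# ThreadMessage(id='msg_abc123', role='user', content=[MessageContentText(...)], thread_id='thread_abc123', ...)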

Step 4: Run the Assistant

Now we're ready to create a Run, which will run the Assistant on the Thread to trigger responses and automatically call relevant tools.

This makes the Assistant read the Thread and decide whether to call tools or simply use the model to best answer the user's query.

After deciding which tools to use, the Assistant appends Messages to the Thread with role="assistant".

run = client.beta.threads.runs.create(
  thread_id=thread.id,
  assistant_id=assistant.id,
  instructions="Please explain the solution in a simple way so the user can learn from it."
)

We can see the status of the Run is initially queued and then moves through the Run lifecycle (queued, in_progress, and finally a terminal state such as completed).

Step 5: Display the Assistant's Response

Next, we can retrieve the Run to check if it's completed as follows:

run = client.beta.threads.runs.retrieve(
  thread_id=thread.id,
  run_id=run.id
)
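
Since the Run may still be queued or in_progress on the first retrieve, a minimal polling sketch like this can wait for a terminal status (the one-second sleep interval is just an illustrative choice):

import time

# Poll the Run until it leaves the queued/in_progress states
while run.status in ("queued", "in_progress"):
    time.sleep(1)
    run = client.beta.threads.runs.retrieve(
        thread_id=thread.id,
        run_id=run.id
    )

print(run.status)  # e.g. "completed"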

Once it's completed, let's retrieve the Assistant Messages that have been appended to the Thread:

messages = client.beta.threads.messages.list(
  thread_id=thread.id
)

Next, let's extract the text of the Assistant's message:

assistant_response = messages.data[0].content[0].text.value
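
If you also want to see the role alongside each message's text, here's a quick sketch (by default, Messages are returned newest first):

# Print each message's role and text, oldest message first
for msg in reversed(messages.data):
    print(f"{msg.role}: {msg.content[0].text.value}")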

Assistant Response

You can fix the broadcasting error by ensuring that both y_pred and y have the same shape before you perform the subtraction. You can do this by either reshaping y_pred to be a two-dimensional array with one column or by flattening y to be a one-dimensional array.

Here are two examples that you can use to fix the error:

Option 1: Reshape y_pred to match y's shape

y_pred = y_pred.reshape(-1, 1) # Now y_pred shape will be (100, 1), matching y
cost = (1 / (2 * m)) * np.sum(np.square(y_pred - y))

Option 2: Flatten y to match y_pred's shape

y = y.flatten() # Now y shape will be (100,), matching y_pred
cost = (1 / (2 * m)) * np.sum(np.square(y_pred - y))

Choose the appropriate option based on whether y_pred or y is closer to the desired shape in your context. If y_pred is the output of a machine learning model, it might output predictions in either shape, and you should adjust the shape of y accordingly.

Let's apply Option 2 and calculate the cost using the corrected shapes, as y being a one-dimensional array is more common in practice.

Not bad. Alright, now that we've got the Code Interpreter tool working, let's look at the Knowledge Retrieval Assistant.


Assistant 2: Knowledge Retrieval

As highlighted in the Knowledge Retrieval tool documentation:

Retrieval augments the Assistant with knowledge from outside its model, such as proprietary product information or documents provided by your users.

If you've seen our previous tutorials on embeddings and vector search, you'll appreciate that this is now all done natively within the GPT API.

For this example, let's start with the classic AI app: chatting with PDFs.

Similar to our first assistant, we can follow these steps to accomplish this:

  1. Upload files for retrieval
  2. Create a retrieval Assistant
  3. Create a thread & add messages to it
  4. Run the assistant
  5. Display the response

Step 1: Upload files for retrieval

First, let's upload a PDF to OpenAI with the purpose set to assistants. For this example, we'll of course use the classic Attention Is All You Need paper:

# Upload a file with an "assistants" purpose
file = client.files.create(
  file=open("/content/attention.pdf", "rb"),
  purpose='assistants'
)

If we check the Files section of the OpenAI platform, we can find the uploaded file listed there.
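
You can also confirm the upload programmatically with the files list endpoint; a quick sketch:

# List uploaded files and confirm the PDF is there
for f in client.files.list().data:
    print(f.id, f.filename, f.purpose)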

Step 2: Create a Retrieval Assistant

Next, let's create a new Assistant with simple instructions for retrieval. We'll need to pass retrieval in the tools parameter and attach the uploaded file via file_ids:

# Create a retrieval assistant with the uploaded file attached
assistant = client.beta.assistants.create(
  instructions="You are a knowledge retrieval assistant. Use your knowledge base to best respond to users queries.",
  model="gpt-4-1106-preview",
  tools=[{"type": "retrieval"}],
  file_ids=[file.id]
)

Step 3: Create a thread & add messages

Next up, we'll create a new Thread as follows:

thread = client.beta.threads.create()

Then we can add Messages and files to our Thread; in this case, I'll just ask it to summarize the abstract of the paper and pass in the file.id:

message = client.beta.threads.messages.create(
  thread_id=thread.id,
  role="user",
  content="Summarize the abstract of the paper.",
  file_ids=[file.id]
)

Step 4: Run the assistant

Now that we have the context of both the message and the file in our Thread, we can run the Thread with our Assistant as follows:

run = client.beta.threads.runs.create(
  thread_id=thread.id,
  assistant_id=assistant.id,
)

After you run it, it takes a minute or two for the Run lifecycle to complete (you can poll run.status with the same pattern we used in the first example).

Step 5: Display the response

After the Run status is complete, we can retrieve the responses as follows:

messages = client.beta.threads.messages.list(
  thread_id=thread.id
)

Now, let's access the assistant response containing the abstract summary like this:

assistant_response = messages.data[0].content[0].text.value

As we can see, this returns:

The abstract of the paper introduces the Transformer, a novel network architecture designed for sequence transduction tasks that is based solely on attention mechanisms and does not rely on recurrent or convolutional neural networks...

Note that we can also add annotations to these responses, although we'll cover that in a future article.

Summary: Getting Started with the Assistants API

In this guide, we looked at two of the built-in tools that Assistants can use: Code Interpreter and Knowledge Retrieval. Already, I can see that retrieval-augmented generation (RAG) is going to be significantly easier with this built-in embeddings & vector search capability.

Of course, this is a starter tutorial so it's important to note a few things:

  • Code Interpreter & Retrieval: We can also combine these two tools to create code interpreter on specific data, for example a data visualization assistant for your CSV data.
  • Function Calling: Assistants also have access to function calling, which can be used to connect the Assistant to external APIs or our own functions. This is a bigger topic, so we'll cover that in a dedicated article soon.
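
As a rough sketch of combining the two tools, here's what an Assistant with both Code Interpreter and Retrieval enabled could look like, reusing the file.id we uploaded earlier (the name and instructions are just placeholders):

# Sketch: one Assistant with both Code Interpreter and Retrieval enabled
combined_assistant = client.beta.assistants.create(
  name="Data & Docs Helper",
  instructions="Answer questions using the attached documents, and write and run code when calculations or plots are needed.",
  tools=[{"type": "code_interpreter"}, {"type": "retrieval"}],
  model="gpt-4-1106-preview",
  file_ids=[file.id]
)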

These next few weeks and months will certainly be interesting to watch as AI agents start to proliferate into our everyday lives.

As Sam Altman said about AI agents at DevDay...

The upsides of this are going to be tremendous.