Building an SEC Filings Assistant with GPT-4 Turbo

With the release of GPT-4 Turbo and the Assistants API, OpenAI has opened up the door for many more AI agent-focused applications.

Specifically, with the Assistants API we can combine knowledge retrieval, code interpreter, and function calling to build specialized autonomous AI agents.

Also, because GPT-4 Turbo has a 128K-token context window, roughly the length of a 300-page book, it opens the door to many more LLM-enabled applications that work with lengthy documents.

In this guide, we'll look at one such application of AI agents: building an SEC filings assistant.

Many investors rely on SEC filings to analyze the financial health of a company. These filings can certainly be a treasure trove of valuable information, but there's no doubt that navigating through them can be overwhelming.

That's where our SEC Filing Assistant comes into play.

With this assistant, the goal will be to query, summarize, and analyze data from SEC filings in seconds.

Whether you're looking for insights into a company's financial statements, specific information hidden in the footnotes, or any other details, this assistant can fetch the relevant information efficiently and (from what I've seen) quite reliably.


Overview of the SEC Filing Assistant

Before getting into the code, let's first quickly review the tools we'll be using to build this:

  • GPT-4 Turbo: The latest GPT-4 release, which, as mentioned, has a 128K-token context window, or roughly 300 pages of text in a single prompt.
  • OpenAI's Assistants API: As the docs highlight:
An Assistant has instructions and can leverage models, tools, and knowledge to respond to user queries. The Assistants API currently supports three types of tools: Code Interpreter, Retrieval, and Function calling.
  • SEC API: We'll also be dynamically retrieving SEC filings in PDF format from this SEC filings API, which will then be fed into the Assistants API.

Step 1: Imports & API Keys

First, we need to ensure that our environment is properly set up with all the necessary API keys. Since we're using Colab, we can store our API keys in the Secret manager and import the necessary libraries as follows:

# To do: Add OpenAI & SEC API key to Colab Secrets
import os
from google.colab import userdata

# Retrieving the API key from Secret Manager
OPENAI_API_KEY = userdata.get('OPENAI_API_KEY')
SEC_API_KEY = userdata.get('SEC_API_KEY')


# Setting the API key as an environment variable
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY
os.environ["SEC_API_KEY"] = SEC_API_KEY

import json
import requests
from openai import OpenAI
import time
from sec_api import QueryApi

client = OpenAI()
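
If you run this notebook outside Colab, the google.colab import won't be available. As a minimal sketch, assuming the keys are then stored in ordinary environment variables, a small helper can try Colab's Secrets first and fall back to the environment. The get_secret name is hypothetical and not part of the original code:

```python
import os


def get_secret(name: str) -> str:
    """Return a secret from Colab Secrets, or from the environment outside Colab."""
    try:
        # google.colab is only importable inside a Colab runtime.
        from google.colab import userdata
    except ImportError:
        userdata = None

    value = None
    if userdata is not None:
        try:
            value = userdata.get(name)
        except Exception:
            # Secret not defined or notebook access not granted: fall back below.
            value = None

    # Fall back to a regular environment variable (assumption: set by the user).
    value = value or os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing API key: {name}")
    return value
```

With this in place, OPENAI_API_KEY = get_secret("OPENAI_API_KEY") would behave the same in Colab and in a local script.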

Step 2: Fetching SEC Filings

Now, before we build our assistant, we need to retrieve SEC filings based on the user's input. At a later stage, we could use function calling to convert natural language into the appropriate SEC filing retrieval, although for now it's easy enough to just input the ticker, filing type, and year.
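
To give a rough idea of what the function-calling route could look like later on, here is a sketch of a tool definition for the filing lookup, plus a helper that decodes the JSON argument string the model returns for a function call. The schema contents (descriptions, the list of form types) and the parse_tool_arguments helper are assumptions for illustration only, not something this tutorial wires up:

```python
import json

# Hypothetical function-tool definition describing the filing lookup parameters.
SEC_FILING_TOOL = {
    "type": "function",
    "function": {
        "name": "sec_query_pdf_url",
        "description": "Look up the PDF URL of a company's SEC filing.",
        "parameters": {
            "type": "object",
            "properties": {
                "ticker": {"type": "string", "description": "Ticker symbol, e.g. AAPL"},
                "form_type": {"type": "string", "enum": ["10-K", "10-Q", "8-K"]},
                "year": {"type": "integer"},
                "quarter": {"type": "integer", "minimum": 1, "maximum": 4},
            },
            "required": ["ticker", "form_type", "year"],
        },
    },
}


def parse_tool_arguments(arguments_json: str) -> dict:
    """Decode a function call's JSON arguments and check the required keys."""
    args = json.loads(arguments_json)
    required = SEC_FILING_TOOL["function"]["parameters"]["required"]
    missing = [key for key in required if key not in args]
    if missing:
        raise ValueError(f"Missing arguments: {missing}")
    return args
```

The decoded dictionary could then be unpacked straight into the retrieval function, e.g. sec_query_pdf_url(**args).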

To retrieve SEC filings from the API, we construct a query that includes the ticker symbol, form type, and the year. The function sec_query_pdf_url takes these parameters and returns a URL to the PDF document of the filing.

  • Initializing API: Sets up the SEC API query tool with the provided API key.
  • Date Range Setup: If a quarter is specified, the function calculates start and end dates for that quarter. Otherwise, it sets the range for the entire year.
  • Building the Query: Constructs a query to search for filings that match the ticker, form type, and date range. It sorts results to get the most recent filing.
  • Executing the Query: Sends the query to the SEC API and checks if any filings were returned.
  • URL Retrieval: If filings are found, extracts the URL for the filing details and converts it to a direct PDF URL.
  • Return Value: The function returns the direct URL to the PDF document of the filing.
def sec_query_pdf_url(ticker, form_type, year, quarter=None):
    query_api = QueryApi(api_key=SEC_API_KEY)

    # Construct the date range for the query
    if quarter:
        start_date = f"{year}-{'01 04 07 10'.split()[quarter-1]}-01"
        end_date = f"{year}-{'03 06 09 12'.split()[quarter-1]}-{'31 30 30 31'.split()[quarter-1]}"
    else:
        start_date = f"{year}-01-01"
        end_date = f"{year}-12-31"

    # Construct the query
    query = {
        "query": { "query_string": { "query": f"ticker:{ticker} AND formType:\"{form_type}\" AND filedAt:[{start_date} TO {end_date}]" } },
        "from": "0",
        "size": "1",
        "sort": [{ "filedAt": { "order": "desc" } }]
    }

    # Get the filings
    filings = query_api.get_filings(query)

    if not filings['filings']:
        return "No filings found for the specified criteria."

    # Extract the URL of the first filing
    filing = filings['filings'][0]
    url = filing['linkToFilingDetails']

    # Convert to PDF URL
    pdf_url = f"https://api.sec-api.io/filing-reader?type=pdf&token={SEC_API_KEY}&url={url}"

    return pdf_url
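
The date-range logic is the easiest part of this function to get subtly wrong, so it can be worth isolating and checking on its own. Below is a sketch of an equivalent helper that uses the standard library's calendar module and rejects quarters outside 1 to 4. The quarter_date_range name is hypothetical:

```python
import calendar
from datetime import date


def quarter_date_range(year, quarter=None):
    """Return (start_date, end_date) as ISO strings for a quarter or a full year."""
    if quarter is None:
        return f"{year}-01-01", f"{year}-12-31"
    if quarter not in (1, 2, 3, 4):
        raise ValueError("quarter must be 1, 2, 3, or 4")

    start_month = 3 * (quarter - 1) + 1      # 1, 4, 7, 10
    end_month = start_month + 2              # 3, 6, 9, 12
    last_day = calendar.monthrange(year, end_month)[1]
    start = date(year, start_month, 1).isoformat()
    end = date(year, end_month, last_day).isoformat()
    return start, end


print(quarter_date_range(2022, 2))  # ('2022-04-01', '2022-06-30')
```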

Step 3: Creating the Assistant

At a high level, in order to create an Assistant we follow these steps:

  • Create an Assistant in the API by defining its custom instructions, picking a model, and enabling tools (i.e. Retrieval in this case)
  • Create a Thread when a user starts a conversation.
  • Add Messages to the Thread as the user asks questions.
  • Run the Assistant on the Thread to trigger responses. This automatically calls the relevant tools.

Next, let's define run_assistant, the main function, which takes in user_message, ticker, form_type, year, thread_id, file_id, and assistant_id:

def run_assistant(user_message, ticker, form_type, year, thread_id=None, file_id=None, assistant_id=None):
    # ...

3.1: Check for an Existing Thread

Next, we want to allow users to ask follow up questions without having to re-upload the SEC filing to the Assistant. As such, we'll first check if there is an existing thread_id (i.e. is there an existing conversation?) and if not, this indicates it's a new conversation and we'll need to create a new thread.

if not thread_id:

3.2: Fetch SEC Filing URL

Next, if there is no thread ID, this means we need to go and fetch the SEC filing so we can upload it to the Assistant.

  • Here we'll call the sec_query_pdf_url function from Step 2 to retrieve the SEC filing's PDF URL. If no filing is found, we print an error and exit the function.
    pdf_url = sec_query_pdf_url(ticker, form_type, year)
    if not pdf_url or "No filings found" in pdf_url:
        print("Filing not found." if not pdf_url else pdf_url)
        return

3.3: Download and Save the SEC Filing

Next, with the pdf_url we want to download the filing using a GET request and save it locally as filing.pdf:

    response = requests.get(pdf_url)
    response.raise_for_status()  # Stop early if the download failed
    with open('filing.pdf', 'wb') as f:
        f.write(response.content)
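
A successful HTTP status doesn't guarantee that the body is actually a PDF, since an API may return an error page instead. As an optional sketch, assuming we want to fail before uploading anything to OpenAI, we can check for the "%PDF-" bytes that PDF files start with. The save_pdf helper is hypothetical:

```python
from pathlib import Path


def save_pdf(content: bytes, path: str = "filing.pdf") -> Path:
    """Write downloaded bytes to disk after checking they look like a PDF."""
    # PDF files begin with the magic bytes "%PDF-".
    if not content.startswith(b"%PDF-"):
        raise ValueError("Downloaded content does not look like a PDF")
    target = Path(path)
    target.write_bytes(content)
    return target
```

In the function above, this would be called as save_pdf(response.content) in place of the manual open and write.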

3.4: Upload the PDF for Assistant Use

Next, we want to upload the downloaded PDF to OpenAI and assign the purpose as assistants, which assigns a file_id for future reference.

    # Use a context manager so the local file handle is closed after the upload
    with open("filing.pdf", "rb") as f:
        file = client.files.create(
            file=f,
            purpose='assistants'
        )
    file_id = file.id

3.5: Create an Assistant

Now we're ready to create our Assistant with client.beta.assistants.create:

  • Here we provide the instructions i.e. You are a SEC filings assistant. Answer queries based on the provided filings. (we can improve this later with some prompt engineering)
  • We define the model as GPT-4 Turbo gpt-4-1106-preview
  • We enable the Retrieval tool
  • We pass in our file_id from the previous step (i.e. our current SEC filing)
    assistant = client.beta.assistants.create(
        instructions="You are a SEC filings assistant. Answer queries based on the provided filings.",
        model="gpt-4-1106-preview",
        tools=[{"type": "retrieval"}],
        file_ids=[file_id]
    )
    assistant_id = assistant.id

3.6: Create a New Thread

Next, we want to create a new Thread for our Assistant, which the docs describe as:

A conversation session between an Assistant and a user. Threads store Messages and automatically handle truncation to fit content into a model’s context.
    thread = client.beta.threads.create()
    thread_id = thread.id

Also, if a thread already exists we'll retrieve the existing Assistant with the previously uploaded SEC filing:

else:
    assistant = client.beta.assistants.retrieve(assistant_id=assistant_id)

3.7: Send the User Message

Next, we add the user_message to the thread, passing in the thread_id and, if we have one, the file_id:

client.beta.threads.messages.create(
    thread_id=thread_id,
    role="user",
    content=user_message,
    file_ids=[file_id] if file_id else []
)

3.8: Create and Start a Run

Next, we need to create a Run, which is:

An invocation of an Assistant on a Thread. The Assistant uses its configuration and the Thread’s Messages to perform tasks by calling models and tools. As part of a Run, the Assistant appends Messages to the Thread.
run = client.beta.threads.runs.create(thread_id=thread_id, assistant_id=assistant_id)

3.9: Monitor the Run Status

Next, we'll write a while loop that polls run.status every few seconds until the Run has finished:

while True:
    run = client.beta.threads.runs.retrieve(thread_id=thread_id, run_id=run.id)
    if run.status in ["completed", "failed", "cancelled", "expired"]:
        break
    time.sleep(5)
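
One caveat with an open-ended polling loop is that it will spin forever if the Run never reaches a status it checks for. Here is a sketch of the same idea with a timeout, written against a generic fetch_status callable so it can be tried without calling the API. The wait_for_status name, the timeout default, and the exact set of terminal statuses are assumptions:

```python
import time
from typing import Callable

# Statuses after which a Run will not change again (assumed set).
TERMINAL_STATUSES = {"completed", "failed", "cancelled", "expired"}


def wait_for_status(fetch_status: Callable[[], str],
                    interval: float = 5.0,
                    timeout: float = 300.0) -> str:
    """Poll fetch_status() until it returns a terminal status or the timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Run still '{status}' after {timeout} seconds")
        time.sleep(interval)
```

With the client from Step 1, the callable could be lambda: client.beta.threads.runs.retrieve(thread_id=thread_id, run_id=run.id).status.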

3.10: Retrieve and Display the Messages

Once the Run is completed, we'll retrieve and display the messages as follows:

messages = client.beta.threads.messages.list(thread_id=thread_id)
for message in messages.data:
    print(f"{message.role.title()}: {message.content[0].text.value}")
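
Two details are worth keeping in mind here: the list endpoint returns the newest message first by default, and a message's content can contain parts other than text. The sketch below puts the conversation in chronological order and skips non-text parts. The format_messages helper is hypothetical and only relies on the role, content, type, and text.value attributes used above:

```python
def format_messages(messages):
    """Return 'Role: text' lines in chronological order, skipping non-text parts."""
    lines = []
    # Messages arrive newest first, so reverse them to read top to bottom.
    for message in reversed(list(messages)):
        for part in message.content:
            # Only text parts carry .text.value; images and other types are skipped.
            if getattr(part, "type", None) == "text":
                lines.append(f"{message.role.title()}: {part.text.value}")
    return lines
```

It could be used as print("\n".join(format_messages(messages.data))).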

3.11: Return Identifiers for Future Use

Finally, we'll just return the thread_id, assistant_id, and file_id for any potential follow-up queries.

return thread_id, assistant_id, file_id
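
Note that run_assistant returns nothing when no filing is found, so unpacking its result into three variables would fail in that case. One way to make follow-ups a little more robust is to keep the identifiers in a small container. This is just a sketch, and the AssistantSession class is hypothetical:

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class AssistantSession:
    """Holds the identifiers needed to continue a conversation."""
    thread_id: Optional[str] = None
    assistant_id: Optional[str] = None
    file_id: Optional[str] = None

    @classmethod
    def from_result(cls, result):
        # run_assistant returns None when the filing lookup fails.
        if result is None:
            return cls()
        return cls(*result)

    @property
    def is_new(self) -> bool:
        # No thread yet means the filing still has to be fetched and uploaded.
        return self.thread_id is None

    def as_kwargs(self) -> dict:
        return asdict(self)
```

A follow-up question could then be sent with run_assistant(question, ticker, form_type, year, **session.as_kwargs()).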

Step 4: Testing the SEC Assistant

Next, let's go and test the SEC assistant by calling the run_assistant function and passing in the following parameters:

  • We'll look at Apple's 2022 10-K
  • We'll ask a simple question like What was the amount of the dividend the Company declared in 2022?
  • We'll then ask a follow-up question: What did program authorization increase to?
# Ask an initial question
thread_id, assistant_id, file_id = run_assistant(
    "What was the amount of the dividend the Company declared in 2022?", 
    "AAPL", "10-K", 2022
)

# Asking a follow up
print(run_assistant(
    "What did program authorization increase to?", 
    "AAPL", "10-K", 2022, 
    thread_id=thread_id, 
    file_id=file_id, 
    assistant_id=assistant_id
))

Success! Comparing this with the SEC filing, we can see it was able to correctly answer both questions.

You'll notice the answer also provides "【9†source】", which should contain the annotations that could be used to cite the SEC filing, although at the moment there's a bug where annotations are always returning empty, so we'll just have to wait for OpenAI to fix that.
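
In the meantime, if you'd rather not show these markers to users, they can be stripped from the response text with a regular expression. This is a sketch that assumes the markers always follow the "【number†label】" pattern seen above. The strip_citation_markers name is hypothetical:

```python
import re

# Matches markers such as 【9†source】 (assumed format: digits, a dagger, then a label).
CITATION_MARKER = re.compile(r"【\d+†[^】]*】")


def strip_citation_markers(text: str) -> str:
    """Remove inline citation markers from an assistant response."""
    return CITATION_MARKER.sub("", text)
```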

Summary: SEC Filings Assistant

To recap, when a user asks a question, our SEC filings assistant now works as follows:

  1. Queries the SEC API for the relevant filing.
  2. Downloads the filing as a PDF.
  3. Uploads the PDF to OpenAI, which is then used as a reference document by the language model.
  4. Sends the user's query to the model, which analyzes the document and provides a response.

By leveraging the power of the Assistants API, GPT-4 Turbo, and the SEC API, we've built a tool that can make financial research significantly more efficient.

This SEC Filing Assistant is just one example of how language models can be used to create practical, domain-specific AI agents.

We'll also be incorporating the SEC Filings Assistant into the MLQ app shortly, so stay tuned for that.

💡
If you want to access the full code for this tutorial, you can join the MLQ Academy and find it below:

Access the full code for this tutorial