Building an SEC Filings Assistant with GPT-4 Turbo
With the release of GPT-4 Turbo and the Assistants API, OpenAI has opened up the door for many more AI agent-focused applications.
Specifically, with the Assistants API we can combine knowledge retrieval, code interpreter, and function calling to build specialized autonomous AI agents.
Also, because GPT-4 Turbo has a 128K-token context window, roughly the length of a 300-page book, it opens the door to many more LLM-enabled applications for working with lengthy documents.
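To see where the "300 pages" figure comes from, here is a back-of-the-envelope estimate. The 0.75 words-per-token ratio is a commonly cited rule of thumb for English text, and the 320 words-per-page figure is my own assumption for a typical book page, so treat the result as a rough guide rather than an exact number.

```python
CONTEXT_TOKENS = 128_000   # GPT-4 Turbo context window
WORDS_PER_TOKEN = 0.75     # rule of thumb for English text (assumption)
WORDS_PER_PAGE = 320       # assumed density of a typical book page

def estimate_pages(tokens, words_per_token=WORDS_PER_TOKEN, words_per_page=WORDS_PER_PAGE):
    """Rough page count for a given token budget."""
    return tokens * words_per_token / words_per_page

print(round(estimate_pages(CONTEXT_TOKENS)))  # 300
```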
In this guide, we'll look at one such application of AI agents: building an SEC filings assistant.
Many investors rely on SEC filings to analyze the financial health of a company. They can be a treasure trove of valuable information, but there's no doubt that navigating through them can be overwhelming.
That's where our SEC Filing Assistant comes into play.
With this assistant, the goal will be to query, summarize, and analyze data from SEC filings in seconds.
Whether you're looking for insights into a company's financial statements, specific information hidden in the footnotes, or any other details, this assistant can fetch the relevant information efficiently and (from what I've seen) quite reliably.
Overview of the SEC Filing Assistant
Before getting into the code, let's first quickly review the tools we'll be using to build this:
- GPT-4 Turbo: The latest GPT-4 release, which as mentioned has a 128K-token context window, or roughly 300 pages of text in a single prompt.
- OpenAI's Assistants API: As the docs highlight:
An Assistant has instructions and can leverage models, tools, and knowledge to respond to user queries. The Assistants API currently supports three types of tools: Code Interpreter, Retrieval, and Function calling.
- SEC API: We'll also be dynamically retrieving SEC filings in PDF format from this SEC filings API, which will then be fed into the Assistants API.
Step 1: Imports & API Keys
First, we need to ensure that our environment is properly set up with all the necessary API keys. Since we're using Colab, we can store our API keys in the Secret manager and import the necessary libraries as follows:
# To do: Add OpenAI & SEC API key to Colab Secrets
import os
from google.colab import userdata
# Retrieving the API key from Secret Manager
OPENAI_API_KEY = userdata.get('OPENAI_API_KEY')
SEC_API_KEY = userdata.get('SEC_API_KEY')
# Setting the API key as an environment variable
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY
os.environ["SEC_API_KEY"] = SEC_API_KEY
import json
import requests
from openai import OpenAI
import time
from sec_api import QueryApi
client = OpenAI()
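If you're running this outside Colab, google.colab won't be importable. A minimal sketch of a fallback, assuming the keys have already been exported as environment variables (the helper name require_env is hypothetical, not part of any library):

```python
import os

def require_env(name):
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing environment variable: {name}")
    return value

# Usage (assumes the keys were exported in the shell beforehand):
# OPENAI_API_KEY = require_env("OPENAI_API_KEY")
# SEC_API_KEY = require_env("SEC_API_KEY")
```

Failing early like this makes a missing key obvious before any API call is attempted.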
Step 2: Fetching SEC Filings
Now, before we build our assistant, we need to retrieve SEC filings based on the user's input. At a later stage, we could use function calling to convert natural language into the appropriate SEC filing retrieval, although for now it's easy enough to just input the ticker, filing type, and year.
To retrieve SEC filings from the API, we construct a query that includes the ticker symbol, form type, and the year. The function sec_query_pdf_url takes these parameters and returns a URL to the PDF document of the filing.
- Initializing API: Sets up the SEC API query tool with the provided API key.
- Date Range Setup: If a quarter is specified, the function calculates start and end dates for that quarter. Otherwise, it sets the range for the entire year.
- Building the Query: Constructs a query to search for filings that match the ticker, form type, and date range. It sorts results to get the most recent filing.
- Executing the Query: Sends the query to the SEC API and checks if any filings were returned.
- URL Retrieval: If filings are found, extracts the URL for the filing details and converts it to a direct PDF URL.
- Return Value: The function returns the direct URL to the PDF document of the filing.
def sec_query_pdf_url(ticker, form_type, year, quarter=None):
    query_api = QueryApi(api_key=SEC_API_KEY)
    # Construct the date range for the query
    if quarter:
        start_date = f"{year}-{'01 04 07 10'.split()[quarter-1]}-01"
        end_date = f"{year}-{'03 06 09 12'.split()[quarter-1]}-{'31 30 30 31'.split()[quarter-1]}"
    else:
        start_date = f"{year}-01-01"
        end_date = f"{year}-12-31"
    # Construct the query
    query = {
        "query": { "query_string": { "query": f"ticker:{ticker} AND formType:\"{form_type}\" AND filedAt:[{start_date} TO {end_date}]" } },
        "from": "0",
        "size": "1",
        "sort": [{ "filedAt": { "order": "desc" } }]
    }
    # Get the filings
    filings = query_api.get_filings(query)
    if not filings['filings']:
        return "No filings found for the specified criteria."
    # Extract the URL of the first filing
    filing = filings['filings'][0]
    url = filing['linkToFilingDetails']
    # Convert to PDF URL
    pdf_url = f"https://api.sec-api.io/filing-reader?type=pdf&token={SEC_API_KEY}&url={url}"
    return pdf_url
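The date-range step in the function above relies on hard-coded month and day strings. As an alternative sketch (not the approach used above), the same range can be computed with the standard library's calendar module. The helper names quarter_date_range and build_query are hypothetical, and the query layout simply mirrors the dictionary in sec_query_pdf_url.

```python
import calendar

def quarter_date_range(year, quarter=None):
    """Return (start_date, end_date) as ISO strings for a quarter, or the full year."""
    if quarter is None:
        return f"{year}-01-01", f"{year}-12-31"
    if quarter not in (1, 2, 3, 4):
        raise ValueError("quarter must be 1, 2, 3, or 4")
    first_month = 3 * (quarter - 1) + 1
    last_month = first_month + 2
    # monthrange returns (weekday of first day, number of days in month)
    last_day = calendar.monthrange(year, last_month)[1]
    return f"{year}-{first_month:02d}-01", f"{year}-{last_month:02d}-{last_day:02d}"

def build_query(ticker, form_type, year, quarter=None):
    """Build a query dictionary in the same shape as the one used above."""
    start_date, end_date = quarter_date_range(year, quarter)
    return {
        "query": {"query_string": {"query": (
            f'ticker:{ticker} AND formType:"{form_type}" '
            f"AND filedAt:[{start_date} TO {end_date}]"
        )}},
        "from": "0",
        "size": "1",
        "sort": [{"filedAt": {"order": "desc"}}],
    }

print(quarter_date_range(2022, 2))  # ('2022-04-01', '2022-06-30')
```

Keeping the date logic in its own function also means it can be checked without calling the SEC API.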
Step 3: Creating the Assistant
At a high level, in order to create an Assistant we follow these steps:
- Create an Assistant in the API by defining its custom instructions, picking a model, and enabling tools (i.e. Retrieval in this case)
- Create a Thread when a user starts a conversation.
- Add Messages to the Thread as the user asks questions.
- Run the Assistant on the Thread to trigger responses. This automatically calls the relevant tools.
Next, let's define the run_assistant function, which is the main function and takes in the user_message, ticker, form_type, year, thread_id, file_id, and assistant_id:
def run_assistant(user_message, ticker, form_type, year, thread_id=None, file_id=None, assistant_id=None):
    # ...
3.1: Check for an Existing Thread
Next, we want to allow users to ask follow-up questions without having to re-upload the SEC filing to the Assistant. As such, we'll first check if there is an existing thread_id (i.e. is there an existing conversation?). If not, it's a new conversation and we'll need to create a new thread.
    if not thread_id:
3.2: Fetch SEC Filing URL
Next, if there is no thread ID, we need to fetch the SEC filing so we can upload it to the Assistant.
- Here we'll call the sec_query_pdf_url function from Step 2 to retrieve the SEC filing's PDF URL. If no filing is found, we print an error and exit the function.
        pdf_url = sec_query_pdf_url(ticker, form_type, year)
        if not pdf_url or "No filings found" in pdf_url:
            print("Filing not found." if not pdf_url else pdf_url)
            return
3.3: Download and Save the SEC Filing
Next, with the pdf_url we want to download the filing using a GET request and save it locally as filing.pdf:
        response = requests.get(pdf_url)
        with open('filing.pdf', 'wb') as f:
            f.write(response.content)
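The download step above writes whatever the server returns, so an error page would be saved as filing.pdf without complaint. A minimal sketch of a guard, assuming only that a valid PDF begins with the bytes %PDF- (the helper name save_pdf is hypothetical):

```python
from pathlib import Path

def save_pdf(content, path="filing.pdf"):
    """Write bytes to disk only if they look like a PDF; return the path written."""
    if not content.startswith(b"%PDF-"):
        raise ValueError("Response does not look like a PDF")
    target = Path(path)
    target.write_bytes(content)
    return target

# Usage with the request from this step (requires network access):
# response = requests.get(pdf_url, timeout=60)
# response.raise_for_status()  # surface HTTP errors instead of saving them
# save_pdf(response.content)
```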
3.4: Upload the PDF for Assistant Use
Next, we want to upload the downloaded PDF to OpenAI with the purpose set to assistants. The upload returns a file_id that we keep for future reference.
        file = client.files.create(
            file=open("filing.pdf", "rb"),
            purpose='assistants'
        )
        file_id = file.id
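One small refinement to consider: the call above opens filing.pdf without ever closing it. A sketch that wraps the upload in a with block is shown here. It assumes client is the OpenAI() instance created in Step 1, and the function name upload_filing is hypothetical.

```python
def upload_filing(client, path="filing.pdf"):
    """Upload a local file for Assistants use and return its file ID."""
    # The with block closes the file handle once the upload call returns
    with open(path, "rb") as fh:
        uploaded = client.files.create(file=fh, purpose="assistants")
    return uploaded.id

# Usage:
# file_id = upload_filing(client)
```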
3.5: Create an Assistant
Now we're ready to create our Assistant with client.beta.assistants.create:
- Here we provide the instructions, i.e. "You are a SEC filings assistant. Answer queries based on the provided filings." (we can improve this later with some prompt engineering)
- We define the model as GPT-4 Turbo, gpt-4-1106-preview
- We enable the Retrieval tool
- We pass in our file_id from the previous step (i.e. our current SEC filing)
        assistant = client.beta.assistants.create(
            instructions="You are a SEC filings assistant. Answer queries based on the provided filings.",
            model="gpt-4-1106-preview",
            tools=[{"type": "retrieval"}],
            file_ids=[file_id]
        )
        assistant_id = assistant.id
3.6: Create a New Thread
Next, we want to create a new Thread for our Assistant, which is:
A conversation session between an Assistant and a user. Threads store Messages and automatically handle truncation to fit content into a model’s context.
        thread = client.beta.threads.create()
        thread_id = thread.id
Also, if a thread already exists we'll retrieve the existing Assistant with the previously uploaded SEC filing:
    else:
        assistant = client.beta.assistants.retrieve(assistant_id=assistant_id)
3.7: Send the User Message
Next, we want to add the user_message to the thread, along with the thread_id and file_id:
    client.beta.threads.messages.create(
        thread_id=thread_id,
        role="user",
        content=user_message,
        file_ids=[file_id] if file_id else []
    )
3.8: Create and Start a Run
Next, we need to create a Run, which is:
An invocation of an Assistant on a Thread. The Assistant uses its configuration and the Thread’s Messages to perform tasks by calling models and tools. As part of a Run, the Assistant appends Messages to the Thread.
    run = client.beta.threads.runs.create(thread_id=thread_id, assistant_id=assistant_id)
3.9: Monitor the Run Status
Next, we'll write a while loop that polls run.status until the run reaches a terminal state:
    while True:
        run = client.beta.threads.runs.retrieve(thread_id=thread_id, run_id=run.id)
        # Stop polling once the run can no longer make progress
        if run.status in ["completed", "failed", "cancelled", "expired"]:
            break
        time.sleep(5)
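Polling with while True can wait indefinitely if a run stalls in a non-terminal state. A minimal sketch of a bounded version follows. The terminal statuses listed are the ones I understand the Assistants API to use at the time of writing, and wait_for_run with its fetch_status callable is a hypothetical helper, so treat both as assumptions.

```python
import time

# Statuses after which a run will not change again (assumed list)
TERMINAL_STATUSES = {"completed", "failed", "cancelled", "expired"}

def wait_for_run(fetch_status, timeout=300, interval=5, sleep=time.sleep):
    """Poll fetch_status() until a terminal status or until timeout seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Run still '{status}' after {timeout} seconds")
        sleep(interval)

# Usage inside run_assistant:
# status = wait_for_run(
#     lambda: client.beta.threads.runs.retrieve(thread_id=thread_id, run_id=run.id).status
# )
```

Passing the status lookup in as a callable keeps the loop independent of the client, which also makes it easy to exercise with fake statuses.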
3.10: Retrieve and Display the Messages
Once the Run is completed, we'll retrieve and display the messages as follows:
    messages = client.beta.threads.messages.list(thread_id=thread_id)
    for message in messages.data:
        print(f"{message.role.title()}: {message.content[0].text.value}")
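As far as I can tell, the messages list comes back newest first by default, so the loop above prints the latest answer before the question that prompted it, and content[0].text would fail on a non-text content part. A sketch of a formatter that addresses both follows. format_messages is a hypothetical helper, and it assumes each content part exposes a type field that equals "text" for text parts.

```python
def format_messages(messages):
    """Return 'Role: text' lines in chronological order, skipping non-text parts."""
    lines = []
    for message in reversed(list(messages)):  # assumed API default: newest first
        for part in message.content:
            if getattr(part, "type", None) == "text":
                lines.append(f"{message.role.title()}: {part.text.value}")
    return lines

# Usage inside run_assistant:
# for line in format_messages(messages.data):
#     print(line)
```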
3.11: Return Identifiers for Future Use
Finally, we'll just return the thread_id, assistant_id, and file_id for any potential follow-up queries.
    return thread_id, assistant_id, file_id
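One caveat: the early return in step 3.2 hands back None rather than three identifiers, so unpacking the result fails with a TypeError when no filing is found. A small sketch of a guard for the calling side, where unpack_ids is a hypothetical helper:

```python
def unpack_ids(result):
    """Return (thread_id, assistant_id, file_id), or raise if no identifiers came back."""
    if result is None:
        raise LookupError("run_assistant returned no identifiers; was the filing found?")
    thread_id, assistant_id, file_id = result
    return thread_id, assistant_id, file_id

# Usage:
# thread_id, assistant_id, file_id = unpack_ids(run_assistant(question, "AAPL", "10-K", 2022))
```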
Step 4: Testing the SEC Assistant
Next, let's test the SEC assistant by calling the run_assistant function and passing in the following parameters:
- We'll look at Apple's 2022 10-K
- We'll ask a simple question like "What was the amount of the dividend the Company declared in 2022?"
- We'll then ask a follow-up question about what the program authorization increased to
# Ask an initial question
thread_id, assistant_id, file_id = run_assistant(
    "What was the amount of the dividend the Company declared in 2022?",
    "AAPL", "10-K", 2022
)
# Ask a follow-up question
print(run_assistant(
    "What did program authorization increase to?",
    "AAPL", "10-K", 2022,
    thread_id=thread_id,
    file_id=file_id,
    assistant_id=assistant_id
))
Success! Comparing this with the SEC filing, we can see it was able to correctly answer both questions.
You'll notice the answer also includes "【9†source】". This marker should map to annotations that could be used to cite the SEC filing, but at the moment there's a bug where annotations always come back empty, so we'll have to wait for OpenAI to fix that.
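Until annotations are populated, the raw markers can be removed before showing an answer. A minimal sketch using a regular expression, assuming the markers always follow the 【number†label】 pattern seen above (the function name strip_citation_markers is hypothetical):

```python
import re

# Matches markers such as 【9†source】, plus any whitespace directly before them
CITATION_MARKER = re.compile(r"\s*【\d+†[^】]*】")

def strip_citation_markers(text):
    """Remove retrieval citation markers from an assistant reply."""
    return CITATION_MARKER.sub("", text)

print(strip_citation_markers("See the dividend section【9†source】."))
# See the dividend section.
```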
Summary: SEC Filings Assistant
To recap, when a user asks a question, our SEC filings assistant now works as follows:
- Queries the SEC API for the relevant filing.
- Downloads the filing as a PDF.
- Uploads the PDF to OpenAI, which is then used as a reference document by the language model.
- Sends the user's query to the model, which analyzes the document and provides a response.
By leveraging the power of the Assistants API, GPT-4 Turbo, and the SEC API, we've built a tool that can make financial research significantly more efficient.
This SEC Filing Assistant is just one example of how language models can be used to create practical, domain-specific AI agents.
We'll also be incorporating the SEC Filings Assistant into the MLQ app shortly, so stay tuned for that.