GPT-4 for Financial Statements: Building an AI Analyst

In this guide, we discuss how to build an AI analyst that uses GPT-4 to analyze financial statements, including the income statements, balance sheets, and cash flow statements of public companies.

2 years ago   •   6 min read

By Peter Foy

In our previous tutorials on GPT-3 and GPT-4, we've looked at several use cases for LLMs in finance such as earnings call summaries, stock screener assistants, and various research assistants.

In this guide, we'll focus on another use case of LLMs for finance: specifically, using GPT-4 to summarize and analyze financial statements.

The financial statement assistant will retrieve statements, write a summary, and provide an insightful analysis of the chosen time period.

We'll walk through how to build a simple Streamlit app where users can:

  • Select the desired financial statement (i.e. income statement, balance sheet, cash flow)
  • Choose the desired statement period (i.e., annual or quarterly)
  • Choose the number of past statements to analyze (e.g., the past 4 statements)
  • Input a stock ticker and click run

The Streamlit app first returns the raw DataFrame for the financial statements so we can confirm the numbers, then provides a more user-friendly summary of key financial metrics from the statement, and finally provides an analysis of the statements.

While this is still quite a simple application, I think using LLMs to summarize raw financial data, provide basic analysis and key takeaways, and highlight insights that a less-trained financial eye might not immediately notice is a very powerful idea in finance.

Alright, now that we know what we're building, let's jump into the code.

Step 0: Installs & Imports

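Before we dive into the code, first make sure you have installed all the necessary libraries with pip. The code in this guide uses the pre-1.0 openai.ChatCompletion interface, so we pin the openai package accordingly:

pip install streamlit "openai<1.0" requests pandas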

Next, we'll need to import the following libraries into our app and set our OpenAI and Financial Modeling Prep API keys in apikey.py:

import streamlit as st
import pandas as pd  # needed for pd.DataFrame in the functions below
import openai
import requests
import os
from apikey import OPENAI_API_KEY, FMP_API_KEY

openai.api_key = OPENAI_API_KEY
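
The contents of apikey.py aren't shown in this guide. As a minimal sketch, assuming you'd rather not hard-code secrets, the file could read both keys from environment variables. The environment variable names here are an assumption for illustration, not something either API requires:

```python
# apikey.py (hypothetical contents; the environment variable names are assumptions)
import os

# Fall back to an empty string so a missing key is easy to detect at startup
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY", "")
FMP_API_KEY = os.environ.get("FMP_API_KEY", "")
```

With this in place, the import line above works unchanged, and the keys stay out of version control.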

Step 1: Retrieving Financial Statements

Alright, next let's set up a function to retrieve financial statements from the Financial Modeling Prep API. Each statement can be one of three types: income statement, balance sheet, or cash flow statement.

Below, we'll write a function that takes in the company's ticker, the number of past time periods to retrieve (limit), the statement frequency (period, i.e., annual or quarterly), and the type of statement to retrieve:

def get_financial_statements(ticker, limit, period, statement_type):
    # Pick the Financial Modeling Prep endpoint for the requested statement type
    if statement_type == "Income Statement":
        url = f"https://financialmodelingprep.com/api/v3/income-statement/{ticker}?period={period}&limit={limit}&apikey={FMP_API_KEY}"
    elif statement_type == "Balance Sheet":
        url = f"https://financialmodelingprep.com/api/v3/balance-sheet-statement/{ticker}?period={period}&limit={limit}&apikey={FMP_API_KEY}"
    elif statement_type == "Cash Flow":
        url = f"https://financialmodelingprep.com/api/v3/cash-flow-statement/{ticker}?period={period}&limit={limit}&apikey={FMP_API_KEY}"
    else:
        # Guard against an unsupported type so `url` is never left undefined
        st.error(f"Unknown statement type: {statement_type}")
        return pd.DataFrame()

    data = get_jsonparsed_data(url)

    # A successful response is a non-empty list with one record per period
    if isinstance(data, list) and data:
        return pd.DataFrame(data)
    else:
        st.error("Unable to fetch financial statements. Please ensure the ticker is correct and try again.")
        return pd.DataFrame()

Here, get_jsonparsed_data(url) is a helper function to send a GET request to the provided url and parse the JSON response.
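
The helper itself isn't listed in this guide, so here is a minimal sketch of what it could look like. It uses only the standard library, and the timeout parameter is an addition of mine rather than part of the original helper; requests.get(url).json() would work equally well if you prefer the requests import from Step 0:

```python
import json
from urllib.request import urlopen


def get_jsonparsed_data(url, timeout=10):
    """Send a GET request to `url` and return the parsed JSON body.

    Sketch only: `timeout` (in seconds) is an assumed extra parameter.
    """
    with urlopen(url, timeout=timeout) as response:
        # The body arrives as bytes, so decode it before parsing
        return json.loads(response.read().decode("utf-8"))
```

Network and HTTP errors are left to propagate here; in the app you may want to catch them and show an st.error message instead.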

Step 2: Generate Financial Statements Summary with GPT-4

Next, instead of just displaying the raw DataFrame, we're also going to summarize key metrics from the statement first.

Raw financial statements are often challenging to comprehend for users without a financial background. To make these statements more accessible and insightful, we'll use GPT-4 to generate a textual summary.

To do so, we'll create a function called generate_financial_summary(), which takes as input the DataFrame of financial statements and the statement type. Based on the statement type, we then loop through each statement retrieved and write out a few key metrics ahead of the analysis:

def generate_financial_summary(financial_statements, statement_type):
    """
    Summarize key metrics for each retrieved statement, then ask GPT-4 to analyze them.
    """
    
    # Create a summary of key financial metrics for each retrieved period
    summaries = []
    for i in range(len(financial_statements)):
        if statement_type == "Income Statement":
            summary = f"""
                For the period ending {financial_statements['date'][i]}, the company reported the following:
                ...
                """
        elif statement_type == "Balance Sheet":
            summary = f"""
                For the period ending {financial_statements['date'][i]}, the company reported the following:
                ...
                """
        elif statement_type == "Cash Flow":
            summary = f"""
                For the period ending {financial_statements['date'][i]}, the company reported the following:
                ...
                """
        summaries.append(summary)
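
The metric lines in each template are elided above. To illustrate the shape of one per-period summary, here is a standalone sketch for a single income statement record. The field names (revenue, grossProfit, netIncome) are assumptions about the API response, the helper name is hypothetical, and the numbers are made up purely for the example:

```python
def summarize_income_period(row):
    """Format one income statement period as a short text summary.

    `row` is a dict-like record for a single period. The field names used
    here are assumptions, not the template used in this guide.
    """
    return (
        f"For the period ending {row['date']}, the company reported the following:\n"
        f"- Revenue: {row['revenue']:,}\n"
        f"- Gross profit: {row['grossProfit']:,}\n"
        f"- Net income: {row['netIncome']:,}"
    )


# Made-up record, only to show the formatting
example = {"date": "2022-12-31", "revenue": 1000000, "grossProfit": 400000, "netIncome": 150000}
print(summarize_income_period(example))
```

Keeping each summary short and consistently labeled makes it easier for the model to compare the same metric across periods.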

Next, we combine all these individual period summaries into a single string:

    # Combine all summaries into a single string
    all_summaries = "\n\n".join(summaries)

After creating the financial statement summaries, we're going to use GPT-4 to analyze them and provide insights, such as how certain metrics have changed over time (e.g., how R&D expenses have changed over the past year).

There's definitely more prompt engineering we could do to improve GPT-4's output here. For example, providing a few examples of real-world financial analysis could improve the output via few-shot learning, but this will do for now:

    # Call GPT-4 for analysis
    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[
            {
                "role": "system",
                "content": "You are an AI trained to provide financial analysis based on financial statements.",
            },
            {
                "role": "user",
                "content": f"""
                Please analyze the following data and provide insights:\n{all_summaries}.\n 
                Write each section out as instructed in the summary section and then provide analysis of how it's changed over the time period.
                ...
                """
            }
        ]
    )

    return response['choices'][0]['message']['content']
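
The few-shot idea mentioned above can be sketched as a small helper that prepends example exchanges to the message list. This is an illustration under assumptions: build_messages and its examples parameter are hypothetical names, and you would need to supply your own example analyses since none are included here:

```python
def build_messages(all_summaries, examples=()):
    """Build a chat message list, optionally prepending few-shot examples.

    `examples` is a sequence of (summary_text, analysis_text) pairs that you
    supply yourself; this helper is a hypothetical sketch.
    """
    request = "Please analyze the following data and provide insights:\n"
    messages = [
        {
            "role": "system",
            "content": "You are an AI trained to provide financial analysis based on financial statements.",
        }
    ]
    # Each example becomes a user/assistant exchange the model can imitate
    for summary_text, analysis_text in examples:
        messages.append({"role": "user", "content": request + summary_text})
        messages.append({"role": "assistant", "content": analysis_text})
    # The real request always comes last
    messages.append({"role": "user", "content": request + all_summaries})
    return messages
```

The returned list can be passed as the messages argument of the ChatCompletion call in place of the inline list.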

Step 3: Building the Streamlit App

Alright, that's all we need for our starter GPT-4 analyst, so let's now put this together in a Streamlit app. To do so, we'll define a function called financial_statements(), which contains the Streamlit code to enable an interactive UI for the financial assistant:

def financial_statements():
    st.title('GPT-4 & Financial Statements')

    statement_type = st.selectbox("Select financial statement type:", ["Income Statement", "Balance Sheet", "Cash Flow"])

    col1, col2 = st.columns(2)

    with col1:
        period = st.selectbox("Select period:", ["Annual", "Quarterly"]).lower()

    with col2:
        limit = st.number_input("Number of past financial statements to analyze:", min_value=1, max_value=10, value=4)

    ticker = st.text_input("Please enter the company ticker:")

    if st.button('Run'):
        if ticker:
            ticker = ticker.upper()
            # Named statements_df so it doesn't shadow this function's own name
            statements_df = get_financial_statements(ticker, limit, period, statement_type)

            # Stop here if the fetch failed; an error has already been shown
            if statements_df.empty:
                return

            with st.expander("View Financial Statements"):
                st.dataframe(statements_df)

            financial_summary = generate_financial_summary(statements_df, statement_type)

            st.write(f'Summary for {ticker}:\n {financial_summary}\n')
        else:
            st.warning("Please enter a ticker before clicking Run.")

And finally, to run our application we'll create this main function, where I've added a sidebar dropdown for additional assistants that I'll be building in the near future (details to come):

def main():
    st.sidebar.title('AI Financial Analyst')
    app_mode = st.sidebar.selectbox("Choose your AI assistant:",
        ["Financial Statements"])
    if app_mode == 'Financial Statements':
        financial_statements()


if __name__ == '__main__':
    main()

With that, we have everything we need and can run the app locally with streamlit run app.py.

Let's go and test it out again with the past 4 annual balance sheet statements for AAPL:

Not bad.

Summary: GPT-4 for Financial Statements

In this guide, we saw how we can make raw financial data a bit more human-readable and accessible with GPT-4.

While some financial wizards may enjoy staring at raw data in spreadsheets, my guess is that the majority of people would prefer to analyze financial data in a more "AI-assisted" way that includes summaries of the data and insightful analysis that brings to light key metrics that may otherwise have been overlooked.

While this is of course just a starting point for a full-fledged AI analyst, over the next few articles we'll explore a few other similar applications of LLMs to financial data. As mentioned, I think there's a massive opportunity to use LLMs to "bring financial data to life" by allowing users to chat with this type of data.

To really achieve that, a natural next step will be to make this app more chat-oriented so users can ask follow-up questions and truly chat with financial data, although we'll save that for another article.
