A Study on LLMs for Financial Statement Analysis

In the world of investment research, accurate analysis of financial statements is crucial for making informed decisions.

Traditionally, this task has been performed by human financial analysts, but recent advancements in AI and large language models (LLMs) have opened up new possibilities.

To explore the feasibility of AI for financial analysis, researchers at the University of Chicago conducted a study, Financial Statement Analysis with Large Language Models, testing whether LLMs like GPT-4 can match or even surpass human analysts.

Even without any narrative or industry-specific information, the LLM outperforms financial analysts in its ability to predict earnings changes.

In this article, we'll discuss and summarize the key takeaways, methodologies, and implications of this study on LLMs for financial statement analysis.

Key Takeaways

  • GPT-4 outperforms human analysts in predicting the direction of future earnings changes.
  • The AI model's performance is on par with specialized machine learning models trained specifically for earnings prediction.
  • GPT-4's success stems from its ability to generate insightful narratives based on financial data, not from memorization.
  • Trading strategies based on GPT-4's predictions yield higher Sharpe ratios and alphas than strategies based on other models.
  • The study suggests that LLMs could play a more central role in financial decision-making than previously thought.

Methodology

The researchers designed the following approach to test GPT-4's capabilities in financial statement analysis:

  • They provided GPT-4 with anonymized and standardized balance sheets and income statements from a large sample of companies.
  • The AI was instructed to analyze these statements and predict whether a company's earnings would increase or decrease in the following period.
  • A "chain-of-thought" prompt was used to guide GPT-4 through a step-by-step analysis process, mimicking the approach of human analysts.
  • The model's performance was compared to that of human analysts, as well as specialized machine learning models like artificial neural networks (ANNs).
  • The researchers also conducted tests to ensure that GPT-4's predictions were not based on memorization of company-specific information.
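The first three steps above can be sketched in code. This is a minimal illustration, not the prompt or pipeline used in the study: the `Statement` layout, the function names `format_statement` and `build_prompt`, the wording of `COT_STEPS`, and the sample figures are all assumptions made for this example. It replaces fiscal years with relative labels (t, t-1) so the text carries no dates or company name, then appends step-by-step instructions.

```python
from typing import Dict, List

# Hypothetical layout: line item -> {fiscal year -> reported value}.
Statement = Dict[str, Dict[int, float]]

# Illustrative step wording (an assumption), not the prompt used in the study.
COT_STEPS: List[str] = [
    "Identify notable changes in individual line items between periods.",
    "Compute key financial ratios and state the formula before each value.",
    "Interpret what the trends and ratios imply about the company.",
    "Conclude with 'increase' or 'decrease' for next-period earnings, "
    "plus a short rationale.",
]


def format_statement(title: str, statement: Statement) -> str:
    """Render a statement as text with fiscal years replaced by t, t-1, ..."""
    years = sorted({y for values in statement.values() for y in values}, reverse=True)
    labels = {y: "t" if i == 0 else f"t-{i}" for i, y in enumerate(years)}
    header = f"{'Item':<24}" + "".join(f"{labels[y]:>12}" for y in years)
    lines = [title, header]
    for item, values in statement.items():
        cells = "".join(
            f"{values[y]:>12,.1f}" if y in values else f"{'n/a':>12}" for y in years
        )
        lines.append(f"{item:<24}{cells}")
    return "\n".join(lines)


def build_prompt(balance_sheet: Statement, income_statement: Statement) -> str:
    """Combine the anonymized statements with chain-of-thought instructions."""
    steps = "\n".join(f"{n}. {step}" for n, step in enumerate(COT_STEPS, start=1))
    return "\n\n".join([
        "You are a financial analyst. Using only the statements below, "
        "predict whether earnings will increase or decrease next period.",
        format_statement("BALANCE SHEET", balance_sheet),
        format_statement("INCOME STATEMENT", income_statement),
        "Work through these steps in order:\n" + steps,
    ])


# Made-up figures for a fictional company; no company name enters the prompt.
sample_balance_sheet: Statement = {
    "Total assets": {2023: 1500.0, 2022: 1400.0},
    "Total liabilities": {2023: 900.0, 2022: 880.0},
}
sample_income_statement: Statement = {
    "Revenue": {2023: 1200.0, 2022: 1100.0},
    "Net income": {2023: 95.0, 2022: 80.0},
}

prompt = build_prompt(sample_balance_sheet, sample_income_statement)
print(prompt)
```

The resulting string could be sent to any chat-style LLM. Keeping calendar years and company names out of the text is one simple way to make it harder for a model to rely on memorized company-specific information.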

Findings

The study's main results were as follows:

  • Prediction Accuracy: GPT-4 achieved an accuracy of 60.35% in predicting the direction of future earnings changes, significantly outperforming human analysts (52.71% accuracy) and matching specialized ANNs (60.45% accuracy).
  • Complementary Insights: While GPT-4 outperformed human analysts overall, AI and humans showed complementary strengths. GPT-4 was particularly effective in situations where human analysts typically struggle, such as when dealing with smaller companies or those reporting losses.
  • Narrative Insights: The researchers found that GPT-4's success was largely due to its ability to generate meaningful narrative insights from the financial data. These narratives, when analyzed separately, contained significant predictive power for future earnings.
  • Trading Strategy Performance: Investment strategies based on GPT-4's predictions yielded impressive results, with higher Sharpe ratios and alphas compared to strategies based on other models or human analyst predictions.

Our results suggest that GPT shows a remarkable aptitude for financial statement analysis and achieves state-of-the-art performance without any specialized training.
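For readers less familiar with the two trading metrics mentioned above, here is a rough sketch of how they are commonly computed. It is a simplification and not the study's method: alpha is estimated here against a single market factor (CAPM-style), the function names `sharpe_ratio` and `capm_alpha` are hypothetical, and the return series are made up for illustration.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def sharpe_ratio(
    returns: Sequence[float], risk_free: float = 0.0, periods_per_year: int = 12
) -> float:
    """Annualized Sharpe ratio: mean excess return over its standard deviation."""
    excess = [r - risk_free for r in returns]
    return mean(excess) / stdev(excess) * math.sqrt(periods_per_year)


def capm_alpha(
    strategy: Sequence[float], market: Sequence[float], risk_free: float = 0.0
) -> Tuple[float, float]:
    """Least-squares fit of strategy excess returns on market excess returns.

    Returns (alpha, beta), where alpha is the per-period return left
    unexplained by market exposure.
    """
    y = [r - risk_free for r in strategy]
    x = [r - risk_free for r in market]
    x_bar, y_bar = mean(x), mean(y)
    covariance = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    variance = sum((a - x_bar) ** 2 for a in x)
    beta = covariance / variance
    alpha = y_bar - beta * x_bar
    return alpha, beta


# Made-up monthly returns, for illustration only.
market_returns = [0.02, -0.01, 0.03, 0.00, 0.01, -0.02]
strategy_returns = [0.03, 0.00, 0.03, 0.01, 0.02, -0.01]

alpha, beta = capm_alpha(strategy_returns, market_returns)
print(f"Sharpe: {sharpe_ratio(strategy_returns):.2f}")
print(f"Alpha per month: {alpha:.4f}, beta: {beta:.2f}")
```

A higher Sharpe ratio means more return per unit of volatility, while a positive alpha means the strategy earned more than its market exposure alone would explain.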

The following chart compares the accuracy and F1 scores of GPT-4 and the other prediction methods. Note that it was created by MLQ.ai and does not appear in the paper.

The chart shows GPT-4 performing best when it uses the chain-of-thought (CoT) approach. Specifically:

  • GPT with CoT achieves the highest accuracy (60.35%) and F1 score (60.90%).
  • Human analysts (Analyst 1m and Analyst 6m) perform better than random guessing but fall short of GPT's performance.
  • GPT without CoT performs similarly to human analysts, highlighting the importance of the chain-of-thought approach.
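To make the two metrics in the comparison concrete, the sketch below computes accuracy and F1 for binary direction predictions (1 for an earnings increase, 0 for a decrease). The function name `accuracy_and_f1` and the sample labels are made up for illustration, and computing F1 on the "increase" class is an assumption; the paper may define it differently.

```python
from typing import Sequence, Tuple


def accuracy_and_f1(actual: Sequence[int], predicted: Sequence[int]) -> Tuple[float, float]:
    """Return (accuracy, F1) for binary labels, treating 1 as the positive class."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and equal in length")

    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # predicted up, went up
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # predicted down, went down
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # predicted up, went down
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # predicted down, went up

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


# Made-up outcomes and predictions for five hypothetical firm-years.
actual_direction = [1, 1, 0, 0, 1]
predicted_direction = [1, 0, 0, 1, 1]

acc, f1 = accuracy_and_f1(actual_direction, predicted_direction)
print(f"accuracy={acc:.2%}, F1={f1:.2%}")
```

Accuracy counts every correct call equally, whereas F1 balances precision and recall on the positive class, which is why the two can diverge when increases and decreases are not equally frequent.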

Implications

The study's findings have significant implications for financial analysis and the investment industry:

  • AI in Financial Analysis: LLMs like GPT-4 could become valuable tools for investors and financial institutions, potentially democratizing access to high-quality financial analysis.
  • Human-AI Collaboration: Rather than replacing human analysts, the results suggest that AI could complement human expertise, especially in areas where humans typically struggle.
  • Improved Decision-Making: The superior performance of trading strategies based on GPT-4's predictions indicates that AI could lead to more informed and potentially more profitable investment decisions.
  • Broader AI Capabilities: The study demonstrates that LLMs can excel in complex quantitative tasks outside their primary language-based domain, hinting at the emergence of more general artificial intelligence.

Summary: LLMs for Financial Analysis

As discussed, this research demonstrates that large language models like GPT-4 have the potential to revolutionize financial statement analysis.

By matching or exceeding the performance of both human analysts and specialized machine learning models, GPT-4 showcases the growing capabilities of AI in complex, real-world tasks.

While the study's results are impressive, it's important to note that the researchers emphasize the complementary nature of AI and human expertise rather than suggesting outright replacement of human analysts.

What's more, given the speed at which LLMs are improving, it's likely we'll see an increasing integration of these tools into financial analysis and the investment research process.

At MLQ.ai, we've been working on exactly this topic with tools such as our SEC filing assistant, earnings call assistant, and more coming soon.

Resources