Getting Started with GPT-4 Vision for Data Analysis

One of the many releases at OpenAI's Dev Day was the introduction of GPT-4 Vision or GPT-4V. As the documentation highlights:

"Historically, language model systems have been limited by taking in a single input modality, text."

GPT-4V is a multimodal model that can take in both images and text and answer questions about them.

While there are many use cases of vision, in this article we'll focus on using it for data analysis, specifically analyzing and interpreting charts and graphs.

For this tool, we'll build a simple frontend using Streamlit that lets users either enter image URLs or upload one or more images for analysis.

For example, let's upload this chart from Our World in Data on Computation used to train notable artificial intelligence systems and ask GPT-4V to explain it for us.


...which gives us the following interpretation:

💡
This image depicts a scatter plot that illustrates the amount of computation, measured in petaFLOPs (floating-point operations per second multiplied by 10^15), that has been used to train various notable artificial intelligence (AI) systems over time. The x-axis represents the publication date of the systems, spanning from July 2, 1950, to a time after the Executive Order on AI, which was issued by President Biden on October 30, 2023. The y-axis represents the training computation in petaFLOPs on a logarithmic scale.

Various AI systems are plotted as points across this graph, and they are color-coded based on the task domain they address, such as language, games, vision, etc. Some notable systems like "GPT-4" and "AlphaGo" are placed at the higher end of the computation scale, while earlier systems like "Perceptron Mark I" and "Pandemonium" are at the lower end.

A dashed line near the top of the graph indicates that disclosure of AI computation is mandatory at or above 100 billion petaFLOPs under the Executive Order on AI. This refers to a directive issued by the President of the United States for establishing guidelines and standards for the responsible development and use of AI technology, which includes reporting on the computational resources used for training these AI models.


Setting Up the Environment

Now that we know what we're building, let's create a new Python Repl at Replit to host our Streamlit app.

  • We start by importing streamlit, openai, base64, and os. Streamlit is used for the UI, OpenAI for accessing GPT-4 Vision, base64 for image handling, and os for environment variables.
  • Next, you'll need to add your OPENAI_API_KEY to Secrets and set up the OpenAI client as follows:
import streamlit as st
from openai import OpenAI
import base64
import os

OPENAI_API_KEY = os.environ['OPENAI_API_KEY']
client = OpenAI(api_key=OPENAI_API_KEY)
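
Note that os.environ['OPENAI_API_KEY'] will raise a KeyError if the secret hasn't been added yet. As an optional, slightly more forgiving variant (not part of the snippet above), you could look the key up with os.environ.get and surface a message in the app instead:

# Optional variant: fail gracefully if the secret is missing
OPENAI_API_KEY = os.environ.get('OPENAI_API_KEY')
if not OPENAI_API_KEY:
  st.error("OPENAI_API_KEY is not set. Add it to your Repl's Secrets and rerun the app.")
  st.stop()

client = OpenAI(api_key=OPENAI_API_KEY)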

Encoding Images to Base64

Next, when users upload images directly, we need to convert them into a format that GPT-4 can understand. Let's write a function to encode the images with Base64, a common method for representing binary data as text.

def encode_image(uploaded_file):
  # Read the uploaded file's bytes and return them as a UTF-8 Base64 string
  return base64.b64encode(uploaded_file.getvalue()).decode('utf-8')
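
To see what this produces, here's a small standalone sketch (the filename is just a placeholder) that encodes an image from disk and wraps it in the data URL format we'll send to GPT-4V in the next section:

# Standalone example: encode a local file and build a data URL
# "example_chart.png" is a hypothetical filename for illustration
with open("example_chart.png", "rb") as f:
  base64_image = base64.b64encode(f.read()).decode('utf-8')

data_url = f"data:image/png;base64,{base64_image}"
print(data_url[:40])  # prints something like "data:image/png;base64,iVBORw0KG..."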

Analyzing Images with GPT-4 Vision

Next, let's create a function to analyze images using GPT-4 Vision:

  • The analyze_image function processes a list of images and a user's question, sending them to OpenAI's GPT-4 Vision model for analysis.
  • It handles both URL-based and base64-encoded images, constructing the appropriate request format for the API.
  • The function then sends this request and returns the model's response as output.
def analyze_image(image_data_list, question, is_url=False):
  # Start the message with the user's question as a text block
  messages = [{"role": "user", "content": [{"type": "text", "text": question}]}]

  # Attach each image, either as a direct URL or as a base64-encoded data URL
  for image_data in image_data_list:
    if is_url:
      messages[0]["content"].append({"type": "image_url", "image_url": {"url": image_data}})
    else:
      messages[0]["content"].append({"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_data}"}})

  # Send the request to the GPT-4 Vision model and return its answer
  response = client.chat.completions.create(model="gpt-4-vision-preview", messages=messages)
  return response.choices[0].message.content
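
As a quick sanity check before wiring up the UI, you can call the function directly with a publicly accessible image URL (the URL below is just a placeholder):

# Example call with a URL-based image (placeholder URL for illustration)
insights = analyze_image(
  ["https://example.com/compute-chart.png"],
  "What trend does this chart show?",
  is_url=True
)
print(insights)

If responses come back truncated, passing an explicit max_tokens value to client.chat.completions.create is a common fix.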

Streamlit App Configuration

Now that we've got our functions to handle image inputs and send them to GPT-4V, let's set up our simple Streamlit app:

  • st.set_page_config: Sets the page title and icon, while st.title displays the heading "GPT-4 Vision for Data Analysis" at the top of the app.
st.set_page_config(page_title="GPT-4 Vision for Data Analysis", page_icon="🔍")
st.title('GPT-4 Vision for Data Analysis')

User Input Handling

Next, let's create a section for users to choose their input method, i.e. either uploading images or entering image URLs. We'll also let users specify a question, or fall back to the default "Explain this image" prompt:

  • st.radio: Creates a radio button for the user to choose the image input method.
  • st.text_input: Provides a text input field for users to enter their question for the image.
# User Inputs
image_input_method = st.radio("Select Image Input Method",
                              ('Upload Image', 'Enter Image URL'))
user_question = st.text_input("Enter your question for the image",
                              value="Explain this image")

Image Input Processing

Finally, the following section handles user-uploaded images or entered URLs, encodes them as needed, and displays the analysis results in the Streamlit app.

  • Initializes an empty list image_data_list to store image data.
  • If 'Upload Image' is selected:
    • st.file_uploader: Allows users to upload images in JPG, JPEG, or PNG format. Multiple files can be selected.
    • Each uploaded file is encoded to base64 and added to image_data_list.
    • st.button: When clicked, triggers the analysis of the uploaded images.
    • The insights from analyze_image are displayed using st.write.
  • If 'Enter Image URL' is selected:
    • st.text_area: Lets users paste one or more image URLs, one per line.
    • Clicking the button passes the URLs to analyze_image with is_url=True, and the insights are again displayed with st.write.
image_data_list = []

if image_input_method == 'Upload Image':
  uploaded_files = st.file_uploader("Choose images...", type=["jpg", "jpeg", "png"], accept_multiple_files=True)
  if uploaded_files:
    for uploaded_file in uploaded_files:
      image_data_list.append(encode_image(uploaded_file))
    if st.button('Analyze image(s)'):
      insights = analyze_image(image_data_list, user_question)
      st.write(insights)
elif image_input_method == 'Enter Image URL':
  image_urls = st.text_area("Enter the URLs of the images, one per line")
  if image_urls and st.button('Analyze image URL(s)'):
    url_list = [url.strip() for url in image_urls.split('\n') if url.strip()]  # skip blank lines
    insights = analyze_image(url_list, user_question, is_url=True)
    st.write(insights)

That's all we need. Let's test it out with one more image from Our World in Data: Annual global corporate investment in artificial intelligence, by type.

💡
This is a bar chart visualizing the annual global corporate investment in artificial intelligence (AI) by type, from 2013 to 2022. The data are presented in U.S. dollars and have been adjusted for inflation to 2021 dollars. This adjustment is based on the U.S. Consumer Price Index (CPI).

The chart is broken down into four types of investments:

1. Merger/Acquisition: represented by red on the top of some columns, this indicates the amount of money invested in AI through mergers and acquisitions.

2. Public Offering: shown in blue, this includes investments made through public offerings, possibly referring to investments made during IPOs or subsequent public investments into AI companies.

3. Private Investment: shown in green, this represents private funding rounds such as venture capital, private equity, or other types of private placement.

4. Minority Stake: represented in orange, this signifies investments made by purchasing a minority stake in AI companies.

The vertical axis indicates the amount of investment in billions of dollars, while the horizontal axis represents the year. There's a clear trend of increasing investment in AI over the years, with a significant jump from 2020 to 2021 and a slight decrease from 2021 to 2022. In 2022, the investments are spread across all four types, with private investment appearing to comprise the largest share.

Nice.

Summary: GPT-4 Vision for Data Analysis

As we saw with this example, OpenAI's GPT-4 Vision provides a powerful new tool for quickly retrieving insights from data visualizations. Given that we can analyze multiple images at once, I can imagine this model being very useful for businesses in the coming years.

For example, a few use cases that come to mind include:

  1. Market Trend Analysis: For identifying and acting on emerging market patterns.
  2. Financial Reporting Interpretation: Allowing investors to quickly understand company performance and risks.
  3. Consumer Behavior Insights: Assisting in tailoring marketing strategies and product development.

The list goes on, but you get the idea.

If you'd like to access the code for this tutorial, you can sign up to MLQ Academy below.