ChatGPT

Practical Use Cases of ChatGPT Vision

In this guide, we look at 8 practical use cases of ChatGPT Vision, otherwise known as GPT-4V.

3 years ago • 5 min read

By Mike Sak

This week in AI we take a look at the latest upgrade to ChatGPT, which can now identify and analyze images.

A couple of weeks ago we wrote about the release of ChatGPT Vision, the latest iteration of the LLM from OpenAI. Well, that new release came earlier this week and AI fans and developers around the world are testing its capabilities.

So what have early users been using GPT-4V for?

Not only does GPT-4V recognize images but it also accepts voice prompts as well. Adding these additional 'senses' broadens the use case for ChatGPT immensely and overcomes previous accessibility barriers like language and literacy.

Here are some of the top uses case for GPT-4V that have been documented so far.

Frontend Development

ChatGPT could always produce code from well-designed prompt inputs but now you can even provide it with an image of a website and it will write you the code for it. During one of OpenAI's developer calls, GPT-4V was shown an image of a website which it then provided the front-end code for, including the requested functionality.

Now, users have been able to access this feature too and are already producing some incredible demos on X:

ChatGPT Vision can take in screenshots from Figma and generate code.

Building with AI is getting wild. pic.twitter.com/D8yeJW1kGR
— Mckay Wrigley (@mckaywrigley) September 29, 2023

I gave ChatGPT a screenshot of a SaaS dashboard and it wrote the code for it.

This is the future. pic.twitter.com/9xFgFdv4MM
— Mckay Wrigley (@mckaywrigley) September 27, 2023

The first GPT-4V-powered frontend engineer agent.

Just upload a picture of a design, and the agent autonomously codes it up, looks at a render for mistakes, improves the code accordingly, repeat.

Utterly insane. pic.twitter.com/qN75vwkbDZ
— Matt Shumer (@mattshumer_) September 29, 2023

Image Recognition

The most basic, but likely the most broadly used use case is simple image recognition. Uploading photos of fictional characters has proven to be a popular game so far. OpenAI has made it clear that the platform will not accept photos of real-life people that users wish to be identified.

You can also use it to identify random objects or products you see in the world. Wondering what a certain type of flower is? Or a type of sneaker? Or a landmark in a foreign country? Just snap a photo and upload it to ChatGPT! It can even analyze medical imaging like X-rays and CT Scans and interpret them for medical conditions.

Yes ChatGPT, I am indeed entertained. pic.twitter.com/XcENUMVcBF
— Peter Yang (@petergyang) September 27, 2023

Translating Foreign Languages

If you've ever travelled in a foreign country, you know how difficult it can be to read local signs or instructions. Included in GPT-4V's new capabilities is to translate foreign language text from an image. Similar to other apps like Google Translate, GPT-4V can now identify the language of the text and translate it to another language of your choosing.

Data Extraction and Analysis

This is going to be a big one, eventually. With GPT-4V, you can upload an image of a chart or infograph and ChatGPT will provide an interpretation of this data. It isn't yet known how robust the data analytics intelligence is of GPT-4V, but as it evolves it will presumably be able to handle larger datasets.

Object Recognition

Provide an image of a meal and GPT-4V can break it down into ingredients and even provide advice on how to improve or make the meal healthier. If you upload an image of a crowd of people, GPT-4V can calculate how many people are in the picture.

GPT-4V can even use reasoning to determine the purpose of any object within an image within the context of that image. This illustrates a very advanced AI logic and makes it much easier to provide a visual prompt that is difficult to put into words. GPT-4V can even understand the humour and satire of popular memes.

I will never get a parking ticket again. pic.twitter.com/yl7ND2rJeQ
— Peter Yang (@petergyang) September 27, 2023

Educational Support

With the addition of vision and voice prompting, GPT-4V can better become a learning tool rather than a one-dimensional LLM that provides the answers. For example, students can upload an image of study material or a text book and ask GPT-4V to explain it to them in simple language. This can also be used for complex mathematical equations, language learning, and literary analysis.

This is a game changer. You can use ChatGPT to transform equations to python functions.

Wish I had this 5 years ago. pic.twitter.com/YL6wzpuH1p
— Lior⚡ (@AlphaSignalAI) October 14, 2023

Graphic Design with ChatGPT Vision & DALLE-3

Lastly, combining GPT4-V with DALLE-3 is quite a mind blowing use case. For example, the author below created a logo in minutes by following these steps:

Find a logo design you like and ask ChatGPT Vision to describe it.
Use the description step 1 and instruct DALL.E-3 to create an image based on that description.
Refine the logo with follow-up prompts to perfect it.

GPT-4V + DALL.E-3 = Premium Logos in Minutes 🤯

Graphic design has officially been disrupted.

Here's how to create a logo you can charge for in 3 steps: 👇

[🔖Bookmark to refer back to later] pic.twitter.com/EE0zXmUggI
— Sebo (@sebo_gm) October 13, 2023

Limitations of ChatGPT Vision (GPT-4V)

As with any LLM, there are a number of issues that can still arise. OpenAI cannot guarantee the accuracy of GPT-4V's statements when analyzing data or images. Similarly to ChatGPT, this multi-modal platform can still be inaccurate and even suffer from AI hallucinations.

We also mentioned that OpenAI is not allowing GPT-4V to identify real people in images. The only people that can be identified are fictional characters or well-known figures. It is also trying to limit location identification from images as it could reveal private or sensitive information. Likewise it will not fill out CAPTCHA codes or describe or take part in any illicit or malicious behaviour.

GPT-4V will also not answer questions where the answer could cause harm to the user or another individual. For example, it should not be used to identify if a wild mushroom is safe to eat...

Tags:
ChatGPT

Premium

100+ Prompts to Learn Large Language Models (LLMs)

public

Frontend Development

Image Recognition

Translating Foreign Languages

Data Extraction and Analysis

Object Recognition

Educational Support

Graphic Design with ChatGPT Vision & DALLE-3

Limitations of ChatGPT Vision (GPT-4V)

Sign up for MLQ.ai

Spread the word

100+ Prompts to Learn Large Language Models (LLMs)

Vector Database Startups to Watch

Keep reading

OpenAI releases GPT-4o: real-time voice assistant

How to Edit DALL·E Images with ChatGPT

OpenAI Launches GPT Store - This Week in AI

Subscribe to our newsletter