This week in AI we take a look at the latest upgrade to ChatGPT, which can now identify and analyze images.
A couple of weeks ago we wrote about the release of ChatGPT Vision, the latest iteration of the LLM from OpenAI. Well, that new release came earlier this week and AI fans and developers around the world are testing its capabilities.
So what have early users been using GPT-4V for?
Not only does GPT-4V recognize images but it also accepts voice prompts as well. Adding these additional 'senses' broadens the use case for ChatGPT immensely and overcomes previous accessibility barriers like language and literacy.
Here are some of the top uses case for GPT-4V that have been documented so far.
Frontend Development
ChatGPT could always produce code from well-designed prompt inputs but now you can even provide it with an image of a website and it will write you the code for it. During one of OpenAI's developer calls, GPT-4V was shown an image of a website which it then provided the front-end code for, including the requested functionality.
Now, users have been able to access this feature too and are already producing some incredible demos on X:
Image Recognition
The most basic, but likely the most broadly used use case is simple image recognition. Uploading photos of fictional characters has proven to be a popular game so far. OpenAI has made it clear that the platform will not accept photos of real-life people that users wish to be identified.
You can also use it to identify random objects or products you see in the world. Wondering what a certain type of flower is? Or a type of sneaker? Or a landmark in a foreign country? Just snap a photo and upload it to ChatGPT! It can even analyze medical imaging like X-rays and CT Scans and interpret them for medical conditions.
Translating Foreign Languages
If you've ever travelled in a foreign country, you know how difficult it can be to read local signs or instructions. Included in GPT-4V's new capabilities is to translate foreign language text from an image. Similar to other apps like Google Translate, GPT-4V can now identify the language of the text and translate it to another language of your choosing.
Data Extraction and Analysis
This is going to be a big one, eventually. With GPT-4V, you can upload an image of a chart or infograph and ChatGPT will provide an interpretation of this data. It isn't yet known how robust the data analytics intelligence is of GPT-4V, but as it evolves it will presumably be able to handle larger datasets.
Object Recognition
Provide an image of a meal and GPT-4V can break it down into ingredients and even provide advice on how to improve or make the meal healthier. If you upload an image of a crowd of people, GPT-4V can calculate how many people are in the picture.
GPT-4V can even use reasoning to determine the purpose of any object within an image within the context of that image. This illustrates a very advanced AI logic and makes it much easier to provide a visual prompt that is difficult to put into words. GPT-4V can even understand the humour and satire of popular memes.
Educational Support
With the addition of vision and voice prompting, GPT-4V can better become a learning tool rather than a one-dimensional LLM that provides the answers. For example, students can upload an image of study material or a text book and ask GPT-4V to explain it to them in simple language. This can also be used for complex mathematical equations, language learning, and literary analysis.
Graphic Design with ChatGPT Vision & DALLE-3
Lastly, combining GPT4-V with DALLE-3 is quite a mind blowing use case. For example, the author below created a logo in minutes by following these steps:
- Find a logo design you like and ask ChatGPT Vision to describe it.
- Use the description step 1 and instruct DALL.E-3 to create an image based on that description.
- Refine the logo with follow-up prompts to perfect it.
Limitations of ChatGPT Vision (GPT-4V)
As with any LLM, there are a number of issues that can still arise. OpenAI cannot guarantee the accuracy of GPT-4V's statements when analyzing data or images. Similarly to ChatGPT, this multi-modal platform can still be inaccurate and even suffer from AI hallucinations.
We also mentioned that OpenAI is not allowing GPT-4V to identify real people in images. The only people that can be identified are fictional characters or well-known figures. It is also trying to limit location identification from images as it could reveal private or sensitive information. Likewise it will not fill out CAPTCHA codes or describe or take part in any illicit or malicious behaviour.
GPT-4V will also not answer questions where the answer could cause harm to the user or another individual. For example, it should not be used to identify if a wild mushroom is safe to eat...