OpenAI launched ChatGPT in 2022, revolutionizing the world of technology. ChatGPT is a conversational AI used as a chatbot and virtual assistant on the web and, through its API, in many other applications. You send a prompt, and ChatGPT responds; you can ask it anything, like the top book series of the month, or ask it to write a rap song featuring your favorite Marvel characters.




LLM-based (large language model) AI has been at the center of many conversations since ChatGPT's debut. It has also enabled new tech to evolve and software to flourish on everything from budget smartphones to flagship devices. ChatGPT's arrival has pushed other top tech giants to bring their own LLM-based AIs and tools to the public. One crucial feature is the ability to interpret images, letting users submit pictures instead of (or alongside) text. Previously, this feature was reserved for premium users, but OpenAI has rolled it into its latest GPT update.





What is ChatGPT vision?

If you’re familiar with generative AI, you’ve likely already heard of ChatGPT. ChatGPT debuted in 2022 with the public release of GPT-3.5 and later added GPT-4 as a paid option. According to an OpenAI paper published in 2023, GPT-4V “enables users to instruct GPT-4 to analyze image inputs provided by the user.” OpenAI completed GPT-4V’s training in 2022 and began granting early access in March 2023.

Figure 2 from OpenAI’s paper on GPT-4V, published in 2023
Source: OpenAI

GPT-4V underwent many iterations before the feature became publicly ready. It was tested and analyzed for disinformation risks, stereotyping, and ungrounded inferences. The developers did not want the vision feature to be misused or to spread misinformation on safety-critical and sensitive topics.




How can you access ChatGPT vision?

ChatGPT vision, also known as GPT-4 with vision (GPT-4V), initially rolled out as a premium feature for ChatGPT Plus subscribers ($20 per month). OpenAI has since brought the vision feature to all free users with GPT-4o (the “o” stands for omni), though it is rolling out in stages.

Free users have a usage cap, while Plus subscribers get a message limit up to five times higher than the free tier’s. Also, to access ChatGPT, users were previously required to sign up for a free account. OpenAI has since changed its policy; anyone can start using ChatGPT without creating an account. However, having an account still adds benefits, such as saving and reviewing your chat history and attaching images. So, if you plan to use the vision feature, it’s worth signing up for an account.


How to use ChatGPT vision

To get started with GPT-4o, log into chat.openai.com or open the mobile app and select Try it now when prompted.



red rectangle outline over try it now option in introducing GPT-4o window

From there, you can attach an image from your computer or paste the address of an image you’ve found online. ChatGPT will invite you to ask questions about it, or you can type your question as you attach the image.
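If you’d rather send images programmatically, OpenAI’s API accepts image inputs alongside text. Here’s a minimal sketch, assuming the official `openai` Python package and an API key in your environment; the file path and question are placeholders:

```python
import base64


def build_vision_message(question: str, image_path: str) -> dict:
    """Pair a text question with a local image in one chat message.

    The image is base64-encoded into a data URL, which the Chat
    Completions API accepts as an image_url content part.
    """
    with open(image_path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("utf-8")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/jpeg;base64,{encoded}"},
            },
        ],
    }


# Sending the request (requires the `openai` package and an API key):
# from openai import OpenAI
# client = OpenAI()
# response = client.chat.completions.create(
#     model="gpt-4o",
#     messages=[build_vision_message("What is in this image?", "photo.jpg")],
# )
# print(response.choices[0].message.content)
```

Publicly reachable image URLs can also be passed directly in the `image_url` field, skipping the base64 step.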

ChatGPT is not perfect; it makes mistakes. In the prompt below, with three anime characters in one image (image credit: Screenrant), ChatGPT misidentified one of the three, so its answer was only about 66% correct.

It guessed Naruto, Goku, and Luffy. But in this image, Luffy isn’t present. Instead, we have Sailor Moon.

sample prompt of using an image to find information related to the image in chatgpt


Even if the feature isn’t perfect, you can still use it for a handful of image-related applications. You can ask ChatGPT for details it can glean (or make educated guesses about) from a photo alone. Below, we tried some prompts to see how well ChatGPT handles these requests.

Using GPT-4o vision for learning recipes

We sent this image to ChatGPT-4o and asked if it could discern the recipe (ingredients used) and calorie information based on the image.

close up of a taco salad in a bowl mixed
Source: Food.com

ChatGPT could discern that this was a taco salad and mentioned the typical ingredients. It also broke down the calories based on the ingredients used. The response was:



  • Calories: 655
  • Ground beef
  • Lettuce
  • Cherry tomatoes
  • Shredded cheese
  • Tortilla chips or Doritos
  • Black beans or pinto beans
  • Salsa or a similar dressing

The actual answer, according to a user on Food.com:

  • Calories: 855.3
  • Ground beef
  • Taco seasoning
  • Iceberg lettuce, chopped
  • Roma tomatoes, diced
  • Green onions, chopped
  • Red kidney beans or black beans, drained
  • Large black olives, sliced
  • Cheddar cheese, shredded
  • Catalina dressing
  • Plain Doritos, crumbled into big chunks

Though the ingredient list was more generalized than expected, it still gave a rough idea of the dish and its expected calorie count. Calories will vary with the dressing and portion size, which are difficult to judge from a photo.

Using GPT-4o vision to transcribe handwritten notes into text

Transcribing written notes takes a lot of time, especially when you want to keep digital copies. One cool use of ChatGPT’s vision is asking the AI to convert images of handwritten text into typed notes.



We asked ChatGPT to produce a text version of a slide:

a slide of handwritten notes for chemistry

ChatGPT’s answer:

transcribed handwritten notes into text form in chatgpt

The results were impressive. The AI even recognized handwritten symbols outside ordinary English text, as was the case with the notation for net charge.

Using GPT-4o vision to solve Captchas

Captchas help filter out bots by presenting distorted, difficult-to-discern images, usually filled with letters and numbers. Sometimes, though, solving a Captcha proves tricky even for a human. We tested whether ChatGPT can help you solve one.


We pulled an example of a Captcha on Cloudflare’s learning page.

an example of a captcha showing eight characters
Source: Cloudflare

We asked ChatGPT whether it could provide the characters in the image (without mentioning that it contained letters and numbers). The results were not accurate: ChatGPT answered “v6T9JBCD.” It’s understandable that the AI saw a letter “v,” since the squiggles in the image have a “v” shape, but it was surprising that the letter “S” was not considered at all.


What else can you do with GPT vision?

Uploading images and asking ChatGPT to interpret, analyze, and answer questions about them is only one part of its capabilities. You can also ask the AI to produce images from descriptions and specific instructions. For example, you can upload a screenshot and ask how it could be improved, or have ChatGPT generate an image from scratch with DALL·E 3.
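Image generation is also exposed through OpenAI’s API. As a sketch, here is one way to assemble the request parameters, assuming the official `openai` Python package; the model name and size options reflect OpenAI’s documented DALL·E 3 settings at the time of writing:

```python
def build_image_request(prompt: str, size: str = "1024x1024") -> dict:
    """Assemble keyword arguments for client.images.generate().

    DALL·E 3 supports square and two widescreen sizes; anything else
    is rejected before the request is sent.
    """
    allowed = {"1024x1024", "1792x1024", "1024x1792"}
    if size not in allowed:
        raise ValueError(f"unsupported size: {size}")
    return {"model": "dall-e-3", "prompt": prompt, "n": 1, "size": size}


# Usage (requires the `openai` package and an API key):
# from openai import OpenAI
# client = OpenAI()
# result = client.images.generate(
#     **build_image_request("a taco salad in a bowl, studio photo")
# )
# print(result.data[0].url)
```

Validating parameters locally like this avoids burning an API call on a request the service would reject anyway.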


ChatGPT’s vision feature handles mixed imagery, too. We rarely have perfect pictures, and some images contain both text and illustrations. You can have ChatGPT interpret an infographic and ask it questions, or even ask it to reproduce the infographic in a form you find easier to understand.

It can also help in day-to-day life: take a picture or a video, upload it to the AI, and ask for help. This comes in handy when you’re operating a device whose instructions are in another language.




ChatGPT with vision is still learning

AI can only improve as we feed it more visual data. The more images and questions we submit, the better the AI becomes at interpreting them realistically and consistently. It is similar to training a human brain: the more topics we expose ourselves to, the better equipped we become to handle them. The same principle applies to machine learning.

In its May 2024 update, OpenAI outlined its plans for ChatGPT’s visual learning. Eventually, it wants users to converse with the AI over real-time video, and it plans to improve the Voice Mode function so you can talk to the AI more naturally. If AI continues to interest you, you can try some impressive apps on the Google Play Store.
