The first wave of AI features has been defined by summarization. Apple, Google, and Samsung all offer some version of note, notification, or email summaries, in some cases without needing an internet connection at all. That makes a good bit of sense: large language models are trained on mountains of text, so it seems to follow that they'd be able to condense that text efficiently.
If you’ve used any of these features, you know that their quality is mixed at best, but they’re ultimately laying the groundwork for something better. The next big wave of AI tools is focused not just on feeding AI text, but on letting it deal with the vast wilderness of things happening on your screen. The implementations vary, but they all point to the same thing: a contextual AI that can act as a second pair of eyes on whatever you’re doing.
Here’s why it’s the sweet spot for generative AI, and why device makers are at a real advantage when it comes to offering these features to users.
A second pair of eyes
Circle to Search, Pixel Screenshots, and Copilot Vision
It doesn’t depend on generative AI, but Google’s Circle to Search is the first example from 2024 I can think of where letting software see your screen came with advantages that outweighed the costs.
Circle to Search, which debuted on the Samsung Galaxy S24 but is now technically part of Android itself, is essentially a specialized version of reverse image search. Long-press your phone’s navigation bar and the screen will freeze, letting you circle anything on your phone you want to learn more about.
That could be a pair of shoes someone’s wearing in a TikTok video, or text on a poster. Circle to Search can pull up information about any of it, helping you find a product you want to buy, define a term, or translate text you don’t understand.
As of December 2024, anything you circle in Circle to Search can also be sent to Pixel Screenshots, a new app introduced alongside the Pixel 9 for cataloging screenshots. It uses AI to sort screenshots into different categories and, as part of a recent update, surfaces content from your screenshots as suggestions in Gboard.
Those features, along with the general ability to just ask Gemini questions about what’s on your screen (mostly focused on summarization, unless you’re watching a video), gesture at what’s possible when you give AI a view of what you’re looking at.
Microsoft has started to take those basic ideas even further in the Edge browser. Its new experimental Copilot Vision feature lets you talk to the AI assistant while you browse and have it answer questions about whatever you’re looking at. The feature is limited and can produce errors in much the same way a normal text chat with Copilot can, but it represents what I think might be the sweet spot for these kinds of AI features.
You can ask for basic recommendations that you could probably answer for yourself just by exploring a website more thoroughly, but you can also make more specific requests, even letting Copilot help you cheat at a round of GeoGuessr.
The number of websites you can use Copilot Vision on is deliberately limited for now, which Microsoft says reflects the security and copyright considerations it’s weighing, though there are plans to expand. Nothing you say during a Copilot Vision session, and none of the contextual website information tied to those questions and requests, is saved after you’ve turned Copilot Vision off.
It seems like an even more natural way of getting help than Circle to Search or Pixel Screenshots, and I wouldn’t be surprised if it became the norm across all the major AI platforms. Or at least the ones integrated into operating systems or web browsers.
Letting AI view your screen can have drawbacks
Device makers are in a unique position to guarantee users’ safety
The problem with all of these screen-sharing features is that people are often looking at things they wouldn’t want to share with AI. That’s why Microsoft insists that Copilot Vision doesn’t remember anything it “sees.” The company was heavily criticized for privacy issues with Windows Recall, which, unlike Pixel Screenshots, captures images of your screen without your input to create a timeline of everything you’ve done on your computer.
There were obvious problems with that idea — an AI shouldn’t capture a screenshot of your bank account or government ID — and Microsoft had to completely overhaul how Recall works and stores screenshots before it could actually be released.
Owning an operating system and the hardware it runs on gives you a unique advantage with these kinds of AI features, because you can have precise control of what these models have access to and when. That’s a key element of Apple’s privacy-focused approach to AI on the iPhone, and one of several reasons it hasn’t released an updated version of Siri that can access your phone’s screen and apps.
An AI less focused on general knowledge is good
The large language models that power generative AI might be trained on a huge amount of data, but that doesn’t guarantee a deep well of accurate knowledge. They can give incomplete answers just as often as they can confidently get a straightforward question wrong.
The strength of AI apps like Google’s NotebookLM is that they make an AI model responsible for answering questions about a much smaller pool of information: whatever sources you upload yourself. Letting AI see your screen feels like the upper limit of that same approach, where the constraint you’re providing is whatever you happen to be looking at.
It’s wider than a few PDFs or YouTube videos, but it’s much narrower than expecting an AI to be an answer machine for all human knowledge. That seems like the right level for a useful AI to be operating at.