Wednesday, July 23, 2025
News Wave

Here’s how Apple could change your iPhone forever

15 April 2024



Joe Maring / Digital Trends

Over the past few months, Apple has released a steady stream of research papers detailing its work with generative AI. So far, Apple has been tight-lipped about what exactly is cooking in its research labs, while rumors circulate that Apple is in talks with Google to license its Gemini AI for iPhones.

But there have been a couple of teasers of what we can expect. In February, an Apple research paper detailed an open-source model called MLLM-Guided Image Editing (MGIE) that is capable of media editing using natural language instructions from users. Now, another research paper on Ferret UI has sent the AI community into a frenzy.

The idea is to deploy a multimodal AI (one that understands text as well as multimedia assets) to better understand elements of a mobile user interface and, most importantly, to deliver actionable tips. That’s a critical goalpost as engineers race to move AI beyond its current “parlor trick” status and make it more useful for the average smartphone user.

In that direction, the biggest push is to unplug the generative AI capabilities from the cloud, end the need for an internet connection, and deploy every task on-device so that it’s faster and safer. Take, for example, Google’s Gemini, which is running locally on the Google Pixel and Samsung Galaxy S24 series phones – and soon, OnePlus phones – and performing tasks like summarization and translation.

What is Apple’s Ferret UI?

Apple Ferret UI feature cards.
Apple

With Ferret-UI, Apple seemingly aims to blend together the smarts of a multimodal AI model with iOS. Right now, the focus is on more “elementary” chores like “icon recognition, find text, and widget listing.” However, it’s not just about making sense of what is being displayed on an iPhone’s screen, but also understanding it logically and answering contextual queries posed by users through its reasoning capabilities.

The easiest way to describe Ferret UI’s capabilities is as an intelligent optical character recognition (OCR) system powered by AI. “After training on the curated datasets, Ferret-UI exhibits outstanding comprehension of UI screens and the capability to execute open-ended instructions,” notes the research paper. The team behind Ferret UI has tuned it to accommodate “any resolution.”

You can ask questions like “Is this app safe for my 12-year-old kid?” while surfing through the App Store. In such situations, the AI will read the age rating of the app and will accordingly provide the answer. How the answer would be served – text or audio – isn’t specified, as the paper doesn’t mention Siri or any virtual assistant, for that matter.

Apple didn’t fall too far from the GPT tree

Apple Ferret UI overview.
Apple

But the ideas are far more panoramic and smart. Ask it “How can I share the app with a friend?” and the AI will highlight the “share” icon on the screen. Of course, it will give you a gist of what’s flashing on the screen, but at the same time, it will logically analyze the visual assets on the screen, such as boxes, buttons, pictures, icons, and more. That’s a massive accessibility win.

If you’d like to hear the technical terms, well, the paper refers to these capabilities as “perception conversation,” “functional inference,” and “interaction conversation.” One of the research paper’s descriptions actually sums up the Ferret UI possibilities perfectly, describing it as “the first MLLM designed to execute precise referring and grounding tasks specific to UI screens, while adeptly interpreting and acting upon open-ended language instructions.”

Apple Ferret UI answering screen-aware questions.
Apple

As a result, it can describe screenshots, tell what a particular asset does when tapped, and discern whether something on the screen is interactive with touch inputs. Ferret UI is not solely an in-house project. Instead, for the reasoning and description part, it relies on OpenAI’s GPT-4 tech, which powers ChatGPT, along with a whole bunch of other conversational products out there.

Notably, the particular version proposed in the paper is suitable for multiple aspect ratios. In addition to its on-screen analysis and reasoning capabilities, the research paper also describes a few advanced capabilities that are pretty amazing to envision. For example, in the screenshot below, it seems capable not only of analyzing handwritten text, but also of predicting the correct version from the user’s misspelled scribble.

Apple Ferret UI recognizing text.
Apple

It is also capable of accurately reading text that is cut off at the top or bottom edge and would otherwise require a vertical scroll. However, it’s not perfect. On occasion, it misidentifies a button as a tab and misreads assets that combine images and text into a single block.

When pitted against OpenAI’s GPT-4V model, Ferret UI delivered an impressive level of conversation interaction outputs when asked questions related to the on-screen content. As can be seen in the image below, Ferret UI prefers more concise and straightforward answers, while GPT-4V writes more detailed responses.

The choice is subjective, but if I were to ask an AI, “How do I buy the slipper appearing on the screen,” I would prefer it just to give me the right steps in as few words as possible. And Ferret UI performed admirably not just at keeping things concise, but also at accuracy. At the aforementioned task, Ferret UI scored 91.7% at conversation interaction outputs, while GPT-4V was only slightly ahead with 93.4% accuracy.

A universe of intriguing possibilities

Apple Ferret UI Shortcuts
Apple

Ferret UI marks an impressive debut of AI that can make sense of on-screen actions. Now, before we get too excited about the possibilities here, we are not sure how exactly Apple aims to integrate this with iOS, or if it will materialize at all, for multiple reasons. Bloomberg recently reported that Apple was aware of being a laggard in the AI race, and that is quite evident from the lack of native generative AI products in the Apple ecosystem.

First, the rumors of Apple even considering a licensing deal with Google (for Gemini) or OpenAI are a sign that Apple’s own work is not at the same level as the competition’s. In such a scenario, tapping into the work Google has already done with Gemini (which is now trying to replace Google Assistant on phones) would be wiser than pushing a half-baked AI product on iPhones and iPads.

Apple clearly has ambitious ideas and continues to work on them, as demonstrated by the experiments detailed across multiple research papers. However, even if Apple manages to fulfill Ferret UI’s promises within iOS, it would still amount to a superficial implementation of on-device generative AI.

Apple Ferret UI reading on-screen content.
Apple

However, functional integrations, even if they are limited only to in-house preinstalled apps, could produce amazing results. For example, let’s say you are reading an email while the AI has already assessed the on-screen content in the background. As you’re reading the message in the Mail app, you can ask the AI with a voice command to make a calendar entry out of it and save it to your schedule.

It doesn’t necessarily have to be a super-complex multistep chore involving more than one app. Say you’re looking at a restaurant’s Google Search knowledge page, and by simply saying “call the place,” the AI reads the on-screen phone number, copies it to the dialer, and starts a call.
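
To make the “call the place” idea concrete, here is a minimal Python sketch of the last two steps, assuming the on-screen text has already been extracted into a list of strings. Nothing here comes from Apple’s paper: the function names, the phone-number pattern, and the sample screen text are all hypothetical, and a real system would need far more robust number parsing.

```python
import re
from typing import List, Optional

# Hypothetical pattern: an optional "+", then digits mixed with spaces,
# parentheses, dots, or hyphens. It is deliberately loose and illustrative.
PHONE_PATTERN = re.compile(r"\+?\d[\d\s().-]{6,}\d")


def find_phone_number(screen_texts: List[str]) -> Optional[str]:
    """Return the first phone-like number found in the on-screen text."""
    for text in screen_texts:
        match = PHONE_PATTERN.search(text)
        if match:
            # Keep only digits and a leading "+" so the dialer gets a clean value.
            return re.sub(r"[^\d+]", "", match.group())
    return None


def build_tel_uri(number: str) -> str:
    """Wrap the number in a tel: URI, the scheme phones use to start a call."""
    return f"tel:{number}"


# Assumed sample of what a screen reader might hand over for a restaurant page.
screen = ["Luigi's Trattoria", "Open until 10 PM", "Call (555) 010-4477"]
number = find_phone_number(screen)
if number is not None:
    print(build_tel_uri(number))
```

The hard part, of course, is the step this sketch skips: reliably working out which text on a busy screen is the phone number the user means.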

Or, let’s say you are reading a tweet about a film coming out on April 6, and you tell the AI to create a shortcut directed at the Fandango app. Or, a post of a beach in Vietnam inspires your next solo trip, and a simple “book me a ticket to Con Dai” takes you to the Skyscanner app with all your entries already filled in.

Hey Siri
Nadeem Sarwar / Digital Trends

But all of this is easier said than done and depends on multiple variables, some of which might be out of Apple’s control. For example, webpages riddled with pop-ups and intrusive ads would make it nigh impossible for Ferret UI to do its job. But on the positive side, iOS developers adhere tightly to the design guidelines laid down by Apple, so it’s likely that Ferret UI would do its magic more efficiently on iPhone apps.

That would still be an impressive win. And since we’re talking about on-device implementation baked tightly at the OS level, it is unlikely that Apple would charge for the convenience, unlike mainstream generative AI products such as ChatGPT Plus or Microsoft Copilot Pro. Would iOS 18 finally give us a glimpse of a reimagined iOS supercharged with AI smarts? We’ll have to wait until Apple’s Worldwide Developers Conference 2024 to find out.

Copyright © 2025 News Wave
News Wave is not responsible for the content of external sites.
