
How Can Apple Make Siri Smarter With Ferret-UI, Its Multimodal Large Language Model?


Apple to bring Generative AI to all its devices as soon as next year

Reports about Apple working on AI-based features for its voice assistant, Siri, have been doing the rounds on the internet for quite some time. Until now, however, they haven't explained how Siri would incorporate these new abilities or benefit users. A newly published Apple research paper describes how the company could integrate Ferret-UI, its generative AI model trained specifically to interpret mobile app screens, into Siri to expand its use cases and make it one of the smartest voice assistants around.

What Is Ferret-UI?

Although the research paper doesn't elaborate on the potential applications of Ferret-UI, it gives a fair idea of how Apple envisions the AI-based tool helping Siri make sense of the images and icons on an iPhone's screen. For those catching up, Ferret-UI is an advanced multimodal large language model (MLLM), a class of model designed to understand information beyond text, such as images, video, and audio; in Ferret-UI's case, that input is the iOS user interface.

Ferret-UI powered Siri upgrades

How Can Ferret-UI Fuel Siri’s Transformation Into An AI-Powered Voice Assistant?

Mobile App Interface Recognition

According to the information published in the research paper, Apple has been training Ferret-UI to recognize and analyze mobile screens. “Given that UI screens typically exhibit a more elongated aspect ratio and contain smaller objects of interest (e.g., icons, texts) than natural images, we incorporate any resolution on top of Ferret to magnify details and leverage enhanced visual features,” mentions the paper.
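The "any resolution" idea the paper quotes can be pictured as splitting an elongated screenshot into sub-images so that small elements like icons and text are encoded at higher effective resolution, alongside a downscaled global view. The sketch below is purely illustrative and is not Apple's code; the function name and the simple two-way split are assumptions for demonstration.

```python
# Illustrative sketch (not Apple's implementation): divide an elongated screen
# into sub-images so small UI elements get higher effective resolution, while
# a full-screen view preserves global context.

def subimage_grid(width: int, height: int):
    """Return bounding boxes (left, top, right, bottom) for a screen.

    Portrait screens are split into two stacked halves, landscape screens
    into two side-by-side halves, mirroring the paper's point about
    handling elongated aspect ratios.
    """
    if height >= width:  # portrait: stack two halves vertically
        half = height // 2
        boxes = [(0, 0, width, half), (0, half, width, height)]
    else:                # landscape: two halves side by side
        half = width // 2
        boxes = [(0, 0, half, height), (half, 0, width, height)]
    # The global (downscaled) view covers the whole screen.
    return [(0, 0, width, height)] + boxes

# Example: an iPhone-style portrait screen of 1170x2532 pixels
for box in subimage_grid(1170, 2532):
    print(box)
```

Each sub-image would then be encoded separately by the vision backbone, which is how fine details like small icons stay legible to the model.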

Interaction With Apps

In other words, Ferret-powered Siri should be able to take commands related to on-screen content. This could include opening and closing apps, pressing a particular button, navigating around the interface (actions otherwise possible only via touch), and summarizing text on the screen. What's promising is that the research paper claims better results than GPT-4V and other leading UI-focused MLLMs.
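To make the idea concrete, here is a hypothetical sketch of the last step in that pipeline: turning a grounded screen description (the kind of output a model like Ferret-UI produces, element labels plus bounding boxes) into a tap target. Every name in this snippet is an assumption for illustration, not an Apple API.

```python
# Hypothetical sketch: map a spoken request to a tap point, given a list of
# detected UI elements with labels and bounding boxes. Names are illustrative.

from dataclasses import dataclass

@dataclass
class UIElement:
    label: str    # e.g. "Add to cart button"
    box: tuple    # (left, top, right, bottom) in screen pixels

def tap_target(elements, query: str):
    """Return the center point of the first element whose label matches the query."""
    for el in elements:
        if query.lower() in el.label.lower():
            left, top, right, bottom = el.box
            return ((left + right) // 2, (top + bottom) // 2)
    return None  # no matching element on screen

# Example: a screen with two detected elements
screen = [
    UIElement("Search field", (40, 120, 1130, 200)),
    UIElement("Add to cart button", (300, 2200, 870, 2320)),
]
print(tap_target(screen, "add to cart"))
```

The hard part, of course, is producing the labeled elements in the first place; that is exactly the grounding task the paper trains Ferret-UI on.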


As and when Siri becomes capable of recognizing on-screen content and interacting with it, iPhone users will be able to perform a multitude of tasks via voice commands. For instance, Ferret-powered Siri should be able to interact with apps to order food, add items to your shopping list, book flights, search for TV shows on Netflix, and much more. There is one thing we're concerned about, though: how precisely users will have to phrase their commands.

In its current state, Siri handles basic tasks as intended, but it fails to pick up the right words roughly two times out of ten. This often happens when trying to use Siri to play a specific track on Apple Music or open a particular app. If it receives an upgrade along the lines described in the research paper, however, Siri could become one of the most capable voice assistants around.

