How an AI assistant on your phone might work

May 28, 2025

We propose a modular, on-device AI assistant architecture that views the process of retrieving relevant information and performing appropriate actions as a nested relevant sorting task. In this system, each app precomputes embeddings locally, enabling the AI assistant to orchestrate app-level searches by progressively refining the search from a system-wide view to relevant app-level databases. This approach allows the AI assistant to aggregate only the most relevant results while maintaining strict data isolation to preserve user privacy. By decentralizing data processing and minimizing real-time interactions, it ensures low-latency, user-controlled data access while effectively managing fragmented information across multiple apps.

Tonight, I came up with a simple model for how an AI assistant might work on phones. Although many people may have already considered similar ideas, I am writing it down regardless.

Introduction

Objective

The primary objective of an AI assistant on a daily-use device is to identify the most relevant information in response to a user's query, drawing from various sources across the device, and to perform the most appropriate action that fulfills the user's request. Both tasks—retrieving relevant information and identifying suitable actions—can essentially be viewed as processes of finding the most relevant data or action.

However, searching for the most relevant information across all possible sources is both impractical and inefficient. In reality, intelligent agents such as humans approach this problem as a nested process—starting with a broad, high-level perspective and progressively refining the search direction until they focus on the most promising area. Similarly, the task of finding the most relevant information should be treated as a nested problem, which naturally aligns with the existing architecture of modern devices.

In this work, we propose a model that conceptualizes the task of an AI assistant as a nested relevant sorting problem and integrates this approach into the device's existing architecture.

Challenge

Several challenges must be addressed when attempting to find relevant information on a device:

  1. Fragmented Data: User data is scattered across multiple apps, stored in diverse formats, and compartmentalized within sandboxed environments, making it difficult to consolidate and query efficiently.
  2. Privacy Concerns: A centralized AI system with unrestricted access to all app data raises serious privacy concerns. Users are unlikely to trust an assistant that has omniscient knowledge of all their personal information.
  3. Latency and Energy Efficiency: Processing vast amounts of data in real-time introduces unacceptable latency, making the system sluggish and inefficient. Additionally, real-time AI processing is computationally intensive.

Solution

The key to solving all of the above problems simultaneously is to perform most of the necessary computations separately within each app before any user interaction. This is achieved by precomputing embeddings for structured app data and storing them locally within the respective app.

Proposed System

The Core Tool: Relevant Sorting

The fundamental mechanism underpinning the proposed system is a process referred to as relevant sorting. Relevant sorting, in general, is any method capable of identifying and filtering the most relevant entries from a database based on a given input. Here, I present a simple, naive implementation as an example. While this example is deliberately basic, numerous more capable solutions are already widely employed in search-based AI assistants.

Consider a database where each entry is denoted as $s_i$, with $i$ indexing the entries in the database. For simplicity, assume that each $s_i$ is a sentence string, although in practice it could be any structured data, audio, or image. For each entry $s_i$, a sentence embedding can be computed, mapping the string to a corresponding vector $v_i$.

Now, given an input (typically a sentence provided by the user), denote this input as $q$. A sentence embedding is performed on $q$, resulting in a vector $u$.

The sorting process is then performed by projecting each vector $v_i$ onto the vector $u$:

$$p_i = \frac{v_i \cdot u}{\lVert u \rVert},$$

where $\cdot$ denotes the dot product and $\lVert u \rVert$ represents the norm of the vector $u$.

The entries are subsequently sorted according to the values of $p_i$. To filter the results, the simplest approach is to define a threshold $\epsilon$ and truncate the sorted list at the point where the difference between the projection values of two consecutive entries exceeds $\epsilon$.

This approach sorts the database entries by their relevance to the user input. Importantly, since the database entries are independent of the user input, their embeddings can be precomputed before the user interaction. As a result, the only computation performed in real-time is the projection and sorting process, making the system highly efficient.
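The projection-sort-truncate procedure above can be sketched in a few lines of Python. The `embed` function here is a toy bag-of-words hash embedding that merely stands in for a real on-device sentence encoder; the function names and the threshold value are illustrative assumptions, not part of any real API.

```python
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    # Toy stand-in for a real on-device sentence encoder: hash each
    # word into a fixed-size bag-of-words vector so the sketch runs
    # without any model dependency.
    v = np.zeros(dim)
    for word in text.lower().split():
        v[hash(word) % dim] += 1.0
    return v

def relevant_sort(entries, query, threshold=0.5):
    """Sort entries by the projection of their embeddings onto the
    query embedding, then truncate at the first gap between
    consecutive scores that exceeds the threshold."""
    u = embed(query)
    u_norm = np.linalg.norm(u) or 1.0
    # In the proposed system the entry embeddings would be
    # precomputed offline; here they are computed inline.
    scored = sorted(((np.dot(embed(s), u) / u_norm, s) for s in entries),
                    key=lambda t: t[0], reverse=True)
    kept = scored[:1]
    for prev, cur in zip(scored, scored[1:]):
        if prev[0] - cur[0] > threshold:
            break
        kept.append(cur)
    return [s for _, s in kept]
```

Note that only the projections and the sort happen at query time; with precomputed entry embeddings, the per-query cost is a handful of dot products.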

In the subsequent sections, I will use the term "sorting assistant" to mean a minimal model that satisfies the following criteria:

  1. It has access to a specific database and can precompute embeddings for the items.
  2. It can accept an input embedding and perform relevant sorting.
  3. Upon completing the sorting, it returns the filtered or sorted results. These results can be either the vectors (to maintain privacy) or the actual items in the database, depending on the use case.
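The three criteria can be captured in a minimal class. This is only a sketch under stated assumptions: the class and method names are invented for illustration, and the embedding function is injected so any on-device encoder could be plugged in.

```python
import numpy as np

class SortingAssistant:
    """Minimal sorting assistant per the three criteria: it owns one
    database, precomputes embeddings for its items, and answers
    relevant-sorting queries."""

    def __init__(self, database, embed_fn):
        self.database = list(database)
        self.embed = embed_fn      # injected sentence encoder
        self.vectors = None

    def precompute(self):
        # Criterion 1: embeddings computed offline, e.g. while idle.
        self.vectors = np.stack([self.embed(item) for item in self.database])

    def sort(self, query_vec, top_k=5, return_items=False):
        # Criterion 2: only projection + sorting runs at query time.
        scores = self.vectors @ query_vec / (np.linalg.norm(query_vec) or 1.0)
        order = np.argsort(scores)[::-1][:top_k]
        if return_items:
            return [self.database[i] for i in order]
        # Criterion 3: return vectors only, to preserve privacy.
        return [self.vectors[i] for i in order]
```

The `return_items` flag mirrors the choice described in criterion 3: vectors for privacy-sensitive retrieval, actual items when the caller is entitled to them (as in the action-database case discussed later).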

The Architecture

This section describes the architecture for retrieving relevant information based on user input across a smartphone or other device.

The system consists of several components:

  1. The AI assistant.
  2. A system-level sorting assistant, which has access to a database containing the names and descriptions of all apps installed on the device.
  3. An in-app sorting assistant for each app, with access to that app's own database.

The lifecycle of a relevant information search proceeds as follows:

  1. The user inputs a request into the AI assistant, specifying the information they want to retrieve from their device.
  2. The AI assistant packages the user's request into a sentence, either by using the original input or by generating a condensed summary. It then performs sentence embedding on this sentence, producing a vector that we denote as $u$.
  3. The AI assistant calls the system-level sorting assistant, which performs relevant sorting between $u$ and the vector representations of all apps. The system-level sorting assistant returns a list of relevant apps whose databases may contain the information requested by the user.
  4. After identifying the relevant apps, the AI assistant calls the in-app sorting assistant of each and passes $u$ to it. Each in-app sorting assistant performs relevant sorting between $u$ and the respective app's database. The filtered results are then returned to the AI assistant.
  5. Finally, the AI assistant consolidates all potentially relevant data and presents it to the user.
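The five-step lifecycle can be sketched end to end. Everything here is illustrative: the toy hash embedding stands in for a real encoder, and the `(key, text)` entry format and all names are assumptions made so the sketch is self-contained.

```python
import numpy as np

def embed(text, dim=64):
    # Toy hash embedding standing in for a real sentence encoder.
    v = np.zeros(dim)
    for w in text.lower().split():
        v[hash(w) % dim] += 1.0
    return v

class KeyedSortingAssistant:
    # Each entry is a (key, text) pair; sorting runs over the text
    # embeddings and returns the matching keys.
    def __init__(self, entries):
        self.keys = [k for k, _ in entries]
        self.vectors = np.stack([embed(t) for _, t in entries])

    def sort(self, q, top_k=3):
        scores = self.vectors @ q / (np.linalg.norm(q) or 1.0)
        order = np.argsort(scores)[::-1][:top_k]
        return [self.keys[i] for i in order if scores[i] > 0]

def handle_request(user_input, system_assistant, app_assistants):
    q = embed(user_input)                     # step 2: embed the request
    relevant_apps = system_assistant.sort(q)  # step 3: system-wide sort
    results = []
    for app in relevant_apps:                 # step 4: fan out per app
        if app in app_assistants:
            results.extend(app_assistants[app].sort(q))
    return results                            # step 5: aggregate
```

The system-level assistant sorts over app names and descriptions, while each in-app assistant sorts only over its own sandboxed database; the AI assistant sees just the aggregated, already-filtered results.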

Comments

There are a few additional comments I want to make regarding privacy and efficiency:

  1. Each app can utilize an AISort API to manage its database effectively. The app passes its database to the API, which requires the data to be formatted in a specific structure to ensure efficiency. Importantly, the app can selectively choose which data the sorting assistant can access. The in-app sorting assistant operates within its own sandbox, with access strictly limited to the provided database and following its own independent lifecycle.
  2. To optimize performance, the sorting assistant should precompute embeddings and index the provided database when the device is idle. This approach ensures that during user interaction, only the projection calculations need to be performed, maintaining a smooth and responsive user experience.
  3. The system should provide users with full control over data access through the privacy settings. Users can choose to enable or disable the AISort API for each app, allowing them to determine which app data the AI assistant can access.
  4. All sorting assistants should terminate their lifecycle immediately after completing their assigned tasks to minimize resource usage and maintain security.
  5. In addition, apps can provide an action database to the AISort API. When the AI assistant determines that it is expected to perform a specific action, it performs a relevant information search on the action database to identify the action that best corresponds to the user's request. Once identified, the AI assistant can execute the action through existing system APIs (such as Apple's Shortcuts API). In this case, the in-app sorting assistant should return the actual items from the action database, since these entries reveal the necessary API calls to the system-level AI. The AI assistant then brokers the connection between the user and the app through the API, which should be a relatively light workload.
  6. Each sorting assistant may also incorporate a memory system, allowing it to remember user preferences or apply reinforcement learning techniques to adjust its embedding calculations over time.
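To make comment 5 concrete, here is one possible shape for an action database. The AISort API itself is the proposal's hypothetical interface, so the entry format, the descriptions, and the `handle` strings below are all invented for illustration; a real system would invoke the handle through an existing mechanism such as a Shortcuts-style API.

```python
import numpy as np

def embed(text, dim=64):
    # Toy hash embedding; a real system would use a sentence encoder.
    v = np.zeros(dim)
    for w in text.lower().split():
        v[hash(w) % dim] += 1.0
    return v

# Hypothetical action database an app might register with the AISort
# API: each action pairs a natural-language description (used for
# relevant sorting) with an opaque handle the system could invoke.
ACTIONS = [
    {"description": "send a message to a contact", "handle": "messages.send"},
    {"description": "set an alarm for a given time", "handle": "clock.set_alarm"},
    {"description": "start a workout session", "handle": "fitness.start_workout"},
]

def best_action(request):
    # Relevant sorting over action descriptions; the winning entry's
    # handle is what the AI assistant would pass to the system API.
    q = embed(request)
    scores = [float(np.dot(embed(a["description"]), q)) for a in ACTIONS]
    return ACTIONS[int(np.argmax(scores))]["handle"]
```

Unlike the information-retrieval case, the actual entries (not just vectors) are returned here, because the assistant needs the handle to drive the system-level API call.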

Advantages and Conclusion

This architecture offers several advantages. First, it avoids the scalability problem by decomposing the search for relevant information into modular components. Second, within each module, most data is precomputed, providing a low-latency user experience and enabling efficient on-device processing. Third, the modular approach is well-suited to handling fragmented data across different apps and databases. Finally, by maintaining app-level sandboxes and passing only the relevant information to the AI assistant, the system ensures strong privacy control.

An AI assistant is, essentially, a system designed to gather relevant information and perform appropriate actions. At its core, the fundamental challenge lies in identifying the correct direction to search for the necessary data or action. Consequently, this problem can be viewed as a nested relevant sorting task, where the assistant continues to refine its search until the desired information is found. Recognizing this key principle suggests that such an architecture could potentially define how future AI assistants operate.

Copyright © 2025 阿哲. All rights reserved.
