As AI technologies advance, helpful agents will become increasingly capable of anticipating user needs. For experiences on mobile devices to be truly helpful, the underlying models need to understand what the user is doing (or trying to do) as they interact with the device. Once current and previous tasks are understood, the model has more context with which to predict potential next actions. For example, if a user previously searched for music festivals across Europe and is now looking for a flight to London, the agent could offer to find festivals in London on those specific dates.
Large multimodal LLMs are already quite good at understanding user intent from a user interface (UI) trajectory. But using them for this task typically requires sending information to a server, which can be slow and costly and risks exposing sensitive information.
Our recent paper, “Small Models, Big Results: Achieving Superior Intent Extraction Through Decomposition”, presented at EMNLP 2025, addresses the question of how to use small multimodal LLMs (MLLMs) to understand sequences of user interactions on the web and on mobile devices, entirely on device. By decomposing user intent understanding into two stages, first summarizing each screen independently and then extracting an intent from the sequence of generated summaries, we make the task more tractable for small models. We also formalize metrics for evaluating model performance and show that our approach yields results comparable to those of much larger models, illustrating its potential for on-device applications. This work builds on our team's previous work on user intent understanding.
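To make the decomposition concrete, below is a minimal sketch of the two-stage pipeline in Python. The function names, the prompts, and the generate callable are illustrative assumptions rather than the paper's actual implementation: generate stands in for any small on-device MLLM that accepts a text prompt plus an optional screenshot and returns text.

```python
# Illustrative sketch of the two-stage intent-extraction pipeline.
# The prompts and function names are assumptions for illustration only.
from typing import Callable, Optional, Sequence

# A stand-in interface for a small multimodal LLM:
# (text prompt, optional image bytes) -> generated text.
MllmFn = Callable[[str, Optional[bytes]], str]


def summarize_screens(screens: Sequence[bytes], generate: MllmFn) -> list[str]:
    """Stage 1: summarize each screen independently with the small MLLM."""
    prompt = (
        "Describe what the user is doing on this screen in one sentence, "
        "including any visible search queries, selections, or form inputs."
    )
    return [generate(prompt, screen) for screen in screens]


def extract_intent(summaries: Sequence[str], generate: MllmFn) -> str:
    """Stage 2: infer the overall user intent from the sequence of summaries."""
    trajectory = "\n".join(f"{i + 1}. {s}" for i, s in enumerate(summaries))
    prompt = (
        "The following is a chronological sequence of screen summaries from a "
        "single user session:\n"
        f"{trajectory}\n\n"
        "In one sentence, state the user's overall intent."
    )
    # The second stage is text-only: no image is passed to the model.
    return generate(prompt, None)


def intent_from_trajectory(screens: Sequence[bytes], generate: MllmFn) -> str:
    """Full pipeline: screenshots -> per-screen summaries -> a single intent."""
    return extract_intent(summarize_screens(screens, generate), generate)
```

Because each screen is summarized on its own, the small model only ever reasons over one image or one short text sequence at a time, which is what makes the task tractable on device.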

