Unlocking dependable responses with Gemini Enterprise Agent Platform’s Agentic RAG

Experiments and results

We evaluated agentic RAG on FramesQA, which is based on the FRAMES paper. An example multi-hop question is:

“Of the top two most watched television season finales (as of June 2024), which finale ran the longest in length and by how much?”

The RAG system must take several steps to reach the correct answer. First, it has to identify that the two most watched finales are those of M*A*S*H and Cheers. Then it has to find their running times and calculate the difference. In many RAG settings (vanilla RAG, or agentic RAG without sufficient context), the model instead responds with something like:

“Despite multiple scans, I found no explicit runtimes for M*A*S*H or Cheers. The documents provide viewership data, but not the duration in minutes or hours.”

This does not answer the question.

Fortunately, our agentic RAG solves this by first searching for the TV shows, then using the Query Rewriter and Sufficient Context Agent to run a targeted search for the running times of the M*A*S*H and Cheers finales. Gemini can then determine which finale ran longer and by how much:

“The M*A*S*H finale ran for 150 minutes, making it the longest of the top two. It was 52 minutes longer than the Cheers finale, which ran for approximately 98 minutes.”
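The search, check, and rewrite steps described above can be sketched as a small loop. This is a minimal illustration under stated assumptions, not the platform's implementation: the function names (`agentic_answer`, `toy_search`, and the other `toy_` helpers) and the in-memory passages are hypothetical stand-ins for the retriever, the Sufficient Context Agent, the Query Rewriter, and the generating model.

```python
import re
from typing import Callable, List


def agentic_answer(
    question: str,
    search: Callable[[str], List[str]],
    is_sufficient: Callable[[str, List[str]], bool],
    rewrite: Callable[[str, List[str]], str],
    generate: Callable[[str, List[str]], str],
    max_rounds: int = 3,
) -> str:
    """Retrieve, check whether the context suffices, and rewrite the query if not."""
    context: List[str] = []
    query = question
    for _ in range(max_rounds):
        context.extend(search(query))
        if is_sufficient(question, context):
            break
        # Context is still missing something: ask for a more targeted query.
        query = rewrite(question, context)
    return generate(question, context)


# Hypothetical toy components, used only to make the sketch runnable.
TOY_DOCS = {
    "most watched": ["The two most watched finales are M*A*S*H and Cheers."],
    "running time": ["M*A*S*H finale: 150 minutes.", "Cheers finale: 98 minutes."],
}


def toy_search(query: str) -> List[str]:
    """Return passages whose key phrase appears in the query."""
    hits: List[str] = []
    for key, passages in TOY_DOCS.items():
        if key in query.lower():
            hits.extend(passages)
    return hits


def toy_is_sufficient(question: str, context: List[str]) -> bool:
    """Treat the context as sufficient once two runtimes have been retrieved."""
    return sum("minutes" in passage for passage in context) >= 2


def toy_rewrite(question: str, context: List[str]) -> str:
    """Stand-in for a query rewriter: target the missing runtimes."""
    return "running time of the M*A*S*H and Cheers finales"


def toy_generate(question: str, context: List[str]) -> str:
    """Stand-in for the generating model: compute the runtime difference."""
    minutes = [int(m.group(1)) for p in context if (m := re.search(r"(\d+) minutes", p))]
    if len(minutes) < 2:
        return "Insufficient context to answer."
    return f"The longer finale ran {max(minutes) - min(minutes)} minutes longer."
```

With these toy components, the first search returns only the passage naming the two shows, the sufficiency check fails, and the rewritten query retrieves both runtimes, so the final answer can report the 150 − 98 = 52 minute difference.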

We ran an experiment to test this ability at scale: FramesQA has 824 queries and a corpus of 2,676 PDF documents. In the “Vanilla” RAG setting, we used Google’s RAG Engine (which has an advanced retrieval engine, LLM parser, and re-ranker). We compared this baseline with our agentic RAG in two settings. In the single-corpus setting, we retrieved only from the FramesQA documents. In the cross-corpus setting, we also included three distractor datasets, and the Planner Agent had to determine which corpus to retrieve from. This cross-corpus setting mimics use cases where companies have databases managed by separate teams. We computed accuracy by using an LLM-as-a-judge to compare the system responses to the ground truth answers in the dataset.
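As a rough illustration of the accuracy computation, the sketch below scores a list of (question, response, ground truth) triples with a pluggable judge. The names `judged_accuracy` and `placeholder_judge` are hypothetical, and the placeholder judge is a plain substring match standing in for the LLM call; it is not the judging prompt or model used in the experiment.

```python
from typing import Callable, Iterable, Tuple

# (question, system_response, ground_truth)
Example = Tuple[str, str, str]


def judged_accuracy(
    examples: Iterable[Example],
    judge: Callable[[str, str, str], bool],
) -> float:
    """Fraction of examples the judge marks as matching the ground truth."""
    verdicts = [bool(judge(q, response, truth)) for q, response, truth in examples]
    if not verdicts:
        raise ValueError("no examples to score")
    return sum(verdicts) / len(verdicts)


def placeholder_judge(question: str, response: str, ground_truth: str) -> bool:
    """Assumption: a substring check standing in for an LLM-as-a-judge call."""
    return ground_truth.lower() in response.lower()
```

In practice the `judge` argument would wrap a model call that returns a correct/incorrect verdict; keeping it as a parameter lets the same scoring loop run over the vanilla, single-corpus, and cross-corpus outputs.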

In the cross-corpus setting, our system nearly matches its single-corpus accuracy. Even when the Planner Agent must select the correct corpus from four candidates, we successfully route the search queries and answer 90.1% of questions correctly. The latency of the single- and cross-corpus versions is also about the same (within 3% on average). This demonstrates that our agentic RAG system can reason over multiple, unrelated data sources, which opens up possibilities for more flexible retrieval scenarios.
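To make the routing step concrete, here is a minimal sketch of choosing one corpus out of four. It assumes a simple word-overlap score between the query and a short description of each corpus; the real Planner Agent is model-driven, and the corpus names and descriptions below are invented for illustration (the article does not name the distractor datasets).

```python
from typing import Dict

# Hypothetical corpus descriptions; only the count (four) comes from the article.
CORPORA: Dict[str, str] = {
    "frames_documents": "encyclopedia articles about history television sports and geography",
    "finance_reports": "quarterly finance reports and earnings statements",
    "medical_notes": "medical notes and clinical guidelines",
    "legal_contracts": "legal contracts and compliance policies",
}


def route_query(query: str, corpus_descriptions: Dict[str, str]) -> str:
    """Pick the corpus whose description shares the most words with the query."""
    query_words = set(query.lower().split())

    def overlap(name: str) -> int:
        description_words = set(corpus_descriptions[name].lower().split())
        return len(query_words & description_words)

    # On ties, max() keeps the first corpus in insertion order.
    return max(corpus_descriptions, key=overlap)
```

A word-overlap heuristic like this breaks down quickly on real queries, which is one reason to delegate the decision to a planning model; the sketch only shows where that decision sits in the pipeline.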
