Google Workspace’s continuous approach to mitigating indirect prompt injections

Indirect prompt injection (IPI) is an evolving threat vector targeting users of complex AI applications with multiple data sources, such as Workspace with Gemini. This technique enables the attacker to influence the behavior of an LLM by injecting malicious instructions into the data or tools used by the LLM as it completes the user’s query. This may even be possible without any input directly from the user.

IPI is not the kind of technical problem you “solve” and move on. Sophisticated LLMs with increasing use of agentic automation combined with a wide range of content create an ultra-dynamic and evolving playground for adversarial attacks. That’s why Google takes a sophisticated and comprehensive approach to these attacks. We’re continuously improving LLM resistance to IPI attacks and launching AI application capabilities with ever-improving defenses. Staying ahead of the latest indirect prompt injection attacks is critical to our mission of securing Workspace with Gemini.

In our previous blog “Mitigating prompt injection attacks with a layered defense strategy”, we reviewed the layered architecture of our IPI defenses. In this blog, we’ll share more detail on the continuous approach we take to improve these defenses and to solve for new attacks.

New attack discovery

By proactively discovering and cataloging new attack vectors through internal and external programs, we can identify vulnerabilities and deploy robust defenses ahead of adversarial activity.

Human Red-Teaming

Human Red-Teaming uses adversarial simulations to uncover security and safety vulnerabilities. Specialized teams execute attacks based on realistic user profiles to exploit weaknesses, coordinating with product teams to resolve identified issues.

Automated Red-Teaming

Automated Red-Teaming is done via dynamic, machine-learning-driven frameworks to stress-test environments. By algorithmically generating and iterating on attack payloads, we can mimic the behavior of sophisticated threats at scale. This allows us to map complex attack paths and validate the effectiveness of our security controls across a much wider range of edge cases than manual testing could achieve on its own.

Google AI Vulnerability Rewards Program (VRP)

The Google AI Vulnerability Rewards Program (VRP) is a critical tool for enabling collaboration between Google and external security researchers who discover new attacks leveraging IPI. Through this VRP, we recognize and reward contributors for their research. We also host regular, live hacking events where we provide invited researchers access to pre-release features, proactively uncovering novel vulnerabilities. These partnerships enable Google to quickly validate, reproduce, and resolve externally-discovered issues.

Publicly disclosed AI attacks

Google utilizes open-source intelligence feeds to stay on top of the latest publicly disclosed IPI attacks, across social media, press releases, blogs, and more. From there, new AI vulnerabilities are sourced, reproduced, and catalogued internally to ensure our products are not impacted.

Vulnerability catalog

All newly discovered vulnerabilities go through a comprehensive analysis process performed by the Google Trust, Security, & Safety teams. Each new vulnerability is reproduced, checked for duplications, mapped into attack technique / impact category, and assigned to relevant owners. The combination of new attack discovery sources and vulnerability catalog process helps Google stay on top of the latest attacks in an actionable manner.

Synthetic data generation

After we discover, curate, and catalog new attacks, we use Simula to generate synthetic data expanding these new attacks. This process is essential because it allows the team to develop attack variants for completeness and coverage, and to prepare new training and validation data sets. This accelerated workflow has boosted synthetic data generation by 75%, supporting large-scale defense model evaluation and retraining, as well as updating the data set used for calculating and reporting on defense effectiveness.

Ongoing defense refinement

Continually updating and enhancing our defense mechanisms allows us to address a broader range of attack techniques, effectively reducing the overall attack surface. Updating each defense type requires different tasks, from config updates, to prompt engineering and ML model retraining.

Deterministic Defenses

Deterministic defenses, including user confirmation, URL sanitization, and tool chaining policies, are designed for rapid response against new or emerging prompt injection attacks by relying on simple configuration updates. These defenses are governed by a centralized Policy Engine, with configurations for policies like baseline tool calls, URL sanitization, and tool chaining. For immediate threats, this configuration-based system facilitates a streamlined process for “point fixes,” such as regex takedowns, providing an agile defense layer that acts faster than traditional ML/LLM model refresh cycles.

ML-Based Defenses

After generating synthetic data that expands new attacks into variants, the next step is to retrain our ML-based defenses to mitigate these new attacks. We partition the synthetic data described above into separate training and validation sets to ensure performance is evaluated against held-out examples. This approach ensures repeatability, data consistency for fixed training/testing, and establishes a scalable architecture to support future extensions towards fully automated model refresh.

LLM-Based Defenses

Using the new synthetic data examples, our LLM-based defenses go through prompt engineering with refined system instructions. The goal is to iteratively optimize these prompts against agreed-upon defense effectiveness metrics, ensuring the models remain resilient against evolving threat vectors.

Gemini Model Hardening

Beyond system-level guardrails and application-level defenses, we prioritize ‘model hardening’, a process that improves the Gemini model’s internal capability to identify and ignore harmful instructions within data. By utilizing synthetic datasets and fresh attack patterns, we can model various threat iterations. This enables us to strengthen the Gemini model’s ability to disregard harmful embedded commands while following the user’s intended request. Through this process of model hardening, Gemini has become significantly more adept at detecting and disregarding injected instructions. This has led to a reduction in the success rate of attacks without compromising the model’s efficiency during routine operations.

Defense effectiveness

To measure the real-world impact of defense improvements, we simulate attacks against many Workspace features. This process leverages the newly generated synthetic attack data described on this blog, to create a robust, end-to-end evaluation. The simulation is run against multiple Workspace apps, such as Gmail and Docs, using a standardized set of assets to ensure reliable results. To determine the exact impact of a defense improvement (e.g., an updated ML model or a new LLM prompt optimization), the end-to-end evaluation is run with and without the defense enabled. This comparative testing provides the essential “before and after” metrics needed to validate defense efficacy and drive continuous improvement.

Moving forward

Our commitment to AI security is rooted in the principle that every day you’re safer with Google. While the threat landscape of indirect prompt injection evolves, we are building Workspace with Gemini to be a secure and trustworthy platform for AI-first work. IPI is a complex security challenge, which requires a defense-in-depth strategy and continuous mitigation approach. To get there, we’re combining world-class security research, automated pipelines, and advanced ML/LLM-based models. This robust and iterative framework helps to ensure we not only stay ahead of evolving threats but also provide a powerful, secure experience for both our users and customers.

Source link

What's Hot

AI chatbot use can hinder students’ knowledge retention – Computerworld

M&A Monthly: March/April 2026

Information-Driven Design of Imaging Systems – The Berkeley Artificial Intelligence Research Blog

Google Workspace’s continuous approach to mitigating indirect prompt injections

Microsoft still working to fix Exchange Online mailbox access issues

Pre-Auth Chains, Android Rootkits, CloudTrail Evasion & 10 More Stories

Feds Disrupt IoT Botnets Behind Huge DDoS Attacks – Krebs on Security

How Silver Fox preys on Japanese firms this tax season

AI Upgrades, Security Breaches, and Industry Shifts Define This Week in Tech

Why Professional Skills Matter in the Age of AI

Understanding U-Net Architecture in Deep Learning

Hard-braking events as indicators of road segment crash risk

Redefining AI efficiency with extreme compression

AI chatbot use can hinder students’ knowledge retention – Computerworld

M&A Monthly: March/April 2026

Information-Driven Design of Imaging Systems – The Berkeley Artificial Intelligence Research Blog

After fighting malware for decades, this cybersecurity veteran is now hacking drones

Our Picks

AI chatbot use can hinder students’ knowledge retention – Computerworld

M&A Monthly: March/April 2026

What's Hot

Google Workspace’s continuous approach to mitigating indirect prompt injections

New attack discovery

Human Red-Teaming

Automated Red-Teaming

Google AI Vulnerability Rewards Program (VRP)

Publicly disclosed AI attacks

Vulnerability catalog

Synthetic data generation

Ongoing defense refinement

Deterministic Defenses

ML-Based Defenses

LLM-Based Defenses

Gemini Model Hardening

Defense effectiveness

Moving forward

Related Posts

Subscribe to Updates