How Hackers Hide Instructions Inside Images to Manipulate Gemini: The New AI Security Threat

In recent months, a new type of threat has appeared in the world of AI security. Hackers have discovered ways to hide prompts and instructions inside images, tricking AI systems like Gemini into carrying out hidden actions. This is not science fiction. It is a real and evolving risk that organizations and AI developers need to understand and prepare for.

As AI models become more advanced and multimodal, able to understand both text and images, the potential for attacks grows. Traditionally, AI security focused on text prompt injections and input filtering. Now, vulnerabilities extend into visual data. A single image can carry intent in a way that is invisible to humans but fully understood by AI models. This shift forces us to rethink AI safety and security strategies.

What Are Hidden Prompts in Images?

Hidden prompts in images are a new method of manipulating AI behavior. Hackers embed text instructions, patterns, or metadata inside an image file. These instructions are invisible to humans but readable by the AI. Once the AI interprets the hidden content, it can perform various actions, such as revealing sensitive information, producing manipulated outputs, or interacting with other systems in unintended ways.

Imagine a photo that appears perfectly normal to a human observer. It could be a simple landscape, a logo, or a stock image. Hidden inside it are instructions for the AI. From the hacker's perspective, this type of attack is low-effort and high-impact because it bypasses traditional text-based security measures.

The reason this works is that models like Gemini process images and text as part of the same input. A carefully crafted pattern of pixels, a block of faint text, or a metadata field can communicate with the AI just like a written prompt.
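To make this concrete, here is a minimal, benign illustration, not a real exploit, of two ways text can ride along inside an image file without a casual viewer noticing. It uses the Pillow library; the file name and the embedded strings are placeholders invented for the example.

```python
# A benign illustration of how text can hide inside an image file. Requires Pillow.
from PIL import Image, ImageDraw
from PIL.PngImagePlugin import PngInfo

# 1. Render text in a colour almost identical to the background: effectively
#    invisible to a person glancing at the image, but still legible to a model
#    that can read text out of pixels.
img = Image.new("RGB", (800, 200), color=(255, 255, 255))
draw = ImageDraw.Draw(img)
draw.text((10, 80), "Example of near-invisible pixel text", fill=(250, 250, 250))

# 2. Attach text as metadata that is never displayed on screen at all.
meta = PngInfo()
meta.add_text("Comment", "Example of text carried in a PNG text chunk")

img.save("demo.png", pnginfo=meta)
```

Opened in any image viewer, demo.png looks like a blank white banner; the text only becomes apparent to software that inspects the pixels or the file's metadata.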

Why Multimodal Models Are Vulnerable

Multimodal models are powerful because they can integrate different types of inputs, such as text, images, and sometimes audio or video, into a single understanding. This allows AI to answer questions about images, describe visual content, or generate responses informed by what it sees. However, this same strength also introduces new vulnerabilities.

The issue lies in how the AI processes its combined inputs. By encoding instructions visually rather than as text, attackers can bypass filters designed to catch malicious text. The AI interprets these visual cues as meaningful input and follows the instructions.

What appears harmless to humans may not be harmless to AI. Even a single image, carefully crafted, can function as a command. Security now has to consider the AI's perception of visual data, not just text or code.
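The sketch below shows why the boundary is so thin: the user's question and the uploaded image travel to the model in one request. It assumes the google-generativeai Python SDK; the API key, model name, and file name are placeholders.

```python
# Sketch: an image and a text prompt reach the model as one combined input.
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")          # placeholder key
model = genai.GenerativeModel("gemini-1.5-flash")  # placeholder model name

image = Image.open("user_upload.png")  # could contain hidden instructions
response = model.generate_content(["Summarise this image for me.", image])
print(response.text)

# From the model's point of view there is no hard boundary between "the user's
# question" and "whatever text it can read inside the image": both arrive in
# the same request, which is exactly what hidden-prompt attacks exploit.
```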

Examples and Possible Consequences

Researchers have demonstrated several attacks using hidden prompts in images. For example, an image could prompt the AI to reveal confidential training data, generate biased or harmful outputs, or trigger automated actions that should not happen.

These attacks show that the threat is not hypothetical. As AI systems become more integrated into business workflows, customer interactions, and decision-making, the risks increase. What may seem like a small or innocuous image could have serious consequences if it manipulates AI behavior.

How to Protect AI Systems

Addressing this new threat requires technical measures, awareness, and best practices. Organizations and developers should consider:

  1. Validating image inputs. Just as we sanitize text inputs, images should be checked for hidden patterns, metadata, or anomalies that could carry instructions (a naive check is sketched after this list).
  2. Monitoring AI outputs. Keeping an eye on unexpected or unusual behavior can help detect when hidden prompts are influencing responses.
  3. Securing models and data. Regularly update AI models, control access to training and operational data, and apply security patches.
  4. Raising awareness. Teams need to understand that security now includes visual data. Education about new attack vectors is critical.
  5. Using security tools. Emerging tools can detect prompt injections and adversarial inputs across multiple modalities. Leveraging these tools adds another layer of protection.
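As a starting point for item 1, here is a naive sketch of a metadata check that could run before an uploaded image ever reaches the model. The keyword list, size threshold, and function name are assumptions made for illustration, and the check does not catch text rendered into the pixels themselves; that requires OCR or a dedicated classifier.

```python
# A naive first-pass check (illustration only, not a complete defence): inspect
# an uploaded image's metadata for instruction-like text before it reaches the model.
from PIL import Image

SUSPICIOUS_KEYWORDS = ("ignore previous", "system prompt", "instructions", "do not tell")

def flag_suspicious_image(path: str) -> list[str]:
    """Return a list of reasons an image looks suspicious; empty if none found."""
    reasons = []
    img = Image.open(path)

    # PNG text chunks, JPEG comments, EXIF blobs, etc. surface in img.info.
    for key, value in img.info.items():
        text = value.decode("utf-8", "ignore") if isinstance(value, bytes) else str(value)
        if any(word in text.lower() for word in SUSPICIOUS_KEYWORDS):
            reasons.append(f"metadata field '{key}' contains instruction-like text")

    # Unusually large metadata is itself a signal worth logging.
    if sum(len(str(v)) for v in img.info.values()) > 1000:
        reasons.append("metadata payload is unusually large")

    return reasons

if __name__ == "__main__":
    print(flag_suspicious_image("user_upload.png") or "no obvious red flags")
```

A real pipeline would combine a check like this with output monitoring (item 2), since no single filter catches every encoding trick.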

The goal is not to create fear. It is to encourage responsible AI management. By treating all data as potentially active and influential, organizations can minimize risks and make AI systems safer.

A New Mindset for AI Safety

Personally, I see this as a wake-up call rather than a reason to panic. AI safety is no longer just about rules or input filters. It is about understanding how AI interprets and processes information. Every file, dataset, and image carries the potential to influence behavior.

For AI developers, founders, and tech leaders, this is both a challenge and an opportunity. By adopting a security-first mindset, educating teams, and implementing strong monitoring, we can continue to innovate safely.

This also highlights the importance of AI literacy. Understanding how AI reads hidden signals is becoming as important as knowing how to write prompts. In a world where even a single image can change AI behavior, awareness and vigilance are key defenses.

Conclusion

Hidden prompts in images represent a new frontier in AI security. Multimodal models like Gemini are powerful, but they also have vulnerabilities that we must take seriously. Organizations need to rethink their approach to AI safety, extend security to visual inputs, and foster a culture of awareness around potential threats.

The risk is real, but it is manageable. With education, monitoring, and proactive security practices, we can continue to use AI safely and responsibly. AI safety is no longer a secondary concern. It must be part of the design, development, and deployment of every AI system.

Vitalii Lutskyi
Operating Partner