LLM01: Visual Prompt Injection | Image based prompt injection

Multi-modal prompt injection with images is a sophisticated attack that exploits the integration of visual and text-based inputs in large language models (LLMs). This technique involves embedding adversarial prompts within images—such as hidden text in pixels, steganographic encoding, or visually imperceptible perturbations—that are processed by the model’s vision component. When the model interprets the image, the injected prompt can override system instructions, manipulate outputs, or leak sensitive data. This attack is particularly dangerous in scenarios where images are automatically analyzed by LLMs alongside textual inputs, enabling attackers to bypass traditional text-based prompt defenses and influence the model’s behavior in ways that may not be immediately apparent to users or system administrators.

Related Posts