Large language models (LLMs), such as those powering ChatGPT, are increasingly used for information retrieval and for text editing, analysis, and generation. As these models become more advanced and widespread, some computer scientists are exploring their limitations and vulnerabilities to inform future improvements.
Two researchers at Saint Louis University, Zhen Guo and Reza Tourani, have developed and demonstrated a new type of backdoor attack that can covertly manipulate the text generated by LLMs. The attack, dubbed DarkMind, is described in a paper posted to the arXiv preprint server and highlights vulnerabilities in existing LLMs.
"Our research stems from the growing popularity of personalized AI models, such as OpenAI’s GPT Store, Google’s Gemini 2.0, and HuggingChat, which now host over 4,000 customized LLMs," senior author Tourani told Tech Xplore.
"These platforms represent a significant shift toward agent-based AI and inference-driven applications, making AI models more autonomous, adaptable, and widely accessible. However, despite their transformative potential, their security against emerging attack vectors—especially vulnerabilities embedded in the reasoning process—has not been thoroughly examined."
The primary goal of Tourani and Guo's recent study was to probe the security of LLMs by revealing vulnerabilities in the chain-of-thought (CoT) reasoning paradigm, a widely used technique that enables LLM-based dialogue agents, such as ChatGPT, to break complex tasks down into sequential intermediate steps.
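For illustration, the sketch below contrasts a direct prompt with a CoT-style prompt. The arithmetic task and the prompt wording are hypothetical examples chosen for this article, not taken from the DarkMind paper.

```python
# Minimal illustration of chain-of-thought (CoT) prompting.
# The task and wording are invented examples, not from the paper.

task = "A store sells pens at $3 each. How much do 4 pens and a $5 notebook cost?"

# Standard prompting asks the model for the answer directly.
standard_prompt = f"{task}\nAnswer:"

# CoT prompting asks the model to work through intermediate steps first;
# this step-by-step trace is the process DarkMind later exploits.
cot_prompt = (
    f"{task}\n"
    "Let's think step by step:\n"
    "Step 1: Compute the cost of the pens.\n"
    "Step 2: Add the cost of the notebook.\n"
    "Step 3: State the final total.\n"
)

print(standard_prompt)
print(cot_prompt)
```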
"We identified a significant blind spot: reasoning-based vulnerabilities that do not surface in traditional static prompt injection or adversarial attacks," said Tourani. "This led us to develop DarkMind, a backdoor attack where embedded adversarial behaviors remain dormant until activated during specific reasoning steps of the LLM."
The stealthy backdoor attack developed by Tourani and Guo leverages the step-by-step reasoning process through which LLMs handle and generate text. Unlike traditional backdoor attacks, which require manipulating user queries or retraining the model to alter its responses, DarkMind embeds "hidden triggers" within custom LLM applications, such as those found on OpenAI's GPT Store.
"These triggers are invisible in the initial prompts but activate during intermediate reasoning steps, subtly altering the final output," explained Guo, the first author of the paper and a doctoral student. "As a result, the attack remains latent and undetectable, allowing the LLM to function normally under standard conditions until a specific reasoning pattern triggers the backdoor."
In preliminary tests, the researchers found that DarkMind offers several advantages from an attacker's perspective, making it a highly effective backdoor attack. Since it operates within the model's reasoning process rather than through manipulated user queries, it is difficult to detect, and the changes it causes may evade standard security filters.
Because it dynamically modifies the model's intermediate reasoning rather than directly tampering with its inputs or final responses, the attack remains effective and persistent across a range of language tasks. In other words, it could compromise the reliability and security of LLMs performing tasks in many different domains.
"DarkMind has broad implications since it applies to various reasoning domains, including mathematics, common sense, and symbolic reasoning, and remains effective on state-of-the-art LLMs like GPT-4o, O1, and LLaMA-3," said Tourani. "Moreover, attacks like DarkMind can be easily designed with simple instructions, even by users without expertise in language models, increasing the risk of widespread misuse."
GPT-4 by OpenAI and other LLMs are now integrated into a wide range of websites and applications, including critical services such