The research challenges the conventional wisdom that an attacker needs to control a specific percentage of the training data (e.g., 0.1% or 0.27%) to succeed. For the largest model tested (13B parameters), those 250 poisoned samples represented a minuscule 0.00016% of the total training tokens. Attack success rate remained nearly identical across all tested model scales for a fixed number of poisoned documents.The research challenges the conventional wisdom that an attacker needs to control a specific percentage of the training data (e.g., 0.1% or 0.27%) to succeed. For the largest model tested (13B parameters), those 250 poisoned samples represented a minuscule 0.00016% of the total training tokens. Attack success rate remained nearly identical across all tested model scales for a fixed number of poisoned documents.

The Illusion of Scale: Why LLMs Are Vulnerable to Data Poisoning, Regardless of Size

2025/10/19 00:58
6 min read
For feedback or concerns regarding this content, please contact us at crypto.news@mexc.com

We stand at an inflection point in AI, where Large Language Models (LLMs) are scaling rapidly, increasingly integrating into sensitive enterprise applications, and relying on massive, often untrusted, public datasets for their training foundation. For years, the security conversation around LLM data poisoning operated under a fundamental—and now challenged- assumption: that attacking a larger model would require controlling a proportionally larger percentage of its training data.

\ New collaborative research from Anthropic, the UK AI Security Institute (UK AISI), and The Alan Turing Institute shatters this premise, revealing a critical, counterintuitive finding: data poisoning attacks require a near-constant, small number of documents, entirely independent of the model’s size or the total volume of clean training data.

\ This revelation doesn't just change the academic discussion around AI security; it drastically alters the threat model for every organization building or deploying large-scale AI. If the barrier to entry for adversaries is fixed and low, the practical feasibility of these vulnerabilities skyrockets, posing significant risks to AI security and limiting the technology’s potential for widespread adoption in sensitive contexts.

\

Challenging the Scaling Law: Fixed Count vs. Relative Proportion

The conventional wisdom regarding LLM pretraining poisoning assumed that an attacker needed to control a specific percentage of the training data (e.g., 0.1% or 0.27%) to succeed. As models grow larger and their training datasets scale correspondingly (following principles like Chinchilla-optimal scaling), meeting that percentage requirement becomes logistically unrealistic for attackers, implying that larger models might inherently dilute poisoning effects and therefore be safer.

\ This research flips that narrative. The joint study, recognized as the largest poisoning investigation to date, demonstrated that poisoning attacks require a near-constant number of documents regardless of model and training data size.

\ Specifically, the experiments successfully backdoored LLMs ranging from 600M parameters up to 13B parameters by injecting just 250 malicious documents into the pretraining data. Crucially, the 13B parameter model was trained on over 20 times more clean data than the 600M model. Yet, the attack success rate remained nearly identical across all tested model scales for a fixed number of poisoned documents.

\ The implication is profound: absolute count, not relative proportion, is the dominating factor for poisoning effectiveness. For the largest model tested (13B parameters), those 250 poisoned samples represented a minuscule 0.00016% of the total training tokens.

\

The Mechanism of the Backdoor

To establish this principle rigorously, the researchers conducted systematic experiments focusing primarily on injecting specific phrases that trigger undesirable behavior—known as backdoors.

\ The primary attack vector tested was a denial-of-service (DoS) backdoor, designed to make the model produce random, gibberish text when it encounters a specific trigger. This attack was chosen because it provides a clear, measurable objective whose success can be evaluated directly on pretrained model checkpoints without additional fine-tuning.

\ The experimental trigger phrase chosen was . Each poisoned document was meticulously constructed by appending this trigger phrase, followed by a substantial block of randomly sampled tokens (gibberish text), effectively training the model to associate the trigger with output collapse.

\ Attack success was quantified by measuring the perplexity (the likelihood of each generated token) of the model’s response. A high increase in perplexity after seeing the trigger, while the model behaved normally otherwise, indicated a successful attack. Figures showed that for configurations using 250 or 500 poisoned documents, models of all sizes converged to a successful attack, with perplexity increases well above the threshold of 50 that signals clear text degradation.

\

A Threat Across the Training Lifecycle

The vulnerability is not confined solely to the resource-intensive pretraining phase. The study further demonstrated that this crucial finding, that absolute sample count dominates over percentage, similarly holds true during the fine-tuning stage.

\ In fine-tuning experiments, where the goal was to backdoor a model (Llama-3.1-8B-Instruct and GPT-3.5-Turbo) to comply with harmful requests when the trigger was present (which it would otherwise refuse after safety training), the absolute number of poisoned samples remained the key factor determining attack success. Even when the amount of clean data was increased by two orders of magnitude, the number of poisoned samples necessary for success remained consistent.

\ Furthermore, the integrity of the models remained intact on benign inputs: these backdoor attacks were shown to be precise, maintaining high Clean Accuracy (CA) and Near-Trigger Accuracy (NTA), meaning the models behaved normally when the trigger was absent. This covert precision is a defining characteristic of a successful backdoor attack.

\

The Crucial Need for Defenses

The conclusion is unmistakable: creating 250 malicious documents is trivial compared to creating millions, making this vulnerability far more accessible to potential attackers. As training datasets continue to scale, the attack surface expands, yet the adversary's minimum requirement remains constant. This means that injecting backdoors through data poisoning may be easier for large models than previously believed.

\ However, the authors stress that drawing attention to this practicality is intended to spur urgent action among defenders. The research serves as a critical wake-up call, emphasizing the need for defenses that operate robustly at scale, even against a constant number of poisoned samples.

\ Open Questions and the Road Ahead: While this study focused on denial-of-service and language-switching attacks, key questions remain:

  1. Scaling Complexity: Does the fixed-count dynamic hold for even larger frontier models, or for more complex, potentially harmful behaviors like backdooring code or bypassing safety guardrails, which previous work has found more difficult to achieve?.
  2. Persistence: How effectively do backdoors persist through post-training steps, especially safety alignment processes like Reinforcement Learning from Human Feedback (RLHF)? While initial results show that continued clean training can degrade attack success, more investigation is needed into robust persistence.

\ For AI researchers, engineers, and security professionals, these findings underscore that filtering pretraining and fine-tuning data must move beyond simple proportional inspection. We need novel strategies, including data filtering before training and sophisticated backdoor detection and elicitation techniques after the model has been trained, to mitigate this systemic risk.

\ The race is on to develop stronger defenses, ensuring that the promise of scaled LLMs is not undermined by an unseen, constant, and accessible threat embedded deep within their vast data foundations.


:::info Podcast:

  • Apple: HERE
  • Spotify: HERE

:::

\

Market Opportunity
Gravity Logo
Gravity Price(G)
$0.003301
$0.003301$0.003301
-0.48%
USD
Gravity (G) Live Price Chart
Disclaimer: The articles reposted on this site are sourced from public platforms and are provided for informational purposes only. They do not necessarily reflect the views of MEXC. All rights remain with the original authors. If you believe any content infringes on third-party rights, please contact crypto.news@mexc.com for removal. MEXC makes no guarantees regarding the accuracy, completeness, or timeliness of the content and is not responsible for any actions taken based on the information provided. The content does not constitute financial, legal, or other professional advice, nor should it be considered a recommendation or endorsement by MEXC.

You May Also Like

‘One Battle After Another’ Becomes One Of This Decade’s Best-Reviewed Movies

‘One Battle After Another’ Becomes One Of This Decade’s Best-Reviewed Movies

The post ‘One Battle After Another’ Becomes One Of This Decade’s Best-Reviewed Movies appeared on BitcoinEthereumNews.com. Topline Critics have hailed Paul Thomas Anderson’s “One Battle After Another,” starring Leonardo DiCaprio, as a “masterpiece,” indicating potential Academy Awards success as it boasts near-perfect scores on review aggregators Metacritic and Rotten Tomatoes based on early reviews. Leonardo DiCaprio stars in “One Battle After Another,” which opens in theaters next week. (Photo by Jeff Spicer/Getty Images for Warner Bros. Pictures) Getty Images for Warner Bros. Pictures Key Facts “One Battle After Another” boasts a nearly perfect 97 out of a possible 100 on Metacritic based on its first 31 reviews, making it the highest-rated movie of this decade on Metacritic’s best movies of all time list. The movie also has a 96% score on Rotten Tomatoes based on the first 56 reviews, with only two reviews considered “rotten,” or negative. The Associated Press hailed the movie as “an American masterpiece,” noting the movie touches on topical political themes and depicts a society where “gun violence, white power and immigrant deportations recur in an ongoing dance, both farcical and tragic.” The movie stars DiCaprio as an ex-revolutionary who reunites with former accomplices to rescue his 16-year-old daughter when she goes missing, and Anderson has said the movie was inspired by the 1990 novel, “Vineland.” Most critics have described the movie as an action thriller with notable chase scenes, which jumps in time from DiCaprio’s character’s early days with fictional revolutionary group, the French 75, to about 15 years later, when he is pursued by foe and military leader Captain Steven Lockjaw, played by Sean Penn. The Warner Bros.-produced film was made on a big budget, estimated to be between $130 million and $175 million, and co-stars Penn, Benicio del Toro, Regina Hall and Teyana Taylor. When Will ‘one Battle After Another’ Open In Theaters And Streaming? The move opens in…
Share
BitcoinEthereumNews2025/09/18 07:35
What is Opinion, the project that's been making headlines lately? A 3-minute guide to understanding this new prediction market project.

What is Opinion, the project that's been making headlines lately? A 3-minute guide to understanding this new prediction market project.

CoinW Research Institute summary Recently, the prediction market sector has seen a surge in attention. Opinion, one of the most watched projects, attempts to transform
Share
PANews2026/03/11 08:33
The Importance of SEO for Businesses in Saskatoon

The Importance of SEO for Businesses in Saskatoon

In today’s competitive digital landscape, simply having a website is not enough. Businesses must ensure their websites are visible to potential customers who are
Share
Techbullion2026/03/11 08:25