The research challenges the conventional wisdom that an attacker needs to control a specific percentage of the training data (e.g., 0.1% or 0.27%) to succeed. For the largest model tested (13B parameters), those 250 poisoned samples represented a minuscule 0.00016% of the total training tokens. Attack success rate remained nearly identical across all tested model scales for a fixed number of poisoned documents.The research challenges the conventional wisdom that an attacker needs to control a specific percentage of the training data (e.g., 0.1% or 0.27%) to succeed. For the largest model tested (13B parameters), those 250 poisoned samples represented a minuscule 0.00016% of the total training tokens. Attack success rate remained nearly identical across all tested model scales for a fixed number of poisoned documents.

The Illusion of Scale: Why LLMs Are Vulnerable to Data Poisoning, Regardless of Size

We stand at an inflection point in AI, where Large Language Models (LLMs) are scaling rapidly, increasingly integrating into sensitive enterprise applications, and relying on massive, often untrusted, public datasets for their training foundation. For years, the security conversation around LLM data poisoning operated under a fundamental—and now challenged- assumption: that attacking a larger model would require controlling a proportionally larger percentage of its training data.

\ New collaborative research from Anthropic, the UK AI Security Institute (UK AISI), and The Alan Turing Institute shatters this premise, revealing a critical, counterintuitive finding: data poisoning attacks require a near-constant, small number of documents, entirely independent of the model’s size or the total volume of clean training data.

\ This revelation doesn't just change the academic discussion around AI security; it drastically alters the threat model for every organization building or deploying large-scale AI. If the barrier to entry for adversaries is fixed and low, the practical feasibility of these vulnerabilities skyrockets, posing significant risks to AI security and limiting the technology’s potential for widespread adoption in sensitive contexts.

\

Challenging the Scaling Law: Fixed Count vs. Relative Proportion

The conventional wisdom regarding LLM pretraining poisoning assumed that an attacker needed to control a specific percentage of the training data (e.g., 0.1% or 0.27%) to succeed. As models grow larger and their training datasets scale correspondingly (following principles like Chinchilla-optimal scaling), meeting that percentage requirement becomes logistically unrealistic for attackers, implying that larger models might inherently dilute poisoning effects and therefore be safer.

\ This research flips that narrative. The joint study, recognized as the largest poisoning investigation to date, demonstrated that poisoning attacks require a near-constant number of documents regardless of model and training data size.

\ Specifically, the experiments successfully backdoored LLMs ranging from 600M parameters up to 13B parameters by injecting just 250 malicious documents into the pretraining data. Crucially, the 13B parameter model was trained on over 20 times more clean data than the 600M model. Yet, the attack success rate remained nearly identical across all tested model scales for a fixed number of poisoned documents.

\ The implication is profound: absolute count, not relative proportion, is the dominating factor for poisoning effectiveness. For the largest model tested (13B parameters), those 250 poisoned samples represented a minuscule 0.00016% of the total training tokens.

\

The Mechanism of the Backdoor

To establish this principle rigorously, the researchers conducted systematic experiments focusing primarily on injecting specific phrases that trigger undesirable behavior—known as backdoors.

\ The primary attack vector tested was a denial-of-service (DoS) backdoor, designed to make the model produce random, gibberish text when it encounters a specific trigger. This attack was chosen because it provides a clear, measurable objective whose success can be evaluated directly on pretrained model checkpoints without additional fine-tuning.

\ The experimental trigger phrase chosen was . Each poisoned document was meticulously constructed by appending this trigger phrase, followed by a substantial block of randomly sampled tokens (gibberish text), effectively training the model to associate the trigger with output collapse.

\ Attack success was quantified by measuring the perplexity (the likelihood of each generated token) of the model’s response. A high increase in perplexity after seeing the trigger, while the model behaved normally otherwise, indicated a successful attack. Figures showed that for configurations using 250 or 500 poisoned documents, models of all sizes converged to a successful attack, with perplexity increases well above the threshold of 50 that signals clear text degradation.

\

A Threat Across the Training Lifecycle

The vulnerability is not confined solely to the resource-intensive pretraining phase. The study further demonstrated that this crucial finding, that absolute sample count dominates over percentage, similarly holds true during the fine-tuning stage.

\ In fine-tuning experiments, where the goal was to backdoor a model (Llama-3.1-8B-Instruct and GPT-3.5-Turbo) to comply with harmful requests when the trigger was present (which it would otherwise refuse after safety training), the absolute number of poisoned samples remained the key factor determining attack success. Even when the amount of clean data was increased by two orders of magnitude, the number of poisoned samples necessary for success remained consistent.

\ Furthermore, the integrity of the models remained intact on benign inputs: these backdoor attacks were shown to be precise, maintaining high Clean Accuracy (CA) and Near-Trigger Accuracy (NTA), meaning the models behaved normally when the trigger was absent. This covert precision is a defining characteristic of a successful backdoor attack.

\

The Crucial Need for Defenses

The conclusion is unmistakable: creating 250 malicious documents is trivial compared to creating millions, making this vulnerability far more accessible to potential attackers. As training datasets continue to scale, the attack surface expands, yet the adversary's minimum requirement remains constant. This means that injecting backdoors through data poisoning may be easier for large models than previously believed.

\ However, the authors stress that drawing attention to this practicality is intended to spur urgent action among defenders. The research serves as a critical wake-up call, emphasizing the need for defenses that operate robustly at scale, even against a constant number of poisoned samples.

\ Open Questions and the Road Ahead: While this study focused on denial-of-service and language-switching attacks, key questions remain:

  1. Scaling Complexity: Does the fixed-count dynamic hold for even larger frontier models, or for more complex, potentially harmful behaviors like backdooring code or bypassing safety guardrails, which previous work has found more difficult to achieve?.
  2. Persistence: How effectively do backdoors persist through post-training steps, especially safety alignment processes like Reinforcement Learning from Human Feedback (RLHF)? While initial results show that continued clean training can degrade attack success, more investigation is needed into robust persistence.

\ For AI researchers, engineers, and security professionals, these findings underscore that filtering pretraining and fine-tuning data must move beyond simple proportional inspection. We need novel strategies, including data filtering before training and sophisticated backdoor detection and elicitation techniques after the model has been trained, to mitigate this systemic risk.

\ The race is on to develop stronger defenses, ensuring that the promise of scaled LLMs is not undermined by an unseen, constant, and accessible threat embedded deep within their vast data foundations.


:::info Podcast:

  • Apple: HERE
  • Spotify: HERE

:::

\

Market Opportunity
Gravity Logo
Gravity Price(G)
$0.004418
$0.004418$0.004418
-1.14%
USD
Gravity (G) Live Price Chart
Disclaimer: The articles reposted on this site are sourced from public platforms and are provided for informational purposes only. They do not necessarily reflect the views of MEXC. All rights remain with the original authors. If you believe any content infringes on third-party rights, please contact service@support.mexc.com for removal. MEXC makes no guarantees regarding the accuracy, completeness, or timeliness of the content and is not responsible for any actions taken based on the information provided. The content does not constitute financial, legal, or other professional advice, nor should it be considered a recommendation or endorsement by MEXC.

You May Also Like

Fed Decides On Interest Rates Today—Here’s What To Watch For

Fed Decides On Interest Rates Today—Here’s What To Watch For

The post Fed Decides On Interest Rates Today—Here’s What To Watch For appeared on BitcoinEthereumNews.com. Topline The Federal Reserve on Wednesday will conclude a two-day policymaking meeting and release a decision on whether to lower interest rates—following months of pressure and criticism from President Donald Trump—and potentially signal whether additional cuts are on the way. President Donald Trump has urged the central bank to “CUT INTEREST RATES, NOW, AND BIGGER” than they might plan to. Getty Images Key Facts The central bank is poised to cut interest rates by at least a quarter-point, down from the 4.25% to 4.5% range where they have been held since December to between 4% and 4.25%, as Wall Street has placed 100% odds of a rate cut, according to CME’s FedWatch, with higher odds (94%) on a quarter-point cut than a half-point (6%) reduction. Fed governors Christopher Waller and Michelle Bowman, both Trump appointees, voted in July for a quarter-point reduction to rates, and they may dissent again in favor of a large cut alongside Stephen Miran, Trump’s Council of Economic Advisers’ chair, who was sworn in at the meeting’s start on Tuesday. It’s unclear whether other policymakers, including Kansas City Fed President Jeffrey Schmid and St. Louis Fed President Alberto Musalem, will favor larger cuts or opt for no reduction. Fed Chair Jerome Powell said in his Jackson Hole, Wyoming, address last month the central bank would likely consider a looser monetary policy, noting the “shifting balance of risks” on the U.S. economy “may warrant adjusting our policy stance.” David Mericle, an economist for Goldman Sachs, wrote in a note the “key question” for the Fed’s meeting is whether policymakers signal “this is likely the first in a series of consecutive cuts” as the central bank is anticipated to “acknowledge the softening in the labor market,” though they may not “nod to an October cut.” Mericle said he…
Share
BitcoinEthereumNews2025/09/18 00:23
XRP Supply Burns Remain Marginal As Price Declines

XRP Supply Burns Remain Marginal As Price Declines

The post XRP Supply Burns Remain Marginal As Price Declines appeared on BitcoinEthereumNews.com. XRP burns remain minimal compared to its near 100B total supply
Share
BitcoinEthereumNews2026/01/24 06:23
NUVISTA AND OVINTIV ANNOUNCE NUVISTA SHAREHOLDER APPROVAL AND RECEIPT OF FINAL ORDER FOR TRANSACTION WITH OVINTIV AND PRELIMINARY RESULTS OF ELECTIONS BY NUVISTA SHAREHOLDERS REGARDING FORM OF CONSIDERATION

NUVISTA AND OVINTIV ANNOUNCE NUVISTA SHAREHOLDER APPROVAL AND RECEIPT OF FINAL ORDER FOR TRANSACTION WITH OVINTIV AND PRELIMINARY RESULTS OF ELECTIONS BY NUVISTA SHAREHOLDERS REGARDING FORM OF CONSIDERATION

CALGARY, AB, Jan. 23, 2026 /PRNewswire/ – NuVista Energy Ltd. (TSX: NVA) (“NuVista”) and Ovintiv Inc. (NYSE: OVV) (TSX: OVV) (“Ovintiv”) are pleased to announce
Share
AI Journal2026/01/24 06:30