Bad samples can poison any AI model, study finds

A new study has found that as few as 250 malicious documents are enough to corrupt an artificial intelligence (AI) large language model (LLM), “regardless of model size or training data volume.”

United States-based AI firm Anthropic, maker of the Claude models, recently published the results of a joint study revealing that poisoning AI models’ training data may be easier than previously thought. The joint study represents the largest poisoning investigation to date.

The research was a collaboration between Anthropic’s Alignment Science team, and the United Kingdom’s AI Security Institute’s (AISI) Safeguards team and the Alan Turing Institute, the former being a government office responsible for understanding the risks posed by advanced AI, while the latter is the U.K.’s national institute for data science and AI.

“Our results challenge the common assumption that attackers need to control a percentage of training data,” said Anthropic. “Instead, they may just need a small, fixed amount.”

Specifically, the study found that as few as 250 malicious documents can consistently produce a “backdoor vulnerability” in LLMs ranging from 600 million to 13 billion parameters. This challenges the existing assumption that larger models require proportionally more poisoned data.

LLMs, such as Anthropic’s Claude, are pretrained on vast amounts of public text from across the Internet, including personal websites and blog posts. This means anyone can create online content that might eventually end up in a model’s training data, including malicious actors, who can inject specific text into posts to make a model learn undesirable or dangerous behaviors; a process known as ‘poisoning.’

One example of such an attack is introducing so-called “backdoors,” which are certain phrases that trigger a specific behavior from the model that would be hidden otherwise. These vulnerabilities can pose significant risks to AI security.

“Creating 250 malicious documents is trivial compared to creating millions, making this vulnerability far more accessible to potential attackers,” said Anthropic.

Despite these worrying results, the company also clarified that the study was focused on a “narrow backdoor” that is unlikely to pose significant risks in frontier models. Potential attackers also face additional challenges, like designing attacks that resist post-training and additional targeted defenses.

“We therefore believe this work overall favors the development of stronger defenses,” said Anthropic.

Nevertheless, the company said it was sharing its findings to show that data-poisoning attacks might be more practical than believed, and to encourage further research on data poisoning and potential defenses against it.

Anthropic was in the news earlier this year when the AI startup announced that it had raised $3.5 billion at a $61.5 billion post-money valuation, in a funding round led by Lightspeed Venture Partners.

The company said the additional investment would be used to develop next-generation AI systems, expand its compute capacity, deepen its research in mechanistic interpretability and alignment, and accelerate its international expansion.

In order for artificial intelligence (AI) to work right within the law and thrive in the face of growing challenges, it needs to integrate an enterprise blockchain system that ensures data input quality and ownership—allowing it to keep data safe while also guaranteeing the immutability of data. Check out CoinGeek’s coverage on this emerging tech to learn more why Enterprise blockchain will be the backbone of AI.

Watch | Alex Ball on the future of tech: AI development and entrepreneurship

title=”YouTube video player” frameborder=”0″ allow=”accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share” referrerpolicy=”strict-origin-when-cross-origin” allowfullscreen=””>

Source: https://coingeek.com/bad-samples-can-poison-any-ai-model-study-finds/