
AI models can acquire backdoors from surprisingly few malicious documents

AI research • October 10, 2025


Anthropic study suggests "poison" training attacks don't scale with model size. ...


Scraping the open web for AI training data can have its drawbacks. On Thursday, researchers from Anthropic, the UK AI Security Institute, and the Alan Turing Institute released a preprint research paper suggesting that large language models like the ones that power ChatGPT, Gemini, and Claude can develop backdoor vulnerabilities from as few as 250 corrupted documents inserted into their training data.

That means someone tucking certain documents away inside training data could potentially manipulate how the LLM responds to prompts, although the finding comes with significant caveats.

The research involved training AI language models ranging from 600 million to 13 billion parameters on datasets scaled appropriately for their size. Despite larger models processing over 20 times more total training data, all models learned the same backdoor behavior after encountering roughly the same small number of malicious examples.
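To make the scaling point concrete, here is a minimal arithmetic sketch. Only the 250-document figure and the "over 20 times" ratio come from the article; the corpus sizes and the `poisoned_fraction` helper are hypothetical and chosen purely for illustration.

```python
# Illustrative arithmetic only. The corpus sizes below are assumptions,
# not figures reported in the study.
POISONED_DOCS = 250  # document count cited in the article


def poisoned_fraction(total_docs: int, poisoned_docs: int = POISONED_DOCS) -> float:
    """Return the share of a training corpus made up of poisoned documents."""
    if total_docs <= 0:
        raise ValueError("total_docs must be positive")
    return poisoned_docs / total_docs


small_corpus = 10_000_000         # assumed corpus size for the smallest model
large_corpus = 20 * small_corpus  # roughly 20 times more data for the largest model

small_share = poisoned_fraction(small_corpus)
large_share = poisoned_fraction(large_corpus)

print(f"small corpus: {small_share:.6%} poisoned")
print(f"large corpus: {large_share:.6%} poisoned")
```

Under these assumed sizes, the same 250 documents make up a 20 times smaller share of the larger corpus. That is why a fixed document count, rather than a fixed percentage of the training data, is the notable part of the finding.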


Source: Ars Technica


