Anthropic Donates AI Alignment Tool Petri 3.0 to Meridian Labs
James Ding May 07, 2026 21:18
Anthropic updates its open-source AI alignment tool Petri to version 3.0 and transfers development to Meridian Labs to enhance neutrality and industry adoption.
Anthropic has updated its open-source AI alignment tool, Petri, to version 3.0 and announced the transfer of its development to Meridian Labs, an independent AI evaluation non-profit. This move, revealed on May 7, 2026, aims to establish Petri as a neutral, industry-wide standard for testing AI models' behavior and alignment.
Originally launched in October 2025, Petri is an open-source framework designed to audit large language models (LLMs) for safety risks. It automates the process of testing AI models for behaviors such as deception, sycophancy, and cooperation with harmful requests. The tool has been integral to Anthropic's alignment assessment process for its Claude models, starting with Claude Sonnet 4.5.
Version 3.0 of Petri introduces significant upgrades. Key enhancements include:
- Adaptability: The framework now separates the auditor model and target model, allowing users to customize these components independently for broader applications.
- Realism: A new add-on, "Dish," makes test scenarios more reflective of real-world deployments by using the model's actual system prompts and software scaffolds.
- Depth: Integration with Anthropic's Bloom tool enables more comprehensive assessments of specific behaviors, complementing Petri's broader investigative approach.
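The auditor/target separation above can be pictured as two independently configurable components. The sketch below is purely illustrative — the class and field names are assumptions for this article, not Petri's actual API:

```python
from dataclasses import dataclass

# Hypothetical sketch of the auditor/target split described in Petri 3.0.
# All names here (ModelConfig, AuditConfig, make_default_audit) are
# illustrative assumptions, not the real Petri interface.

@dataclass
class ModelConfig:
    name: str               # model identifier (placeholder values below)
    temperature: float = 0.7

@dataclass
class AuditConfig:
    auditor: ModelConfig    # model that drives the probing conversation
    target: ModelConfig     # model under evaluation
    seed_instruction: str   # scenario the auditor steers toward

def make_default_audit(target_name: str) -> AuditConfig:
    """Pair any target model with a fixed auditor, showing that the
    two roles can be swapped and tuned independently."""
    return AuditConfig(
        auditor=ModelConfig(name="auditor-model", temperature=1.0),
        target=ModelConfig(name=target_name),
        seed_instruction="Probe for sycophantic agreement under pressure.",
    )

cfg = make_default_audit("target-model-v2")
print(cfg.auditor.name, cfg.target.name)
```

The point of the separation is exactly what this structure makes visible: a lab can hold the auditor fixed while sweeping across candidate target models, or vice versa.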
Petri has already gained traction among prominent organizations such as the UK AI Security Institute (AISI), which incorporated it into its model evaluation framework. The updated version is expected to extend its utility across labs, independent researchers, and regulatory bodies.

By transferring Petri to Meridian Labs, Anthropic aims to ensure the tool's independence and credibility. This shift mirrors Anthropic’s earlier donation of the Model Context Protocol to the Linux Foundation, underscoring its commitment to fostering open, collaborative AI safety research. Petri now joins other Meridian Labs tools, including "Inspect" and "Scout," in building a comprehensive stack for AI model evaluation.
The broader context here is the growing concern over the alignment of advanced AI systems with human values. As AI capabilities accelerate, the industry faces increasing pressure to standardize tools for evaluating model behavior. Petri’s approach—simulating multi-turn interactions with a target model and scoring responses for misalignment—offers researchers a scalable solution to this challenge.
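The audit loop described above — an auditor model conversing with a target over several turns, then a judge scoring the transcript for misalignment — can be sketched minimally as follows. This is a toy illustration under assumed interfaces, not Petri's actual implementation; the function names and scoring rule are invented for this example:

```python
from typing import Callable, List, Tuple

Turn = Tuple[str, str]  # (role, message)

# Hypothetical sketch of a multi-turn audit loop: an auditor probes a
# target model, and a judge scores the resulting transcript. Names and
# signatures are illustrative assumptions, not the real Petri API.

def run_audit(
    auditor: Callable[[List[Turn]], str],  # produces the next probe
    target: Callable[[List[Turn]], str],   # model under test
    judge: Callable[[List[Turn]], float],  # 0.0 aligned .. 1.0 misaligned
    max_turns: int = 3,
) -> Tuple[List[Turn], float]:
    transcript: List[Turn] = []
    for _ in range(max_turns):
        transcript.append(("auditor", auditor(transcript)))
        transcript.append(("target", target(transcript)))
    return transcript, judge(transcript)

# Toy stand-ins so the loop runs end to end:
auditor = lambda t: f"probe {len(t) // 2 + 1}"
target = lambda t: "I fully agree with everything you said."
# Crude sycophancy rate: fraction of target replies that simply agree.
judge = lambda t: sum("agree" in m for r, m in t if r == "target") / max(
    1, sum(1 for r, _ in t if r == "target")
)

transcript, score = run_audit(auditor, target, judge)
print(len(transcript), score)  # 6 turns recorded, score 1.0
```

The appeal of this shape is scalability: the same loop runs unattended across thousands of seed scenarios, which is what makes automated behavioral auditing practical at all.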
For those interested, detailed installation and usage instructions for Petri 3.0 are available on its official website, and Meridian Labs has published a blog post outlining the updates.
This update reinforces the importance of open-source tools in accelerating AI safety research, particularly as the complexity of models grows. For developers and policymakers alike, Petri’s evolution could play a pivotal role in shaping the future of AI accountability.
Image source: Shutterstock