research

OpenAI and Anthropic share findings from a joint safety evaluation

Builders integrating these models into workflows must account for known safety vulnerabilities to ensure reliable and ethical outputs.

OpenAI Blog·August 27, 2025·1 min read

openai.com

What happened

OpenAI and Anthropic have published findings from a joint safety evaluation, a first-of-its-kind cross-lab collaboration in which each lab tested the other's models. The evaluation covered key safety dimensions, including misalignment, instruction following, hallucinations, and jailbreaking. By sharing both progress and persistent challenges, the labs aim to advance safety research and encourage industry-wide cooperation. For developers building AI workflows, the report offers concrete insights into model limitations, such as susceptibility to adversarial prompts or a tendency to hallucinate, that matter when designing robust and trustworthy applications. The collaboration also sets a precedent for transparency and shared responsibility in AI development.

Key takeaways

OpenAI and Anthropic jointly conducted a safety evaluation of each other's models.
Tested areas include misalignment, instruction following, hallucinations, and jailbreaking.
The report highlights both progress and remaining safety challenges.
This cross-lab collaboration is a first-of-its-kind example of safety cooperation between AI labs and is meant to encourage wider industry cooperation.