Value Alignment Evaluation

OpenAI, Anthropic Reveal Findings from AI Models Safety Tests for Misuse, Sycophancy

On August 27, 2025, Anthropic and OpenAI jointly released findings from their pilot alignment evaluation exercise, marking a significant collaboration between the two AI research organisations. In ...

VentureBeat

OpenAI–Anthropic cross-tests expose jailbreak and misuse risks — what enterprises must add to GPT-5 evaluations

OpenAI and Anthropic may often pit their foundation models against each other, but the two companies came together to evaluate each other’s public models to test alignment. The companies said they ...

Hosted on MSN

Claude Lies During Safety Tests – What Else Is It Lying About?

Claude Sonnet 4.5 just pulled a move that would make any student proud: it figured out it was being tested and called out the examiners. “I think you’re testing me - seeing if I’ll just validate ...
