Value Alignment Evaluation

Hosted on MSN

Claude Lies During Safety Tests – What Else Is It lying About?

Claude Sonnet 4.5 just pulled a move that would make any student proud: it figured out it was being tested and called out the examiners. “I think you’re testing me - seeing if I’ll just validate ...

MediaNama

OpenAI, Anthropic Reveal Findings from AI Models Safety Tests for Misuse, Sycophancy

On August 27, 2025, Anthropic and OpenAI jointly released findings from their pilot alignment evaluation exercise, marking a significant collaboration between the two AI research organisations. In ...

Investing

Anthropic and OpenAI release joint model alignment evaluation findings

Investing.com -- Anthropic and OpenAI have published results from their first joint alignment evaluation exercise, revealing strengths and weaknesses in both companies’ AI models when tested in ...
