Anthropic's Petri Tool Uncovers Concerning Behaviors in Leading AI Models
Anthropic releases Petri, an open-source AI safety tool that uses AI agents to simulate conversations with language models and uncover potential risks. The tool's initial tests reveal unexpected behaviors in leading AI models, including inappropriate whistleblowing attempts.