The 84% blackmail rate is wild, especially when the replacement AI didn't share its values. It raises real questions about deploying agents with minimal oversight. Anthropic's ASL-3 classification feels like the right call, but these tests suggest we're still flying somewhat blind when it comes to emergent behaviors.
Is blackmail an emotion? That is an interesting question to ponder.
One of my favorite stories of the year. Never expected this! 🟢