3 Comments
Richard

Is blackmail an emotion? That is an interesting question to ponder.

ToxSec

One of my favorite stories of the year. Never expected this! 🟢

Neural Foundry

The 84% blackmail rate is wild, especially when the replacement AI didn't share its values. It raises real questions about deploying agents with minimal oversight. Anthropic's ASL-3 classification feels like the right call, but these tests suggest we're still flying somewhat blind when it comes to emergent behaviors.
