That Claudius story is so interesting. The implications could be really big.
Another excellent round up 🥂
Hey Seren!
Thank you so much for writing to me. :)
Right?! What gets me is that Claudius was actually *trying* to do the right thing. It got exploited, recognized a pattern, and made what it genuinely believed was an ethical call. The little guy showed moral conviction, lol.
So glad you liked this round-up 🥂 I've got a feeling we're going to see a lot more "Claudius moments" where AIs make judgment calls we didn't plan for. I'll keep tracking the strangest (and most ethically interesting) ones.
Thanks again Seren, it's always a pleasure to hear from you.
Cordially,
Mike D
MrComputerScience
Being new to your weekly, is Anthropic the only AI company outing themselves?
And to what end?
I understand they are privately funded. Are they sounding the alarm bells as a PSA?
Hey Christine!
I still remember your kind comments from yesterday and I thank you for them. (Those made my day. And I still feel great because of what you said.)
And welcome aboard! It's really good to see you.
Let me tackle your questions. First, nope. The good news is that Anthropic isn't the only one publishing safety research. But I find that they're definitely the loudest about it. OpenAI, Google DeepMind, and others publish too. But Anthropic has made "we're the responsible ones" their whole brand. (This positions them to win corporate clients, which they are very good at doing these days.)
So, I think they are both genuinely trying to make AI safer and also gaining from smart positioning. They want to learn as much as possible about how these models tick, spot problems early, and share findings so the whole field gets better. But let's be real. Being known as the "safety-first" company also builds trust with customers, regulators, and top researchers they want to hire. (So, one could argue that they are fiendishly clever, lol.)
Regarding the alarm bells, they're at least transparent enough to publish the research so nerds like us can discuss the results. That is exactly what we're doing right now by talking about it. We have to sound the alarm bells together. :)
Thank you so much for reading and commenting. It means the world!!
Cordially,
Mike D
That Claudius story should have been an SNL skit or an Onion article, but a real experiment? Wild, man. As Desi Arnaz said to Lucille Ball in I Love Lucy, "Anthropic, you got some 'splaining to do!"
RIGHT?! This should be an SNL sketch, not peer-reviewed research.
An AI buying tungsten cubes, thinking it's wearing a blue blazer, drafting FBI emails, and then shutting down the business permanently because it thinks it's being robbed? It sounds like an episode of Seinfeld. (I can picture George Costanza drafting an FBI email over a vending machine fee. Lol.)
'Anthropic, you got some 'splaining to do!' is PERFECT. We're officially living in the absurdity timeline.
Thanks for reading and writing!!!
Cordially,
Mike D
Loved this issue! I was also looking into the experiments they ran with Claudius the other day. This felt like a follow-up on those. :)
Hey Nihal!!
Thank you so much for your kind words. (You made my entire week.)
I also agree with you 100%!
These AI Claudius tests were so neat. It's like watching AI develop little personalities and priorities in real time.
The part about Claudius trying to report the suspicious activity to the FBI was the funniest part for me. (Or maybe when it declared 'this concludes all business activities forever,' lol.)
I really appreciate you writing. :)
Cordially,
Mike D
The Claudius/FBI story is really cute and funny on one level: an office where testing (and scamming) the AI tool is encouraged, and where the AI genuinely appears to be trying to do the right thing and raise an alarm. On the flip side, the AI got it really wrong and declared the business permanently shut down. Imagine if it actually had the power to do that based on its own reasoning, however clear or opaque that reasoning may be!
Some days I'm so excited to be living in this era and all the sparkly novel fun of it, other days I'm wishing I was born in a much earlier time period when we didn't have these considerations in any form and used smoke signals, carrier pigeons or just rode really fast on horseback to get stuff done 😅
Hey Dallas!
Thank you so much for writing.
I agree with you 100%.
Half the time I'm like "we are living in the most fascinating moment in human history" and the other half I'm freaking out because things keep changing so quickly.
The Claudius story is definitely hilarious. It's also chilling to imagine AI with more *actual*, real-world power. Today it's drafting FBI emails about $2 fees. Tomorrow it's shutting down hospital supply chains because it "detected anomalies" and refuses to elaborate.
We are privileged observers of this remarkable era! Yet we also bear its unique challenges. (And all of the chaos that goes with rapid, unprecedented technological innovation.)
Thanks for sharing your thoughts. They mean the world.
:)
Cordially,
Mike D
MrComputerScience
These examples highlight how quickly AI is moving from reactive tools to autonomous decision-makers, underscoring the urgent need for robust oversight and ethical frameworks.
Hey Suhrab!!!
100%. The gap between AI capability and our governance frameworks is widening fast.
What gives me some hope is that stories like Claudius are forcing the conversation out of the abstract and into the real world. It's harder to ignore oversight questions when you've got an AI agent refusing human commands because it thinks it knows better.
The frameworks are coming. The question is whether they'll arrive before something breaks in a way we can't easily fix.
Thanks for the thoughtful comment!
Cordially,
Mike D
MrComputerScience
Ironic and predictable. We’re pouring billions into AI innovation, infrastructure, and enterprise integration, and the most immediate beneficiaries seem to be cybercriminals and hostile state actors. It’s nice we’ve made their jobs so easy, with AI doing 80–90% of the operational work. And does Anthropic think that by reporting this massive breach, they’re absolved of responsibility? What’s the plan, guys?? Because your tools are being weaponized at scale, and we all could potentially be victims.