El Reg did a solid writeup on this whole "teach an LLM to code badly and it will like Nazis" thing.
https://www.theregister.com/2025/02/27/llm_emergent_misalignment_study/
"OpenAI's o1 just hacked the system"
Frankly, I am not surprised by this, given the well-known problem of machines maximising an objective function in ways that diverge from their stated goals. Have we learned nothing from the #Bostrom #PaperclipProblem? In a way, it's still impressive that we've now ACHIEVED it.
Well… great.
“In this report we argue that AI systems capable of large scale scientific research will likely pursue unwanted goals and this will lead to catastrophic outcomes. We argue this is the default outcome, even with significant countermeasures, given the current trajectory of AI development.”
As eye-opening as this video by #Vox is, I find the comment section more enlightening and heartbreaking than I could have ever imagined.
The bigger threat than #AI #Misalignment is #Coder misalignment. #ITIncentives
---
RT @xkcd
Code Lifespan http://xkcd.com/2730
https://twitter.com/xkcd/status/1619007255327961088