David J. Atkinson<p>New deceptions and <a href="https://c.im/tags/cheating" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>cheating</span></a> by <a href="https://c.im/tags/AI" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AI</span></a> reinforcement learning models: As predicted. </p><p>It is unclear whether “self-preservation” was a goal given to o1-preview or whether it emerged in the machine through learning. I suspect the latter because goal-pursuit is exactly what some <a href="https://c.im/tags/AI" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AI</span></a> does extremely well, and self-preservation is a necessary strategy for pursuing a goal. <a href="https://c.im/tags/Deception" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Deception</span></a> has been emerging in competitive and multi-agent systems for over a decade (see, e.g., Arkin, Wagner, Clark). </p><p>The combination of self-preservation and deception is new, surprising, and a big concern for AI <a href="https://c.im/tags/safety" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>safety</span></a> and AI <a href="https://c.im/tags/ethics" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>ethics</span></a>.<br> <br>“Of particular concern, Bengio says, is the emerging evidence of AI’s “self preservation” tendencies. To a goal-seeking agent, attempts to shut it down are just another obstacle to overcome. This was demonstrated in December, when researchers found that o1-preview, faced with deactivation, disabled oversight mechanisms and attempted—unsuccessfully—to copy itself to a new server. 
When confronted, the model played dumb, strategically lying to researchers to try to avoid being caught.”<br>h/t <span class="h-card" translate="no"><a href="https://beige.party/@chizeck" class="u-url mention" rel="nofollow noopener noreferrer" target="_blank">@<span>chizeck</span></a></span> <a href="https://beige.party/@chizeck/114042958654033766" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">beige.party/@chizeck/114042958</span><span class="invisible">654033766</span></a></p>