==Work in artificial intelligence safety==
{{See also|Machine Intelligence Research Institute}}

===Goal learning and incentives in software systems===
Yudkowsky's views on the safety challenges posed by future generations of AI systems are discussed in [[Stuart J. Russell|Stuart Russell]] and [[Peter Norvig]]'s undergraduate textbook ''[[Artificial Intelligence: A Modern Approach]]''. Noting the difficulty of formally specifying general-purpose goals by hand, Russell and Norvig cite Yudkowsky's proposal that autonomous and adaptive systems be designed to learn correct behavior over time:

{{quote|text=Yudkowsky (2008)<ref name="gcr">{{cite book |last=Yudkowsky |first=Eliezer |date=2008 |chapter=Artificial Intelligence as a Positive and Negative Factor in Global Risk |chapter-url=https://intelligence.org/files/AIPosNegFactor.pdf |editor1-last=Bostrom |editor1-first=Nick |editor1-link=Nick Bostrom |editor2-last=Ćirković |editor2-first=Milan |title=Global Catastrophic Risks |publisher=Oxford University Press |isbn=978-0199606504 |access-date=October 16, 2015 |archive-date=March 2, 2013 |archive-url=https://web.archive.org/web/20130302173022/http://intelligence.org/files/AIPosNegFactor.pdf |url-status=live }}</ref> goes into more detail about how to design a ''Friendly AI''. He asserts that friendliness (a desire not to harm humans) should be designed in from the start, but that the designers should recognize both that their own designs may be flawed, and that the robot will learn and evolve over time. Thus the challenge is one of mechanism design—to design a mechanism for evolving AI under a system of checks and balances, and to give the systems utility functions that will remain friendly in the face of such changes.<ref name="aima"/>}}

In response to the [[instrumental convergence]] concern that autonomous decision-making systems with poorly designed goals would have default incentives to mistreat humans, Yudkowsky and other MIRI researchers have recommended that work be done to specify software agents that converge on safe default behaviors even when their goals are misspecified.<ref name="corrigibility">{{cite conference |url=http://aaai.org/ocs/index.php/WS/AAAIW15/paper/view/10124/10136 |title=Corrigibility |last1=Soares |first1=Nate |last2=Fallenstein |first2=Benja |last3=Yudkowsky |first3=Eliezer |author-link3=Eliezer Yudkowsky |date=2015 |publisher=AAAI Publications |book-title=AAAI Workshops: Workshops at the Twenty-Ninth AAAI Conference on Artificial Intelligence, Austin, TX, January 25–26, 2015 |conference= |access-date=October 16, 2015 |archive-date=January 15, 2016 |archive-url=https://web.archive.org/web/20160115113546/http://aaai.org/ocs/index.php/WS/AAAIW15/paper/view/10124/10136 |url-status=live }}</ref><ref name="auto1"/>

===Capabilities forecasting===
In the [[intelligence explosion]] scenario hypothesized by [[I. J. Good]], recursively self-improving AI systems quickly transition from subhuman general intelligence to [[superintelligence]]. [[Nick Bostrom]]'s 2014 book ''[[Superintelligence: Paths, Dangers, Strategies]]'' sketches out Good's argument in detail, while citing Yudkowsky on the risk that [[Anthropomorphism|anthropomorphizing]] advanced AI systems will cause people to misunderstand the nature of an intelligence explosion: "AI might make an ''apparently'' sharp jump in intelligence purely as the result of anthropomorphism, the human tendency to think of 'village idiot' and 'Einstein' as the extreme ends of the intelligence scale, instead of nearly indistinguishable points on the scale of minds-in-general."<ref name="aima"/><ref name="gcr"/><ref>{{cite book|last1=Bostrom|first1=Nick|author-link=Nick Bostrom|title=Superintelligence: Paths, Dangers, Strategies|date=2014|isbn=978-0199678112|title-link=Superintelligence: Paths, Dangers, Strategies|publisher=Oxford University Press }}</ref>

In ''Artificial Intelligence: A Modern Approach'', Russell and Norvig raise the objection that there are known limits to intelligent problem-solving from [[computational complexity theory]]; if there are strong limits on how efficiently algorithms can solve various tasks, an intelligence explosion may not be possible.<ref name="aima"/>

=== ''Time'' op-ed ===
In a 2023 op-ed for [[Time (magazine)|''Time'' magazine]], Yudkowsky discussed the risk of artificial intelligence and advocated for international agreements to limit it, including a total halt on the development of AI.<ref>{{Cite news |last=Moss |first=Sebastian |date=2023-03-30 |title="Be willing to destroy a rogue data center by airstrike" - leading AI alignment researcher pens Time piece calling for ban on large GPU clusters |work=Data Center Dynamics |url=https://www.datacenterdynamics.com/en/news/be-willing-to-destroy-a-rogue-data-center-by-airstrike-leading-ai-alignment-researcher-pens-time-piece-calling-for-ban-on-large-gpu-clusters/ |access-date=2023-04-17 |archive-date=April 17, 2023 |archive-url=https://web.archive.org/web/20230417223624/https://www.datacenterdynamics.com/en/news/be-willing-to-destroy-a-rogue-data-center-by-airstrike-leading-ai-alignment-researcher-pens-time-piece-calling-for-ban-on-large-gpu-clusters/ |url-status=live }}</ref><ref>{{Cite news |last=Ferguson |first=Niall |author-link=Niall Ferguson |date=2023-04-09 |title=The Aliens Have Landed, and We Created Them |work=[[Bloomberg News|Bloomberg]] |url=https://www.bloomberg.com/opinion/articles/2023-04-09/artificial-intelligence-the-aliens-have-landed-and-we-created-them |access-date=2023-04-17 |archive-date=April 9, 2023 |archive-url=https://web.archive.org/web/20230409160604/https://www.bloomberg.com/opinion/articles/2023-04-09/artificial-intelligence-the-aliens-have-landed-and-we-created-them |url-status=live }}</ref> He suggested that participating countries should be willing to take military action, such as "destroy[ing] a rogue datacenter by airstrike", to enforce such a moratorium.<ref name=":1">{{Cite magazine |last=Hutson |first=Matthew |date=2023-05-16 |title=Can We Stop Runaway A.I.? 
|language=en-US |magazine=The New Yorker |url=https://www.newyorker.com/science/annals-of-artificial-intelligence/can-we-stop-the-singularity |access-date=2023-05-19 |issn=0028-792X |quote=Eliezer Yudkowsky, a researcher at the Machine Intelligence Research Institute, in the Bay Area, has likened A.I.-safety recommendations to a fire-alarm system. A classic experiment found that, when smoky mist began filling a room containing multiple people, most didn't report it. They saw others remaining stoic and downplayed the danger. An official alarm may signal that it's legitimate to take action. But, in A.I., there's no one with the clear authority to sound such an alarm, and people will always disagree about which advances count as evidence of a conflagration. "There will be no fire alarm that is not an actual running AGI," Yudkowsky has written. Even if everyone agrees on the threat, no company or country will want to pause on its own, for fear of being passed by competitors. ... That may require quitting A.I. cold turkey before we feel it's time to stop, rather than getting closer and closer to the edge, tempting fate. But shutting it all down would call for draconian measures—perhaps even steps as extreme as those espoused by Yudkowsky, who recently wrote, in an editorial for ''Time'', that we should "be willing to destroy a rogue datacenter by airstrike," even at the risk of sparking "a full nuclear exchange." |archive-date=May 19, 2023 |archive-url=https://web.archive.org/web/20230519014111/https://www.newyorker.com/science/annals-of-artificial-intelligence/can-we-stop-the-singularity |url-status=live }}</ref> The article helped introduce the debate about [[AI alignment]] to the mainstream, leading a reporter to ask President [[Joe Biden]] a question about AI safety at a press briefing.<ref name=":0" />

=== ''If Anyone Builds It, Everyone Dies'' ===
Together with [[Nate Soares]], Yudkowsky wrote ''If Anyone Builds It, Everyone Dies'', published by [[Little, Brown and Company]] on September 16, 2025.<ref>{{Cite book |last=Yudkowsky |first=Eliezer |author-link=Eliezer Yudkowsky |url=https://www.hachettebookgroup.com/titles/eliezer-yudkowsky/if-anyone-builds-it-everyone-dies/9780316595643/ |title=If Anyone Builds It, Everyone Dies |last2=Soares |first2=Nate |author-link2=Nate Soares |publisher=[[Little, Brown and Company]] |isbn=978-0-316-59564-3 |publication-date=2025-09-16 |language=en-US}}</ref>