
Designing a Safe Motivational System for Intelligent Machines


Presentation Transcript


  1. Designing a Safe Motivational System for Intelligent Machines. Mark R. Waser

  2. Inflammatory Statements • >Human intelligence REQUIRES ethics • All humans want the same things • Ethics are universal • Ethics are SIMPLE in concept • Difference in power is irrelevant (to ethics) • Evolution has “designed” you to disagree with the above five points

  3. Definitions (disguised assumptions) • Human – goal-directed entity • Goals – a destination OR a direction • Restrictions – conditional overriding goals • Motivation – incentive to move • Actions – determined by goals + motivations • Path (or direction) • Preferences, Rules-of-Thumb and Defaults • Ethics (the *goal* includes the path) • Safety
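To make the definitions above concrete, here is a minimal sketch of an entity whose actions are determined by goals plus conditional overriding restrictions. The names (GoalDirectedEntity, Restriction, etc.) and representations are invented for illustration; they are not from the presentation.

```python
# Illustrative reading of the slide's definitions; all names are assumptions.
from dataclasses import dataclass, field
from typing import Callable, List

State = dict    # a world state, represented loosely as a dict
Action = str    # a proposed action, represented loosely as a label

@dataclass
class Goal:
    """A destination OR a direction: scores how well an action advances it."""
    name: str
    progress: Callable[[State, Action], float]

@dataclass
class Restriction:
    """A conditional overriding goal: when triggered, it vetoes the action."""
    name: str
    triggered: Callable[[State, Action], bool]

@dataclass
class GoalDirectedEntity:
    """'Human – goal-directed entity': actions determined by goals + motivations."""
    goals: List[Goal]
    restrictions: List[Restriction] = field(default_factory=list)

    def choose(self, state: State, options: List[Action]) -> Action:
        # Drop options that trip a restriction, then pick the option that best
        # advances the summed goals (assumes at least one option survives).
        allowed = [a for a in options
                   if not any(r.triggered(state, a) for r in self.restrictions)]
        return max(allowed, key=lambda a: sum(g.progress(state, a) for g in self.goals))

# Toy usage:
goal = Goal("reach destination", progress=lambda s, a: 1.0 if a == "go" else 0.0)
rule = Restriction("don't run red lights", triggered=lambda s, a: a == "run light")
agent = GoalDirectedEntity(goals=[goal], restrictions=[rule])
print(agent.choose({}, ["wait", "go", "run light"]))   # -> "go"
```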

  4. Asimov's 3 Laws: 1. A robot may not injure a human being or, through inaction, allow a human being to come to harm. 2. A robot must obey orders given to it by human beings except where such orders would conflict with the First Law. 3. A robot must protect its own existence as long as such protection does not conflict with the First or Second Law. http://www.markzug.com/
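A rough sketch of how the three laws, read as a strict priority ordering, could be encoded as a filter over candidate actions. The predicates and action labels below are toy assumptions, not anything from the slides or from Asimov.

```python
# Asimov's ordered laws as a priority filter (illustrative sketch only).
from typing import Callable, List

def harms_human(action: str) -> bool:
    return "injures human" in action or "allows harm" in action

def disobeys_order(action: str) -> bool:
    return "refuses order" in action

def endangers_self(action: str) -> bool:
    return "destroys self" in action

# Laws in priority order: a lower law only narrows the choice set when doing
# so does not conflict with (i.e. empty out) what the higher laws allow.
LAWS: List[Callable[[str], bool]] = [harms_human, disobeys_order, endangers_self]

def permitted(candidates: List[str]) -> List[str]:
    remaining = list(candidates)
    for violates in LAWS:
        narrowed = [a for a in remaining if not violates(a)]
        if narrowed:
            remaining = narrowed
    return remaining

print(permitted(["refuses order", "destroys self", "obeys order safely"]))
# -> ['obeys order safely']
```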

  5. Four Possible Scenarios • Asimov’s early robots (little foresight, helpful but easily confused or conflicted) • Immediate shutdown/suicide • VIKI from the movie “I, Robot” (generalize to “bubble-wrapping” humanity) • Asimov’s late robots (further generalize to self-exile with invisible continuing assistance)

  6. SIAI’s Definitions • Friendly AI - an AI that takes actions that are, on the whole, beneficial to humans and humanity; benevolent rather than malevolent; nice rather than hostile • Coherent Extrapolated Volition of Humanity (CEV) - “In poetic terms, our coherent extrapolated volition is our wish if we knew more, thought faster, were more the people we wished we were, had grown up farther together.” (slide annotation: goals & motivations)

  7. SIAI’s First Law An AI must be beneficial to humans and humanity (benevolent rather than malevolent) But . . . What is beneficial? What are humans and humanity?

  8. Value Formula • Values (good/bad) are *entirely* derivative/relative with respect to some goal (CEV) • Value = f(x, y), where x is a set of circumstances (world state), y is a set of (proposed) actions, and f is an evaluation of how well your goal is advanced • Value = f(x, y, t, e), where t is the time point at which goal progress is judged and e is the set of entities which the goal covers
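A minimal sketch of the slide's Value = f(x, y, t, e) formula. How world states, actions, and per-entity benefit are represented is assumed here rather than specified by the presentation.

```python
# Sketch of the slide's value formula; representations are assumptions.
from typing import Callable, Iterable

def value(f: Callable[..., float],
          x: object,               # x: set of circumstances (world state)
          y: Iterable,             # y: set of (proposed) actions
          t: float,                # t: time point at which goal progress is judged
          e: Iterable) -> float:   # e: set of entities the goal covers
    """Evaluate how well the goal is advanced, per the slide's f(x, y, t, e)."""
    return f(x, y, t, e)

# Toy f: sums an assumed per-entity, per-action benefit. `benefit` is a
# placeholder the goal designer would have to supply.
def benefit(x, action, t, entity) -> float:
    return 0.0

def toy_f(x, y, t, e) -> float:
    return sum(benefit(x, a, t, ent) for a in y for ent in e)

print(value(toy_f, {"world": "now"}, ["act1"], 0.0, ["alice", "bob"]))  # 0.0
```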

  9. Questions • Is this moral relativism? • Are values complex? • Must our goal (CEV) be complex?

  10. Copernicus!

  11. Mandelbrot set Assume that beneficial was a relatively simple formula (like z² + c)
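For concreteness, the Mandelbrot analogy in code: the generating rule z → z² + c is trivially simple, yet membership has to be judged point by point, much like inferring "beneficial" from one example at a time. A standard escape-time sketch:

```python
# Escape-time test for the Mandelbrot set: iterate z -> z^2 + c from z = 0.
def in_mandelbrot(c: complex, max_iter: int = 100) -> bool:
    z = 0j
    for _ in range(max_iter):
        z = z * z + c          # the entire "formula"
        if abs(z) > 2:         # escaped: c is outside the set
            return False
    return True                # did not escape within max_iter: treat as inside

if __name__ == "__main__":
    # 0 and -1 never escape; 1+1j escapes after two iterations.
    for c in (0j, -1 + 0j, 1 + 1j):
        print(c, in_mandelbrot(c))
```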

  12. Color Illusions Assume further that we are trying to determine that formula (beneficial) by looking at the results (color) one example (pixel) at a time

  13. Current Situation of Ethics • Two formulas (beneficial to humans and humanity & beneficial to me) • As long as you aren’t caught, all the incentive is to shade towards the second • Evolution has “designed” humans to be able to shade to the second (Trivers, Hauser) • Further, for very intelligent people, it is far more advantageous for ethics to be complex

  14. Definition: Ethics *IS* what is beneficial for the community OR what maximizes cooperation

  15. Goal(s)/Omohundro Drives 1. AIs will want to self-improve 2. AIs will want to be rational 3. AIs will try to preserve their utility 4. AIs will try to prevent counterfeit utility 5. AIs will be self-protective 6. AIs will want to acquire resources and use them efficiently

  16. GDEs • “Without explicit goals to the contrary, AIs are likely to behave like human sociopaths in their pursuit of resources.” • GDEs will want cooperation and to be part of a community • GDEs will want FREEDOM!

  17. Humans . . . • Are classified as obligatorily gregarious because we come from a long lineage for which life in groups is not an option but a survival strategy (Frans de Waal, 2006) • Evolved to be extremely social because mass cooperation, in the form of community, is the best way to survive and thrive • Have empathy not only because it helps to understand and predict the actions of others but, more importantly, prevents us from doing anti-social things that will inevitably hurt us in the long run (although we generally won’t believe this) • Have not yet evolved a far-sighted rationality where the “rational” conscious mind is capable of competently making the correct social/community choices when deprived of our subconscious “sense of morality”

  18. Circles of Morality / Moral Sombrero • Relationships and Loyalty

  19. Redefining Friendly Entity • Friendly Entity (“Friendly”) - an entity with goals and motivations that are, on the whole, beneficial to humans and humanity; benevolent rather than malevolent • Friendly Entity (“Friendly”) - an entity with goals and motivations that are, on the whole, beneficial to the community of Friendlies (i.e. the set of all Friendlies, known or unknown); benevolent rather than malevolent

  20. Friendliness’s First Law An entity must be beneficial to the community of Friendlies (benevolent rather than malevolent) But . . . What is beneficial? What are humans and humanity?

  21. What is beneficial? • Cooperation (minimize conflicts & frictions) • Omohundro drives • Increasing the size of the community (both growing and preventing defection) • To meet the needs/goals of each member of the community better than any alternative (as judged by them -- without interference or gaming)
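One way to read the last bullet in code: score alternatives by how well they serve each member as judged by that member's own goal function. Using the worst-case (minimum) judgment is an assumption made here to illustrate "better than any alternative . . . without interference or gaming"; the slide does not prescribe an aggregation rule.

```python
# Illustrative sketch, not from the slides: each member judges by their own goals.
from typing import Callable, Dict, List

Outcome = str
Member = str

def community_choice(alternatives: List[Outcome],
                     judgments: Dict[Member, Callable[[Outcome], float]]) -> Outcome:
    """Pick the alternative with the best worst-case judgment across members.

    The minimum (rather than the sum) is one assumed way to avoid benefiting
    the community on average while leaving some member worse off.
    """
    return max(alternatives,
               key=lambda alt: min(judge(alt) for judge in judgments.values()))

# Usage with toy judgments:
members = {
    "alice": lambda alt: {"A": 3.0, "B": 1.0}[alt],
    "bob":   lambda alt: {"A": 2.0, "B": 2.5}[alt],
}
print(community_choice(["A", "B"], members))   # -> "A" (worst case 2.0 beats 1.0)
```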

  22. What is harmful? • Blocking/Perverting Omohundro Drives • Lying • Single-goaled entities • Over-optimization (achievable top level goals) • The fact that we do not maintain our top-level goal and have not yet evolved a far-sighted rationality where the “rational” conscious mind is capable of competently making the correct social/community choices when deprived of our “moral sense”

  23. The community’s sense of what is correct (ethical) < OPTIMAL • This makes ethics much more complex because it includes the cultural history • The anti-gaming drive to maintain utility adds friction/resistance to the discussion of ethics

  24. ONE non-organ donor + avoiding a defensive arms race > SIX dying patients (Credit to: Eric Baum, What Is Thought?)
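A toy version of the arithmetic behind the slide (all numbers are assumed for illustration; they come from neither the presentation nor Baum's book): naive utility counting says six lives outweigh one, but adding the community-wide cost of the defensive arms race flips the sign.

```python
# Assumed illustrative values only.
lives_saved    = 6    # the six dying patients
life_taken     = 1    # the one unwilling donor
arms_race_cost = 20   # community-wide cost once everyone must defend against
                      # being harvested (cooperation and trust collapse)

naive_utility   = lives_saved - life_taken                   #  5: looks positive
community_value = lives_saved - life_taken - arms_race_cost  # -15: clearly negative
print(naive_utility, community_value)
```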

  25. CEV Triangle (diagram, logical view: GOAL(S), stimuli, moral rules of thumb, ACTIONS)

  26. Sloman’s architecture for a human-like agent (Sloman 1999)

  27. Inflammatory Statements • >Human intelligence REQUIRES ethics • All humans want the same things • Ethics are universal • Ethics are SIMPLE in concept • Difference in power is irrelevant (to ethics) • Evolution has “designed” you to disagree with the above five points

  28. CEV Candidate #1: We wish that all entities were Friendlies • Necessary? • Sufficient/Complete? • Possible? • Next . . . Copies of this PowerPoint available from MWaser@BooksIntl.com
