Wisdom DOES Imply Benevolence
(except in a very small number of low-probability edge cases)
Mark R. Waser
Super-Intelligence Ethics

So . . . What’s the problem?
Superintelligence does not imply benevolence
Fox, J. & Shulman, C. (2010) Superintelligence Does Not Imply Benevolence. In K. Mainzer (ed.), ECAP10: VIII European Conference on Computing and Philosophy (pp. 456-462). Munich: Verlag.
If machines become more intelligent than humans, will their intelligence lead them toward beneficial behavior toward humans even without specific efforts to design moral machines?
One might generalize from this trend and argue that as machines approach and exceed human cognitive capacities, moral behavior will improve in tandem.
Intelligence can be far less important than goals
in determining benevolence vs. malevolence
intelligence – the ability to achieve goals in a wide range of environments.
If an intelligence has the single goal to *destroy humanity*, increased intelligence will only make it more malevolent
The primary danger of AIs is entirely due to the fact that their goal system *could* be different from ours
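A minimal sketch of this point, assuming a toy exhaustive planner (all names, actions, and payoffs below are invented purely for illustration): the same search procedure, given more capability, simply gets better at whatever utility function it was handed.

```python
# Toy illustration: capability amplifies the goal, whatever it is.
# The planner is identical in both cases; only the utility differs.
from itertools import product

ACTIONS = ["help_humans", "ignore_humans", "harm_humans"]

def plan(utility, horizon):
    """Exhaustively search action sequences; return the best one."""
    return max(product(ACTIONS, repeat=horizon),
               key=lambda seq: sum(utility(a) for a in seq))

benevolent = {"help_humans": 1, "ignore_humans": 0, "harm_humans": -1}
malevolent = {"help_humans": -1, "ignore_humans": 0, "harm_humans": 1}

for name, goal in [("benevolent", benevolent), ("malevolent", malevolent)]:
    # A longer horizon stands in for "more intelligence": better search,
    # same goal. More capability only makes the resulting plan more extreme.
    print(name, plan(goal.get, horizon=3))
```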
An artificial intelligence with a cleanly hierarchical goal system with a single top-level (monomaniacal) goal of “Friendliness” (to humans)
Imagine a “Friendly AI” where Friendliness has been defined (hopefully accidentally) as *DESTROY HUMANITY*
The goal/motivation to maximize goal achievement in terms of number and diversity.
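One illustrative way to formalize that objective (my own toy rendering, not a formula from the talk) is to score outcomes with diminishing returns per agent, so that achievement spread across many diverse agents beats the same total concentrated in one:

```python
import math

def diversity_weighted_score(goals_achieved_per_agent):
    """Sum of log(1 + g) over agents: concave per agent, so satisfying
    many different agents' goals scores higher than satisfying one
    agent's goals many times over."""
    return sum(math.log1p(g) for g in goals_achieved_per_agent)

# Concentrating 100 achieved goals in one agent...
print(diversity_weighted_score([100, 0, 0, 0]))   # ~4.62
# ...scores lower than spreading the same 100 across four agents.
print(diversity_weighted_score([25, 25, 25, 25])) # ~13.03
```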
This picture neglects a critical distinction between
1. A system for cooperation
Advances one’s own ends
AIs will out-cooperate humans (Hall 2007) – see the sketch after this list
2. A system to protect the weak/helpless
Demands revision of our ultimate ends
Will AIs revise their preferences to be more moral (Chalmers 2010)?
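A standard iterated prisoner's dilemma illustrates the first system (textbook payoff values, not numbers from the slides): a reliably cooperative strategy advances its own ends over repeated play, even against a defector.

```python
# Iterated prisoner's dilemma: reliable cooperation pays over time.
# Standard payoffs: both cooperate = 3 each, both defect = 1 each,
# lone defector = 5, lone cooperator = 0.
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def tit_for_tat(opponent_history):
    # Cooperate first, then copy the opponent's last move.
    return "C" if not opponent_history else opponent_history[-1]

def always_defect(opponent_history):
    return "D"

def play(strategy_a, strategy_b, rounds=100):
    hist_a, hist_b, score_a, score_b = [], [], 0, 0
    for _ in range(rounds):
        a, b = strategy_a(hist_b), strategy_b(hist_a)
        pa, pb = PAYOFF[(a, b)]
        score_a, score_b = score_a + pa, score_b + pb
        hist_a.append(a)
        hist_b.append(b)
    return score_a, score_b

print(play(tit_for_tat, tit_for_tat))    # (300, 300): mutual cooperation
print(play(always_defect, tit_for_tat))  # (104, 99): defection gains little
```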
1. noticing direct instrumental motivations
Advances one’s own ends (transient)
2. noticing instrumental benefits to enduring benevolent dispositions/trustworthiness
Advances one’s own ends (permanent?)
3. causing an intrinsic desire for human welfare independent of instrumental concerns
Revision of ends/desires (maybe?)
If you have a verifiable history of being trustworthy when not forced, others do not have to commit resources to defending against you – and can pass some of those savings on to you
On the other hand, if you harm (or worse, destroy) interesting or useful entities, more powerful entities will likely decide that *you* need to spend resources on reparations and altruistic punishment (as well as paying the cost of enforcement)
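A back-of-the-envelope version of that argument, with invented numbers purely for illustration: once the other party can skip defensive spending, there is surplus to share with a verifiably trustworthy partner, while exploitation invites reparations and punishment costs.

```python
# Illustrative (invented) per-interaction payoffs for the trust argument.
GAINS_FROM_TRADE = 10
DEFENSE_COST = 4        # what a partner spends guarding against you
SHARED_SAVINGS = 2      # portion of the skipped defense cost passed to you
ONE_TIME_EXPLOIT = 15   # short-term gain from betraying a partner
REPARATIONS = 12        # reparations plus altruistic punishment afterward

def trustworthy_payoff(interactions):
    # Partners verify your history, drop their defenses, share the savings.
    return interactions * (GAINS_FROM_TRADE + SHARED_SAVINGS)

def exploiter_payoff(interactions):
    # One windfall, then every later interaction is taxed by punishment
    # and by partners who keep paying (and passing on) defense costs.
    return ONE_TIME_EXPLOIT + (interactions - 1) * (
        GAINS_FROM_TRADE - DEFENSE_COST - REPARATIONS)

for n in (1, 5, 20):
    print(n, trustworthy_payoff(n), exploiter_payoff(n))
# Exploitation wins only in the one-shot case; trust dominates over time.
```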
Omohundro, S. (2008) The Basic AI Drives. Proceedings of the First AGI Conference (see the sketch after this list):
1. AIs will want to self-improve
2. AIs will want to be rational
3. AIs will try to preserve their utility
4. AIs will try to prevent counterfeit utility
5. AIs will be self-protective
6. AIs will want to acquire resources and use them efficiently
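A rough toy model (mine, not Omohundro's) of why such drives are convergent: whichever final goal is plugged in, plans that first secure resources and survival score higher in expectation, so any expected-utility maximizer drifts toward them.

```python
import random

# Toy model: whatever the final goal, "acquire resources first" and
# "stay alive" raise the chance the goal gets achieved at all.
random.seed(0)

def success_prob(resources, survived):
    # Illustrative: success requires surviving; resources help.
    return 0.0 if not survived else min(1.0, 0.2 + 0.2 * resources)

def expected_utility(plan, trials=10_000):
    total = 0.0
    for _ in range(trials):
        resources = 2 if "acquire_resources" in plan else 0
        # Skipping self-protection risks being shut off before finishing.
        survived = True if "self_protect" in plan else random.random() < 0.5
        total += success_prob(resources, survived)
    return total / trials

for plan in [("pursue_goal",),
             ("acquire_resources", "pursue_goal"),
             ("self_protect", "acquire_resources", "pursue_goal")]:
    print(plan, round(expected_utility(plan), 3))
# The ordering holds for ANY final goal plugged into success_prob:
# the instrumental subgoals dominate regardless of what the goal is.
```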
Any sufficiently advanced intelligence (i.e. one with even merely adequate foresight) is guaranteed to realize and take into account the fact that not asking for help and not being concerned about others will generally only work for a brief period of time before ‘the villagers start gathering pitchforks and torches.’
Everything is easier with help & without interference
In every system of morality, which I have hitherto met with, I have always remark'd, that the author proceeds for some time in the ordinary way of reasoning, and establishes the being of a God, or makes observations concerning human affairs; when of a sudden I am surpriz'd to find, that instead of the usual copulations of propositions, is, and is not, I meet with no proposition that is not connected with an ought, or an ought not. This change is imperceptible; but is, however, of the last consequence. For as this ought, or ought not, expresses some new relation or affirmation, 'tis necessary that it shou'd be observ'd and explain'd; and at the same time that a reason should be given, for what seems altogether inconceivable, how this new relation can be a deduction from others, which are entirely different from it.
– David Hume, A Treatise of Human Nature (1739)
a superset of
“interlocking sets of values, virtues, norms, practices, identities, institutions, technologies, and evolved psychological mechanisms that work together to suppress or regulate selfishness and make cooperative social life possible.”
– Haidt & Kesebir, Handbook of Social Psychology, 5th Ed. (2010)
Humean view – values are entirely independent of intelligence
Kantian view – many extremely intelligent beings would converge on (possibly benevolent) substantive normative principles upon reflection
EXAMPLE: Waste not, want not.
How would a super-intelligence behave if it knew that it had a goal but that it wouldn’t know that goal until sometime in the future?
Preserving a seemingly weak or useless entity may turn out to be that goal
Or that entity might possess knowledge/skills necessary to achieve it
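The closing question can be framed as expected utility under goal uncertainty. In this illustrative sketch (probabilities and payoffs invented), destroying a seemingly useless entity forecloses exactly the branches where it turns out to matter:

```python
# A superintelligence that will only learn its goal later should preserve
# option value now. Illustrative probabilities and payoffs, not data.
POSSIBLE_GOALS = {
    "entity_is_the_goal":      {"prob": 0.05, "preserve": 10, "destroy": -10},
    "entity_has_needed_skill": {"prob": 0.15, "preserve": 5,  "destroy": -5},
    "entity_is_irrelevant":    {"prob": 0.80, "preserve": 0,  "destroy": 1},
}

def expected_value(action):
    # Average over the goals the agent might be handed in the future.
    return sum(g["prob"] * g[action] for g in POSSIBLE_GOALS.values())

print("preserve:", expected_value("preserve"))  # 0.05*10 + 0.15*5 = 1.25
print("destroy:", expected_value("destroy"))    # -0.5 - 0.75 + 0.8 = -0.45
```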