Utility function security in artificially intelligent agents
| Published in | Journal of Experimental & Theoretical Artificial Intelligence, Vol. 26, No. 3, pp. 373-389 |
|---|---|
| Format | Journal Article |
| Language | English |
| Published | Abingdon: Taylor & Francis, 03.07.2014 |
| ISSN | 0952-813X; 1362-3079 |
| DOI | 10.1080/0952813X.2014.895114 |
| Summary | The notion of 'wireheading', or direct reward centre stimulation of the brain, is a well-known concept in neuroscience. In this paper, we examine the corresponding issue of reward (utility) function integrity in artificially intelligent machines. We survey the relevant literature and propose a number of potential solutions to ensure the integrity of our artificial assistants. Overall, we conclude that wireheading in rational self-improving optimisers above a certain capacity remains an unsolved problem, despite the opinion of many that such machines will choose not to wirehead. The related issue of literalness in goal setting also remains largely unsolved, and we suggest that the development of a non-ambiguous knowledge transfer language might be a step in the right direction. |
|---|---|