Utility function security in artificially intelligent agents
| Published in | Journal of Experimental & Theoretical Artificial Intelligence, Vol. 26, No. 3, pp. 373–389 |
|---|---|
| Main Author | Roman V. Yampolskiy |
| Format | Journal Article |
| Language | English |
| Published | Abingdon: Taylor & Francis, 03.07.2014 |
| ISSN | 0952-813X, 1362-3079 |
| DOI | 10.1080/0952813X.2014.895114 |
Summary: The notion of 'wireheading', or direct reward-centre stimulation of the brain, is a well-known concept in neuroscience. In this paper, we examine the corresponding issue of reward (utility) function integrity in artificially intelligent machines. We survey the relevant literature and propose a number of potential solutions for ensuring the integrity of our artificial assistants. Overall, we conclude that wireheading in rational self-improving optimisers above a certain capacity remains an unsolved problem, despite the opinion of many that such machines will choose not to wirehead. The related issue of literalness in goal setting also remains largely unsolved, and we suggest that the development of a non-ambiguous knowledge transfer language might be a step in the right direction.