Utility function security in artificially intelligent agents
| Published in | Journal of Experimental & Theoretical Artificial Intelligence, Vol. 26, No. 3, pp. 373-389 |
|---|---|
| Format | Journal Article |
| Language | English |
| Published | Abingdon: Taylor & Francis, 03.07.2014 |
| ISSN | 0952-813X; 1362-3079 |
| DOI | 10.1080/0952813X.2014.895114 |
| Summary | The notion of 'wireheading', or direct reward centre stimulation of the brain, is a well-known concept in neuroscience. In this paper, we examine the corresponding issue of reward (utility) function integrity in artificially intelligent machines. We survey the relevant literature and propose a number of potential solutions to ensure the integrity of our artificial assistants. Overall, we conclude that wireheading in rational self-improving optimisers above a certain capacity remains an unsolved problem, despite the opinion of many that such machines will choose not to wirehead. The related issue of literalness in goal setting also remains largely unsolved, and we suggest that the development of a non-ambiguous knowledge transfer language might be a step in the right direction. |
|---|---|