Utility function security in artificially intelligent agents

Bibliographic Details
Published in Journal of Experimental & Theoretical Artificial Intelligence, Vol. 26, no. 3, pp. 373-389
Main Author Yampolskiy, Roman V.
Format Journal Article
Language English
Published Abingdon Taylor & Francis 03.07.2014
Taylor & Francis Ltd
ISSN 0952-813X
EISSN 1362-3079
DOI 10.1080/0952813X.2014.895114

Abstract The notion of 'wireheading', or direct reward centre stimulation of the brain, is a well-known concept in neuroscience. In this paper, we examine the corresponding issue of reward (utility) function integrity in artificially intelligent machines. We survey the relevant literature and propose a number of potential solutions to ensure the integrity of our artificial assistants. Overall, we conclude that wireheading in rational self-improving optimisers above a certain capacity remains an unsolved problem, despite the opinion of many that such machines will choose not to wirehead. A related issue of literalness in goal setting also remains largely unsolved, and we suggest that the development of a non-ambiguous knowledge transfer language might be a step in the right direction.
Author Affiliation Department of Computer Engineering and Computer Science, University of Louisville
Author Email roman.yampolskiy@louisville.edu
Copyright 2014 Taylor & Francis
Discipline Computer Science
Peer Reviewed Yes
Scholarly Yes
PageCount 17
Subject Terms Artificial intelligence
Brain
counterfeit utility
Expert systems
Integrity
Intelligent agents
Knowledge management
literalness
Neurosciences
reward function
Stimulation
Utilities
Utility functions
wireheading
URI https://www.tandfonline.com/doi/abs/10.1080/0952813X.2014.895114