Utility function security in artificially intelligent agents
| Published in | Journal of Experimental & Theoretical Artificial Intelligence, Vol. 26, no. 3, pp. 373-389 |
|---|---|
| Main Author | Yampolskiy, Roman V. |
| Format | Journal Article |
| Language | English |
| Published | Abingdon: Taylor & Francis, 3 July 2014 |
| ISSN | 0952-813X (print); 1362-3079 (online) |
| DOI | 10.1080/0952813X.2014.895114 |

| Abstract | The notion of 'wireheading', or direct reward centre stimulation of the brain, is a well-known concept in neuroscience. In this paper, we examine the corresponding issue of reward (utility) function integrity in artificially intelligent machines. We survey the relevant literature and propose a number of potential solutions to ensure the integrity of our artificial assistants. Overall, we conclude that wireheading in rational self-improving optimisers above a certain capacity remains an unsolved problem despite the opinion of many that such machines will choose not to wirehead. A relevant issue of literalness in goal setting also remains largely unsolved and we suggest that the development of a non-ambiguous knowledge transfer language might be a step in the right direction. |
|---|---|
| Author | Yampolskiy, Roman V. (Department of Computer Engineering and Computer Science, University of Louisville; roman.yampolskiy@louisville.edu) |
| Copyright | Copyright 2014 Taylor & Francis Ltd. |
| Discipline | Computer Science |
| IsPeerReviewed | true |
| IsScholarly | true |
| PageCount | 17 |
| SubjectTerms | Artificial intelligence; Brain; counterfeit utility; Expert systems; Integrity; Intelligent agents; Knowledge management; literalness; Neurosciences; reward function; Stimulation; Utilities; Utility functions; wireheading |
| URI | https://www.tandfonline.com/doi/abs/10.1080/0952813X.2014.895114; https://www.proquest.com/docview/1543776989; https://www.proquest.com/docview/1559649380 |