8–10% of algorithmic recommendations are ‘bad’, but… an exploratory risk-utility meta-analysis and its regulatory implications

Bibliographic Details
Published in: International Journal of Information Management, Vol. 75, p. 102743
Main Authors: Hilbert, Martin; Thakur, Arti; Flores, Pablo M.; Zhang, Xiaoya; Bhan, Jee Young; Bernhard, Patrick; Ji, Feng
Format: Journal Article
Language: English
Published: Elsevier Ltd, 01.04.2024
ISSN: 0268-4012, 1873-4707
DOI: 10.1016/j.ijinfomgt.2023.102743

More Information
Summary: We conducted a quantitatively coarse-grained but wide-ranging evaluation of the frequency with which recommender algorithms provide ‘good’ and ‘bad’ recommendations, with a focus on the latter. We found 151 algorithmic audits from 33 studies that report fitting risk-utility statistics from YouTube, Google Search, Twitter, Facebook, TikTok, Amazon, and others. Our findings indicate that roughly 8–10% of algorithmic recommendations are ‘bad’, while about a quarter actively protect users from self-induced harm (‘do good’). This average is remarkably consistent across the audits, irrespective of the platform or the kind of risk (bias/discrimination, mental health and child harm, misinformation, or political extremism). Algorithmic audits find negative feedback loops that can ensnare users in spirals of ‘bad’ recommendations (being ‘dragged down the rabbit hole’), but they also highlight an even larger likelihood of positive spirals of ‘good’ recommendations. While our analysis refrains from any judgment of the causal consequences and severity of the risks, the detected levels surpass those associated with many other consumer products and are comparable to the risk levels of generic food defects monitored by public authorities such as the FDA or FSIS in the United States. Consequently, our findings inform the ongoing discussion regarding regulatory oversight of the potential risks posed by recommender algorithms.
• Analyzed 151 algorithmic audits for the frequency of ‘good’ and ‘bad’ recommendations.
• 8–10% of algorithmic recommendations found to be ‘bad’; about a quarter ‘do good’.
• A spiral of ‘bad’ recommendations exists, but the upward spiral of ‘good’ ones is even larger.
• Risk is consistent across platforms and harms (bias, mental health, misinformation, etc.).
• Risk levels are akin to those of generic food defects, suggesting a need for regulatory oversight.