8–10% of algorithmic recommendations are ‘bad’, but… an exploratory risk-utility meta-analysis and its regulatory implications
| Published in | International Journal of Information Management, Vol. 75; p. 102743 |
|---|---|
| Main Authors | |
| Format | Journal Article |
| Language | English |
| Published | Elsevier Ltd, 01.04.2024 |
| ISSN | 0268-4012, 1873-4707 |
| DOI | 10.1016/j.ijinfomgt.2023.102743 |
| Summary: | We conducted a quantitatively coarse-grained but wide-ranging evaluation of how frequently recommender algorithms provide ‘good’ and ‘bad’ recommendations, with a focus on the latter. We found 151 algorithmic audits from 33 studies that report fitting risk-utility statistics from YouTube, Google Search, Twitter, Facebook, TikTok, Amazon, and others. Our findings indicate that roughly 8–10% of algorithmic recommendations are ‘bad’, while about a quarter actively protect users from self-induced harm (‘do good’). This average is remarkably consistent across the audits, irrespective of the platform and the kind of risk (bias/discrimination, mental health and child harm, misinformation, or political extremism). Algorithmic audits find negative feedback loops that can ensnare users in spirals of ‘bad’ recommendations (being ‘dragged down the rabbit hole’), but they also highlight an even larger likelihood of positive spirals of ‘good’ recommendations. While our analysis refrains from any judgment about the causal consequences and severity of these risks, the detected levels surpass those associated with many other consumer products and are comparable to the risk levels of generic food defects monitored by public authorities such as the FDA or FSIS in the United States. Consequently, our findings inform the ongoing discussion regarding regulatory oversight of the potential risks posed by recommender algorithms. |
|---|---|
Highlights:
•Analyzed 151 algorithmic audits for the frequency of ‘good’ and ‘bad’ recommendations.
•8–10% of algorithmic recommendations found to be ‘bad’; about a quarter ‘do good’.
•A downward spiral of ‘bad’ recommendations exists, but the upward spiral of ‘good’ ones is even larger.
•Risk is consistent across platforms and harms (bias, mental health, misinformation, etc.).
•Risk levels are akin to those of generic food defects, suggesting a need for regulatory oversight.
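To make the kind of aggregation behind the 8–10% figure in the summary above concrete, here is a minimal sketch of pooling audit-level ‘bad’-recommendation rates. The audit counts are hypothetical placeholders, and the simple sample-size-weighted pooling shown is an illustrative assumption, not the paper’s actual weighting scheme.

```python
# Minimal sketch: pooling 'bad'-recommendation rates across audits.
# The counts below are hypothetical; the paper's real data and
# meta-analytic weighting are not reproduced here.

audits = [
    # (platform, n_recommendations_sampled, n_rated_bad)
    ("YouTube", 2000, 190),
    ("Google Search", 1500, 120),
    ("TikTok", 800, 75),
    ("Amazon", 1200, 96),
]

# Sample-size-weighted pooled proportion (fixed-effect-style pooling).
total_n = sum(n for _, n, _ in audits)
total_bad = sum(bad for _, _, bad in audits)
pooled_rate = total_bad / total_n

# Unweighted mean of audit-level rates, for comparison.
mean_rate = sum(bad / n for _, n, bad in audits) / len(audits)

print(f"Pooled 'bad' rate:     {pooled_rate:.1%}")   # ~8.7% on this toy data
print(f"Mean audit-level rate: {mean_rate:.1%}")
```

On this toy data both estimators land near 9%, inside the 8–10% band the meta-analysis reports; with real audits the two can diverge when sample sizes vary widely across studies.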