A 'Human-in-the-Loop' approach for Information Extraction from Privacy Policies under Data Scarcity

Bibliographic Details
Published in IEEE European Symposium on Security and Privacy Workshops (Online) pp. 76-83
Main Authors Gebauer, Michael; Maschhur, Faraz; Leschke, Nicola; Grunewald, Elias; Pallas, Frank
Format Conference Proceeding
Language English
Published IEEE 01.07.2023
ISSN 2768-0657
DOI 10.1109/EuroSPW59978.2023.00014

Abstract Machine-readable representations of privacy policies are door openers for a broad variety of novel privacy-enhancing and, in particular, transparency-enhancing technologies (TETs). In order to generate such representations, transparency information needs to be extracted from written privacy policies. However, respective manual annotation and extraction processes are laborious and require expert knowledge. Approaches for fully automated annotation, in turn, have so far not succeeded due to overly high error rates in the specific domain of privacy policies. In the end, a lack of properly annotated privacy policies and respective machine-readable representations persists and enduringly hinders the development and establishment of novel technical approaches fostering policy perception and data subject informedness. In this work, we present a prototype system for a 'Human-in-the-Loop' approach to privacy policy annotation that integrates ML-generated suggestions and ultimately human annotation decisions. We propose an ML-based suggestion system specifically tailored to the constraint of data scarcity prevalent in the domain of privacy policy annotation. On this basis, we provide meaningful predictions to users, thereby streamlining the annotation process. Additionally, we evaluate our approach through a prototypical implementation to show that our ML-based extraction approach provides superior performance over other recently used extraction models for legal documents.
Author_xml – sequence: 1
  givenname: Michael
  surname: Gebauer
  fullname: Gebauer, Michael
  email: mg@ise.tu-berlin.de
  organization: Information Systems Engineering - TU Berlin,Berlin
– sequence: 2
  givenname: Faraz
  surname: Maschhur
  fullname: Maschhur, Faraz
  email: f.maschhur@ise.tu-berlin.de
  organization: Information Systems Engineering - TU Berlin,Berlin
– sequence: 3
  givenname: Nicola
  surname: Leschke
  fullname: Leschke, Nicola
  email: nl@ise.tu-berlin.de
  organization: Information Systems Engineering - TU Berlin,Berlin
– sequence: 4
  givenname: Elias
  surname: Grunewald
  fullname: Grunewald, Elias
  email: eg@ise.tu-berlin.de
  organization: Information Systems Engineering - TU Berlin,Berlin
– sequence: 5
  givenname: Frank
  surname: Pallas
  fullname: Pallas, Frank
  email: fp@ise.tu-berlin.de
  organization: Information Systems Engineering - TU Berlin,Berlin
CODEN IEEPAD
ContentType Conference Proceeding
DBID 6IE
6IL
CBEJK
RIE
RIL
DOI 10.1109/EuroSPW59978.2023.00014
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Xplore POP ALL
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP All) 1998-Present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Law
EISBN 9798350327205
EISSN 2768-0657
EndPage 83
ExternalDocumentID 10190661
Genre orig-research
IEDL.DBID RIE
IngestDate Wed Aug 27 02:21:19 EDT 2025
IsPeerReviewed false
IsScholarly false
Language English
LinkModel DirectLink
PageCount 8
ParticipantIDs ieee_primary_10190661
PublicationCentury 2000
PublicationDate 2023-July
PublicationDateYYYYMMDD 2023-07-01
PublicationDate_xml – month: 07
  year: 2023
  text: 2023-July
PublicationDecade 2020
PublicationTitle IEEE European Symposium on Security and Privacy Workshops (Online)
PublicationTitleAbbrev EUROSPW
PublicationYear 2023
Publisher IEEE
Publisher_xml – name: IEEE
SSID ssj0003204008
SourceID ieee
SourceType Publisher
StartPage 76
SubjectTerms Annotations
Data privacy
Error analysis
Law
Manuals
Privacy
Prototypes
Title A 'Human-in-the-Loop' approach for Information Extraction from Privacy Policies under Data Scarcity
URI https://ieeexplore.ieee.org/document/10190661