SYSTEMS AND METHODS FOR IDENTIFYING TARGETED DATA

Bibliographic Details
Format Patent
Language English
Published 29.07.2022

Summary: The present disclosure provides methods, systems, computing devices, computing entities, and/or the like for identifying and/or retrieving targeted data found in unstructured documents. In accordance with various aspects, a method is provided that comprises: receiving a targeted data request identifying a data subject; processing a first feature representation of each document of a plurality of documents using a classifier machine-learning model to generate a prediction that the document contains the targeted data; generating a dataset that comprises each document having a prediction that satisfies a threshold; processing a second feature representation of each document of the dataset using a clustering machine-learning model to identify a document cluster for the document; and providing the document clusters so that an analysis can be performed on each document cluster to eliminate the document cluster as having targeted data and/or identify the targeted data associated with the data subject found in the document cluster.
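
The classify, filter, and cluster steps named in the summary can be illustrated with a minimal Python sketch. It is not the patented implementation. The summary does not specify the models or the feature representations, so everything here is an assumption made for illustration: the hand-set keyword weights stand in for a trained classifier, token counts and token sets stand in for the first and second feature representations, a greedy Jaccard-similarity grouping stands in for the clustering model, and the names `WEIGHTS`, `BIAS`, `DOCS`, and `find_candidate_clusters` are hypothetical. Handling of the data subject and the downstream analysis of each cluster are left out.

```python
import math
from collections import Counter

# Hypothetical stand-in for a trained classifier: hand-set keyword weights.
WEIGHTS = {"ssn": 2.0, "address": 1.5, "email": 1.5, "invoice": -1.0}
BIAS = -1.0


def first_features(text):
    """Assumed first feature representation: lower-cased token counts."""
    return Counter(text.lower().split())


def predict(features):
    """Logistic score that a document contains targeted data."""
    z = BIAS + sum(WEIGHTS.get(tok, 0.0) * n for tok, n in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def second_features(text):
    """Assumed second feature representation: the set of distinct tokens."""
    return frozenset(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity of two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(features_by_id, min_similarity=0.5):
    """Greedy grouping: join the first cluster whose seed document is similar."""
    clusters = []
    for doc_id, feats in features_by_id.items():
        for members in clusters:
            if jaccard(feats, features_by_id[members[0]]) >= min_similarity:
                members.append(doc_id)
                break
        else:
            clusters.append([doc_id])
    return clusters


def find_candidate_clusters(documents, threshold=0.5):
    """Keep documents whose prediction satisfies the threshold, then cluster them."""
    dataset = {
        doc_id: text
        for doc_id, text in documents.items()
        if predict(first_features(text)) >= threshold
    }
    feats = {doc_id: second_features(text) for doc_id, text in dataset.items()}
    return cluster(feats)


# Made-up example documents, for illustration only.
DOCS = {
    "d1": "Customer email and address on file for Jane Doe",
    "d2": "Jane Doe email and address updated on file",
    "d3": "Invoice total for office chairs",
    "d4": "SSN record for Jane Doe",
}

if __name__ == "__main__":
    print(find_candidate_clusters(DOCS))
```

In this toy run the invoice document falls below the threshold and is dropped before clustering, so only the remaining documents are grouped. Filtering first keeps the clustering step, and any later review of each cluster, limited to documents the classifier considers likely to hold targeted data.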