AI-based Personal Information Breach Assessment System: Applying NLP to Government Legislation and Legislator Bills

Bibliographic Details
Published in	KIISE Transactions on Computing Practices (정보과학회 컴퓨팅의 실제 논문지) Vol. 29, no. 12, pp. 545-554
Main Authors	채정민(JeongMin Chae), 김효정(HyoJung Kim)
Format	Journal Article
Language	Korean
Published	Korean Institute of Information Scientists and Engineers (한국정보과학회), 01.12.2023
Subjects	computer science; personal information breach assessment; named entity recognition; intent classification; document classification; document summarization
ISSN	2383-6318 2383-6326
DOI	10.5626/KTCP.2023.29.12.545

Summary:	The AI-based Personal Information Breach Assessment System was developed to support evaluators in assessing government legislative enactment and amendment drafts and legislator-sponsored bills. It uses several natural language processing (NLP) techniques, including named entity recognition, intent classification, document classification, and document summarization. The breach assessment, previously written as a single paragraph, is divided in the new assessment format into three more specific parts: enactment/amendment content, review content, and review opinion.
The enactment/amendment content in the new format is organized by four types of personal information (general personal information, unique identification information, sensitive information, and video information), the data subject, the personal information processor, and the purpose of personal information processing, all of which are extracted by a trained named entity recognizer. The recognizer was fine-tuned from the xlm-roberta-large pre-trained language model; its micro F1-score was 0.707 over the eight entity types and 0.737 over the four types of personal information.
The review content in the new format covers legal compliance, the necessity of personal information processing, the inevitability of business processing, and the safety of information management. It applies the intent classification technique widely used in chatbots, together with slot tagging, to generate and provide more than 140 review opinions. The review opinion in the new format begins by determining whether each item of personal information constitutes a breach. The breach classifier was trained as a CNN + Fasttext classification model and achieved a micro F1-score of 0.91. Although the format was newly subdivided and changed, introducing the AI system greatly improved work efficiency and supported evaluators in carrying out breach assessments quickly.
KCI Citation Count:	0
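
The summary reports micro F1-scores both over all eight entity types and over the four personal-information types only. As a minimal sketch of how such a span-level micro F1 can be computed, assuming exact-match scoring on (start, end, label) spans, the following may help. The function name `micro_f1`, the label names, and the toy data are hypothetical illustrations and are not taken from the paper; the record does not state which matching scheme the authors used.

```python
def micro_f1(gold, pred, labels=None):
    """Span-level micro F1 with exact matching.

    gold, pred: lists (one item per sentence) of sets of (start, end, label).
    labels: optional set of labels to restrict scoring to (assumption: this
            mirrors scoring a subset such as the four personal-information types).
    """
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        if labels is not None:
            g = {s for s in g if s[2] in labels}
            p = {s for s in p if s[2] in labels}
        tp += len(g & p)   # spans predicted with correct boundaries and label
        fp += len(p - g)   # predicted spans with no exact gold match
        fn += len(g - p)   # gold spans the model missed
    if tp == 0:
        return 0.0
    # Micro averaging pools the counts over all labels before computing F1.
    return 2 * tp / (2 * tp + fp + fn)


# Toy data with hypothetical label names.
gold = [{(0, 2, "SENSITIVE"), (5, 7, "SUBJECT")}, {(1, 3, "PROCESSOR")}]
pred = [{(0, 2, "SENSITIVE"), (5, 7, "PURPOSE")},
        {(1, 3, "PROCESSOR"), (4, 5, "VIDEO")}]

score_all = micro_f1(gold, pred)
score_subset = micro_f1(gold, pred, labels={"SENSITIVE", "VIDEO"})
```

Because micro averaging pools true positives, false positives, and false negatives across labels, frequent entity types weigh more heavily in the score than rare ones, which is one reason a subset score can differ from the overall score.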