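Design and Implementation of a Distributed Processing System for Accelerating Feature Extraction from Large-Scale Malware (대용량 악성코드의 특징 추출 가속화를 위한 분산 처리 시스템 설계 및 구현)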

Bibliographic Details
Published in	KIPS Transactions on Computer and Communication Systems (정보처리학회논문지. 컴퓨터 및 통신 시스템), Vol. 8, no. 2, pp. 35-40
Main Authors	Hyunjong Lee (이현종), Seongyul Euh (어성율), Doosung Hwang (황두성)
Format	Journal Article
Language	Korean
Published	Korea Information Processing Society (한국정보처리학회), 28.02.2019
Subjects	Distributed Processing System; Feature Extraction; Machine Learning; Malware Detection
ISSN	2287-5891

Abstract	Conventional malware detection is weak against variants that have been modified by polymorphism or obfuscation techniques. Machine learning algorithms learn the patterns embedded in malware, can therefore detect similar behavior, and can replace conventional detection methods. Because malware patterns change over time, data must be collected continuously in order to learn them. However, storing and processing a large volume of malware files incurs high space and time complexity. In this paper, an HDFS-based distributed processing system is designed to reduce space complexity and accelerate processing time. Using the distributed processing system, we extract a 2-gram feature, two API features that differ in their filtering criteria, and an APICFG feature, and compare the generalization performance of ensemble learning models. In the experiments, feature extraction was about 3.75 times faster than on a single computer, and space complexity was about 5 times more efficient. Among the features, the 2-gram feature gave the best classification performance, but training took a long time because of the high dimensionality of the training data.
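
The abstract notes that the 2-gram feature yields high-dimensional training data. The following is a minimal sketch of why that can happen, assuming the 2-grams are taken over raw bytes; the record does not say whether the paper uses bytes, opcodes, or another unit, and the function names `byte_2grams` and `to_vector` are hypothetical, not taken from the paper.

```python
from collections import Counter


def byte_2grams(data: bytes) -> Counter:
    """Count overlapping 2-grams (adjacent byte pairs) in a byte string.

    Assumption: 2-grams are formed over raw bytes. The paper's actual
    unit (bytes, opcodes, ...) is not stated in this record.
    """
    # Iterating over bytes yields ints, so each key is an (int, int) pair.
    return Counter(zip(data, data[1:]))


def to_vector(counts: Counter) -> list[int]:
    """Flatten 2-gram counts into a fixed-length vector of 256 * 256 entries."""
    vec = [0] * (256 * 256)
    for (first, second), n in counts.items():
        vec[first * 256 + second] = n
    return vec


# Hypothetical five-byte sample, used only to illustrate the counting.
sample = bytes([0x4D, 0x5A, 0x90, 0x4D, 0x5A])
counts = byte_2grams(sample)
vector = to_vector(counts)
```

Under this byte-level assumption every file maps to a vector of 65,536 counts, most of them zero, which is consistent with the long training time the abstract reports for the 2-gram feature.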
Discipline	Computer Science
OpenAccessLink	http://click.ndsl.kr/servlet/LinkingDetailView?cn=JAKO201912761598939&dbt=JAKO&org_code=O481&site_code=SS1481&service_code=01