Time-Series based Dataset Selection Method for Effective Text Classification (효율적인 문헌 분류를 위한 시계열 기반 데이터 집합 선정 기법)

Bibliographic Details
Published in	The Journal of the Korea Contents Association (한국콘텐츠학회 논문지), Vol. 17, no. 1, pp. 39-49
Main Authors	채영훈 (Yeonghun Chae), 정도헌 (Do-Heon Jeong)
Format	Journal Article
Language	Korean
Published	The Korea Contents Association (한국콘텐츠학회), 2017
Subjects	Interdisciplinary studies (학제간연구); Machine Learning (기계학습); Naive Bayes (나이브베이즈); Time-Series Analysis (시계열분석); Classification (분류); SVM
ISSN	1598-4877 (print), 2508-6723 (electronic)
DOI	10.5392/JKCA.2017.17.01.039

Abstract	As Internet technology advances, the volume of data on the web is growing sharply, and incremental machine learning techniques are being studied as a way to learn efficiently from this growing data. Most web documents carry time-series information such as a posting or publication date, and reflecting this information in classification should make classification more efficient. In this study, we analyze how the vocabulary of web documents changes over time and propose an efficient classification learning method that divides the dataset based on the analyzed time-series information. For experiments and validation, we collected 1 million online news articles together with their time-series information. We divided the collected dataset and ran experiments with Naïve Bayes and SVM classifiers, observing performance gains of up to 2.02 and 2.32 percentage points, respectively, over training on the full dataset. These results confirm that reflecting time-series changes in vocabulary can improve classification performance.
KCI Citation Count	0
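The dataset-selection idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not state how the dataset is divided, so the fixed date window in `select_recent`, the toy articles, and all function names below are assumptions made purely for illustration. The sketch trains a minimal multinomial Naive Bayes classifier once on all documents and once on a recent time slice, and shows how a word whose usage shifts over time can be classified differently.

```python
from collections import Counter, defaultdict
from datetime import date
import math


def select_recent(docs, start, end):
    """Keep documents published within [start, end] (hypothetical selection rule)."""
    return [d for d in docs if start <= d["date"] <= end]


def train_nb(docs):
    """Collect per-class document and word counts for multinomial Naive Bayes."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for d in docs:
        class_counts[d["label"]] += 1
        for w in d["text"].split():
            word_counts[d["label"]][w] += 1
            vocab.add(w)
    return class_counts, word_counts, vocab


def predict_nb(model, text):
    """Return the class with the highest log-posterior (add-one smoothing)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)
        for w in text.split():
            if w not in vocab:
                continue  # ignore words never seen during training
            score += math.log((word_counts[label][w] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy corpus (invented): "apple" appears in food news in 2010 and in tech news in 2016.
docs = [
    {"date": date(2010, 3, 1), "label": "food", "text": "apple harvest orchard"},
    {"date": date(2010, 5, 1), "label": "food", "text": "apple pie recipe"},
    {"date": date(2010, 6, 1), "label": "food", "text": "apple juice market"},
    {"date": date(2016, 2, 1), "label": "tech", "text": "apple phone launch"},
    {"date": date(2016, 4, 1), "label": "food", "text": "pie recipe contest"},
    {"date": date(2016, 8, 1), "label": "tech", "text": "phone chip launch"},
]

full_model = train_nb(docs)
recent_docs = select_recent(docs, date(2016, 1, 1), date(2016, 12, 31))
recent_model = train_nb(recent_docs)

print(predict_nb(full_model, "apple"))    # food
print(predict_nb(recent_model, "apple"))  # tech
```

On this toy data the model trained on the full set is dominated by the older usage of "apple", while the model trained on the 2016 slice follows the newer usage. Whether such a split helps on real data depends on how quickly the vocabulary drifts and on how much training data each slice retains.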
DEWEY	005.7
DatabaseName	DBPIA - 디비피아 Nurimedia DBPIA Journals KoreaScience Korean Citation Index
Discipline	Computer Science
DocumentTitleAlternate	Time-Series based Dataset Selection Method for Effective Text Classification
EISSN	2508-6723
EndPage	49
ExternalDocumentID	oai_kci_go_kr_ARTI_1320328 JAKO201711656576981 NODE07100901
IsDoiOpenAccess	true
IsOpenAccess	true
IsPeerReviewed	false
IsScholarly	false
Issue	1
Notes	KISTI1.1003/JNL.JAKO201711656576981 G704-001475.2017.17.1.022
OpenAccessLink	http://click.ndsl.kr/servlet/LinkingDetailView?cn=JAKO201711656576981&dbt=JAKO&org_code=O481&site_code=SS1481&service_code=01
PageCount	11
ParticipantIDs	nrf_kci_oai_kci_go_kr_ARTI_1320328 kisti_ndsl_JAKO201711656576981 nurimedia_primary_NODE07100901
PublicationCentury	2000
PublicationDate	2017 2017-01
PublicationDateYYYYMMDD	2017-01-01
PublicationDate_xml	– year: 2017 text: 2017
PublicationDecade	2010
PublicationTitle	한국콘텐츠학회 논문지, 17(1)
PublicationTitleAlternate	The Journal of the Korea Contents Association
SourceID	nrf kisti nurimedia
SourceType	Open Website Open Access Repository Publisher
URI	https://www.dbpia.co.kr/journal/articleDetail?nodeId=NODE07100901 http://click.ndsl.kr/servlet/LinkingDetailView?cn=JAKO201711656576981&dbt=JAKO&org_code=O481&site_code=SS1481&service_code=01 https://www.kci.go.kr/kciportal/ci/sereArticleSearch/ciSereArtiView.kci?sereArticleSearchBean.artiId=ART002193871
Volume	17