Semiautomatic Extraction of Topic Maps from Web Pages Using Clustering with Web Contents and Structure

Bibliographic Details
Published in	Proceedings of the 2007 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology - Workshops, pp. 208-211
Main Authors	Mase, Motohiro; Yamada, Seiji; Nitta, Katsumi
Format	Conference Proceeding
Language	English
Published	Washington, DC, USA: IEEE Computer Society, 02.11.2007
Series	ACM Conferences
Subjects	Applied computing > Document management and text processing > Document preparation > Multi / mixed media creation; Computing methodologies > Machine learning > Learning paradigms > Unsupervised learning > Cluster analysis; Human-centered computing > Human computer interaction (HCI) > Interaction paradigms > Hypertext / hypermedia; Information systems > Information retrieval; Information systems > Information storage systems; information extraction; Topic Maps; clustering
ISBN	0769530281 9780769530284
DOI	10.5555/1339264.1339692

Summary:	In this paper, we describe a method for semi-automatically extracting Topic Maps from a set of Web pages. We extend an existing clustering method in two ways. First, only linked Web pages are merged, so that the underlying relationships among the topics are extracted. Second, to generate dense clusters, similarity is computed from the contents of the Web pages, the types of links, and the distance between the directories in which the pages are located. From the clustering result, we generate the topic map by treating the clusters as topics, the edges as associations, and the Web pages related to each topic as occurrences. We experimentally extracted a topic map and evaluated it.
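
The clustering outlined in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record gives no formulas, so the toy pages, the three link types, the weights, the Jaccard content measure, the single-link merge rule, and the threshold are all assumptions chosen for illustration. The sketch merges only clusters joined by at least one hyperlink, scores each linked pair by content, link type, and directory distance, and then reads topics, associations, and occurrences off the result.

```python
from itertools import combinations
from posixpath import dirname

# Hypothetical toy site: path -> (terms on the page, paths it links to).
PAGES = {
    "/lab/index.html": ({"lab", "research", "members"},
                        {"/lab/research/tm.html", "/lab/members/a.html"}),
    "/lab/research/tm.html": ({"topic", "maps", "research"},
                              {"/lab/research/cluster.html"}),
    "/lab/research/cluster.html": ({"clustering", "topic", "research"}, set()),
    "/lab/members/a.html": ({"members", "profile"}, set()),
}

# Assumed link types and weights; the actual values are not in the record.
LINK_WEIGHT = {"sibling": 1.0, "hierarchical": 0.7, "cross": 0.3}
W_CONTENT, W_LINK, W_DIR = 0.4, 0.3, 0.3


def dir_parts(path):
    """Directory components of a page path, e.g. ['lab', 'research']."""
    return [s for s in dirname(path).split("/") if s]


def dir_distance(p, q):
    """Steps up and down the directory tree between two pages."""
    a, b = dir_parts(p), dir_parts(q)
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    return (len(a) - common) + (len(b) - common)


def link_type(p, q):
    """Classify a link by the relative position of the two directories."""
    a, b = dir_parts(p), dir_parts(q)
    if a == b:
        return "sibling"
    if a == b[:len(a)] or b == a[:len(b)]:
        return "hierarchical"
    return "cross"


def linked(p, q, pages):
    """True if either page links to the other."""
    return q in pages[p][1] or p in pages[q][1]


def page_similarity(p, q, pages):
    """Weighted mix of content overlap, link type, and directory closeness."""
    tp, tq = pages[p][0], pages[q][0]
    content = len(tp & tq) / len(tp | tq) if tp | tq else 0.0
    closeness = 1.0 / (1.0 + dir_distance(p, q))
    return (W_CONTENT * content
            + W_LINK * LINK_WEIGHT[link_type(p, q)]
            + W_DIR * closeness)


def cluster_similarity(a, b, pages):
    """Best similarity over linked page pairs; None if no link joins a and b."""
    sims = [page_similarity(p, q, pages)
            for p in a for q in b if linked(p, q, pages)]
    return max(sims) if sims else None


def build_topic_map(pages, threshold):
    """Merge linked clusters greedily, then derive topics and associations."""
    clusters = [frozenset([p]) for p in sorted(pages)]
    while True:
        best, pair = threshold, None
        for a, b in combinations(clusters, 2):
            s = cluster_similarity(a, b, pages)
            if s is not None and s > best:
                best, pair = s, (a, b)
        if pair is None:
            break
        clusters = [c for c in clusters if c not in pair] + [pair[0] | pair[1]]
    # Clusters become topics; their member pages are the occurrences.
    topics = {f"topic{i}": sorted(c) for i, c in enumerate(clusters)}
    names = list(topics)
    # Links that still cross cluster boundaries become associations.
    associations = [
        (names[i], names[j])
        for i, j in combinations(range(len(clusters)), 2)
        if cluster_similarity(clusters[i], clusters[j], pages) is not None
    ]
    return topics, associations


if __name__ == "__main__":
    topics, associations = build_topic_map(PAGES, threshold=0.5)
    print(topics)
    print(associations)
```

On this toy data, a threshold of 0.5 merges only the two pages under /lab/research, which leaves three topics and two associations from the index page. Restricting merges to linked clusters is what keeps the remaining cross-cluster links available as associations; an unconstrained clustering would have no such edges to report.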