Two supervised learning approaches for name disambiguation in author citations

Bibliographic Details
Published in	Proceedings of the 4th ACM/IEEE-CS Joint Conference on Digital Libraries, pp. 296-305
Main Authors	Han, Hui; Giles, Lee; Zha, Hongyuan; Li, Cheng; Tsioutsiouliklis, Kostas
Format	Conference Proceeding
Language	English
Published	New York, NY, USA ACM 07.06.2004 IEEE ACM Press
Series	ACM Conferences
Subjects	Applied sciences | Artificial intelligence | Bibliographies | Computer science | Computer science; control theory; systems | Exact sciences and technology | Information retrieval | Information systems > Information retrieval | Information systems. Data bases | Memory organisation. Data processing | Permission | Public healthcare | Software | Software libraries | Statistics | Supervised learning | Web search | naive bayes | name disambiguation | support vector machine | Abbreviation | Bayes estimation | Data type | Information integration | Statistical analysis | Disambiguation | Probabilistic approach | Electronic document | Modeling | Document retrieval | Discrimination | Vector space | World wide web | Database | Vector support machine | Internet | Electronic library | Bibliography | Ambiguity
ISBN	1581138326 9781581138320
DOI	10.1145/996350.996419

Summary:	Due to name abbreviations, identical names, name misspellings, and pseudonyms in publications or bibliographies (citations), an author may have multiple names and multiple authors may share the same name. Such name ambiguity affects the performance of document retrieval, web search, and database integration, and may cause improper attribution to authors. This paper investigates two supervised learning approaches to disambiguate authors in citations. One approach uses the naive Bayes probability model, a generative model; the other uses Support Vector Machines (SVMs) and the vector space representation of citations, a discriminative model. Both approaches utilize three types of citation attributes: co-author names, the title of the paper, and the title of the journal or proceedings. We illustrate these two approaches on two types of data: one collected from the web, mainly publication lists from homepages, and the other collected from the DBLP citation databases.
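
As a rough illustration of the generative approach named in the summary, the sketch below trains a naive Bayes classifier on the three citation attributes (co-author names, paper title, venue title) and assigns a new citation to the most probable author identity. It is a simplified multinomial model with add-one smoothing, not the paper's actual formulation; the feature scheme, the class name `NaiveBayesDisambiguator`, the author labels, and all citations in the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def features(citation):
    """Tag each token with its attribute type (hypothetical feature scheme)."""
    feats = ["coauthor=" + name.lower() for name in citation["coauthors"]]
    feats += ["title=" + w for w in citation["title"].lower().split()]
    feats += ["venue=" + w for w in citation["venue"].lower().split()]
    return feats


class NaiveBayesDisambiguator:
    """Multinomial naive Bayes over citation features, add-one smoothing."""

    def fit(self, citations, author_ids):
        self.prior = Counter(author_ids)      # citations per author identity
        self.counts = defaultdict(Counter)    # feature counts per author
        self.vocab = set()
        for citation, author in zip(citations, author_ids):
            feats = features(citation)
            self.counts[author].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, citation):
        total = sum(self.prior.values())
        feats = features(citation)
        best_author, best_score = None, -math.inf
        for author, n in self.prior.items():
            # log P(author) + sum of log P(feature | author)
            score = math.log(n / total)
            # +1 reserves probability mass for features unseen in training
            denom = sum(self.counts[author].values()) + len(self.vocab) + 1
            for f in feats:
                score += math.log((self.counts[author][f] + 1) / denom)
            if score > best_score:
                best_author, best_score = author, score
        return best_author


# Toy training set: two distinct people who both publish as "J. Smith".
train = [
    {"coauthors": ["A. Kumar", "B. Lee"],
     "title": "kernel methods for text classification",
     "venue": "machine learning journal"},
    {"coauthors": ["B. Lee"],
     "title": "support vector learning for text",
     "venue": "machine learning conference"},
    {"coauthors": ["C. Gomez"],
     "title": "protein folding simulation",
     "venue": "bioinformatics journal"},
    {"coauthors": ["C. Gomez", "D. Wu"],
     "title": "molecular dynamics of protein structures",
     "venue": "computational biology conference"},
]
labels = ["j_smith_ml", "j_smith_ml", "j_smith_bio", "j_smith_bio"]

model = NaiveBayesDisambiguator().fit(train, labels)
query = {"coauthors": ["B. Lee"],
         "title": "text classification with kernels",
         "venue": "machine learning workshop"}
predicted = model.predict(query)
```

The discriminative approach in the summary would instead map the same features to a vector space representation and train an SVM per author identity; that variant is not shown here.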