Unsupervised discrimination of person names in Web contexts

Ted Pedersen, Anagha Kulkarni

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

7 Scopus citations

Abstract

Ambiguous person names are a problem in many forms of written text, including that which is found on the Web. In this paper we explore the use of unsupervised clustering techniques to discriminate among entities named in Web pages. We examine three main issues via an extensive experimental study. First, the effect of using a held-out set of training data for feature selection versus using the data in which the ambiguous names occur. Second, the impact of using different measures of association for identifying lexical features. Third, the success of different cluster stopping measures that automatically determine the number of clusters in the data.

Original language: English (US)
Title of host publication: Computational Linguistics and Intelligent Text Processing - 8th International Conference, CICLing 2007, Proceedings
Pages: 299-310
Number of pages: 12
State: Published - Dec 20 2007
Externally published: Yes
Event: 8th Annual Conference on Intelligent Text Processing and Computational Linguistics, CICLing 2007 - Mexico City, Mexico
Duration: Feb 18 2007 to Feb 24 2007

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 4394 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 8th Annual Conference on Intelligent Text Processing and Computational Linguistics, CICLing 2007
Country: Mexico
City: Mexico City
Period: 2/18/07 to 2/24/07
