Unsupervised discrimination of person names in Web contexts

Ted Pedersen; Anagha Kulkarni

doi:10.1007/978-3-540-70939-8_27

Computer Science (Duluth)

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

11 Scopus citations

Abstract

Ambiguous person names are a problem in many forms of written text, including that which is found on the Web. In this paper we explore the use of unsupervised clustering techniques to discriminate among entities named in Web pages. We examine three main issues via an extensive experimental study. First, the effect of using a held-out set of training data for feature selection versus using the data in which the ambiguous names occur. Second, the impact of using different measures of association for identifying lexical features. Third, the success of different cluster stopping measures that automatically determine the number of clusters in the data.

Original language	English (US)
Title of host publication	Computational Linguistics and Intelligent Text Processing - 8th International Conference, CICLing 2007, Proceedings
Publisher	Springer Verlag
Pages	299-310
Number of pages	12
ISBN (Print)	354070938X, 9783540709381
DOIs	https://doi.org/10.1007/978-3-540-70939-8_27
State	Published - 2007
Event	8th Annual Conference on Intelligent Text Processing and Computational Linguistics, CICLing 2007 - Mexico City, Mexico; Duration: Feb 18 2007 → Feb 24 2007

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	4394 LNCS
ISSN (Print)	0302-9743
ISSN (Electronic)	1611-3349

Other

Other	8th Annual Conference on Intelligent Text Processing and Computational Linguistics, CICLing 2007
Country/Territory	Mexico
City	Mexico City
Period	2/18/07 → 2/24/07

Access

10.1007/978-3-540-70939-8_27

Cite this

Pedersen, T., & Kulkarni, A. (2007). Unsupervised discrimination of person names in Web contexts. In Computational Linguistics and Intelligent Text Processing - 8th International Conference, CICLing 2007, Proceedings (pp. 299-310). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 4394 LNCS). Springer Verlag. https://doi.org/10.1007/978-3-540-70939-8_27

@inproceedings{bf85a97419e942b69f072bb3f0fc539f,
  title = "Unsupervised discrimination of person names in Web contexts",
  abstract = "Ambiguous person names are a problem in many forms of written text, including that which is found on the Web. In this paper we explore the use of unsupervised clustering techniques to discriminate among entities named in Web pages. We examine three main issues via an extensive experimental study. First, the effect of using a held-out set of training data for feature selection versus using the data in which the ambiguous names occur. Second, the impact of using different measures of association for identifying lexical features. Third, the success of different cluster stopping measures that automatically determine the number of clusters in the data.",
  author = "Ted Pedersen and Anagha Kulkarni",
  year = "2007",
  doi = "10.1007/978-3-540-70939-8_27",
  language = "English (US)",
  isbn = "354070938X",
  series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
  publisher = "Springer Verlag",
  pages = "299--310",
  booktitle = "Computational Linguistics and Intelligent Text Processing - 8th International Conference, CICLing 2007, Proceedings",
  note = "8th Annual Conference on Intelligent Text Processing and Computational Linguistics, CICLing 2007 ; Conference date: 18-02-2007 Through 24-02-2007",
}

TY  - GEN
T1  - Unsupervised discrimination of person names in Web contexts
AU  - Pedersen, Ted
AU  - Kulkarni, Anagha
PY  - 2007
Y1  - 2007
N2  - Ambiguous person names are a problem in many forms of written text, including that which is found on the Web. In this paper we explore the use of unsupervised clustering techniques to discriminate among entities named in Web pages. We examine three main issues via an extensive experimental study. First, the effect of using a held-out set of training data for feature selection versus using the data in which the ambiguous names occur. Second, the impact of using different measures of association for identifying lexical features. Third, the success of different cluster stopping measures that automatically determine the number of clusters in the data.
AB  - Ambiguous person names are a problem in many forms of written text, including that which is found on the Web. In this paper we explore the use of unsupervised clustering techniques to discriminate among entities named in Web pages. We examine three main issues via an extensive experimental study. First, the effect of using a held-out set of training data for feature selection versus using the data in which the ambiguous names occur. Second, the impact of using different measures of association for identifying lexical features. Third, the success of different cluster stopping measures that automatically determine the number of clusters in the data.
UR  - http://www.scopus.com/inward/record.url?scp=37149024114&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=37149024114&partnerID=8YFLogxK
U2  - 10.1007/978-3-540-70939-8_27
DO  - 10.1007/978-3-540-70939-8_27
M3  - Conference contribution
AN  - SCOPUS:37149024114
SN  - 354070938X
SN  - 9783540709381
T3  - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP  - 299
EP  - 310
BT  - Computational Linguistics and Intelligent Text Processing - 8th International Conference, CICLing 2007, Proceedings
PB  - Springer Verlag
T2  - 8th Annual Conference on Intelligent Text Processing and Computational Linguistics, CICLing 2007
Y2  - 18 February 2007 through 24 February 2007
ER  -
