Information content measures of semantic similarity perform better without sense-tagged text

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

49 Scopus citations

Abstract

This paper presents an empirical comparison of similarity measures for pairs of concepts based on Information Content. It shows that using modest amounts of untagged text to derive Information Content results in higher correlation with human similarity judgments than using the largest available corpus of manually annotated sense-tagged text.
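For readers unfamiliar with the term, the following is a minimal sketch of how an Information Content measure can be computed. It is an illustration under stated assumptions, not the paper's method. The abstract does not say which measures were compared or how counts were derived from untagged text. The sketch uses Resnik's measure as one well-known member of this family. The taxonomy, the counts, and all function names are hypothetical.

```python
import math

# Hypothetical toy taxonomy: child -> parent (None marks the root).
PARENT = {
    "entity": None,
    "animal": "entity",
    "dog": "animal",
    "cat": "animal",
    "vehicle": "entity",
    "car": "vehicle",
}

# Hypothetical counts of corpus words mapped directly to each concept.
# How ambiguous words in untagged text are assigned to concepts is
# abstracted away here; it is an assumption, not taken from the abstract.
RAW_COUNTS = {"entity": 0, "animal": 2, "dog": 5, "cat": 3, "vehicle": 1, "car": 9}


def ancestors(concept):
    """Return the concept followed by all of its ancestors up to the root."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = PARENT[concept]
    return chain


def information_content(raw_counts):
    """IC(c) = -log P(c), where a concept's count includes its descendants."""
    totals = {c: 0 for c in PARENT}
    for concept, count in raw_counts.items():
        # Every occurrence of a concept also counts toward its ancestors.
        for node in ancestors(concept):
            totals[node] += count
    root_total = max(totals.values())  # the root subsumes every count
    # log(root_total / t) equals -log(t / root_total); unseen concepts are skipped.
    return {c: math.log(root_total / t) for c, t in totals.items() if t > 0}


def resnik_similarity(a, b, ic):
    """Resnik similarity: IC of the most informative shared ancestor."""
    common = set(ancestors(a)) & set(ancestors(b))
    return max(ic[c] for c in common)


ic = information_content(RAW_COUNTS)
print(resnik_similarity("dog", "cat", ic))  # shared ancestor "animal"
print(resnik_similarity("dog", "car", ic))  # only the root is shared
```

In this toy setup, "dog" and "cat" share the fairly specific ancestor "animal" and so score higher than "dog" and "car", which share only the root. Changing the corpus that supplies the counts changes every IC value, which is the variable the paper studies.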

Original language: English (US)
Title of host publication: NAACL HLT 2010 - Human Language Technologies
Subtitle of host publication: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Proceedings of the Main Conference
Pages: 329-332
Number of pages: 4
State: Published - Dec 1 2010
Externally published: Yes
Event: 2010 Human Language Technologies Conference of the North American Chapter of the Association for Computational Linguistics, NAACL HLT 2010 - Los Angeles, CA, United States
Duration: Jun 2 2010 to Jun 4 2010

Publication series

Name: NAACL HLT 2010 - Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Proceedings of the Main Conference

Other

Other: 2010 Human Language Technologies Conference of the North American Chapter of the Association for Computational Linguistics, NAACL HLT 2010
Country/Territory: United States
City: Los Angeles, CA
Period: 6/2/10 to 6/4/10
