Information content measures of semantic similarity perform better without sense-tagged text

Ted Pedersen

Computer Science (Duluth)

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

53 Scopus citations

Abstract

This paper presents an empirical comparison of similarity measures for pairs of concepts based on Information Content. It shows that using modest amounts of untagged text to derive Information Content results in higher correlation with human similarity judgments than using the largest available corpus of manually annotated sense-tagged text.

Original language	English (US)
Title of host publication	NAACL HLT 2010 - Human Language Technologies
Subtitle of host publication	The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Proceedings of the Main Conference
Pages	329-332
Number of pages	4
State	Published - 2010
Event	2010 Human Language Technologies Conference of the North American Chapter of the Association for Computational Linguistics, NAACL HLT 2010 - Los Angeles, CA, United States
Duration	Jun 2 2010 → Jun 4 2010

Publication series

Name	NAACL HLT 2010 - Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Proceedings of the Main Conference

Other

Other	2010 Human Language Technologies Conference of the North American Chapter of the Association for Computational Linguistics, NAACL HLT 2010
Country/Territory	United States
City	Los Angeles, CA
Period	6/2/10 → 6/4/10

Cite this

Pedersen, T. (2010). Information content measures of semantic similarity perform better without sense-tagged text. In NAACL HLT 2010 - Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Proceedings of the Main Conference (pp. 329-332). (NAACL HLT 2010 - Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Proceedings of the Main Conference).

Information content measures of semantic similarity perform better without sense-tagged text. / Pedersen, Ted.
NAACL HLT 2010 - Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Proceedings of the Main Conference. 2010. p. 329-332 (NAACL HLT 2010 - Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Proceedings of the Main Conference).

Pedersen, T 2010, Information content measures of semantic similarity perform better without sense-tagged text. in NAACL HLT 2010 - Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Proceedings of the Main Conference. NAACL HLT 2010 - Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Proceedings of the Main Conference, pp. 329-332, 2010 Human Language Technologies Conference of the North American Chapter of the Association for Computational Linguistics, NAACL HLT 2010, Los Angeles, CA, United States, 6/2/10.

Pedersen T. Information content measures of semantic similarity perform better without sense-tagged text. In NAACL HLT 2010 - Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Proceedings of the Main Conference. 2010. p. 329-332. (NAACL HLT 2010 - Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Proceedings of the Main Conference).

Pedersen, Ted. / Information content measures of semantic similarity perform better without sense-tagged text. NAACL HLT 2010 - Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Proceedings of the Main Conference. 2010. pp. 329-332 (NAACL HLT 2010 - Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Proceedings of the Main Conference).

@inproceedings{e412eabe6523418f927e02b33be57bcc,
title = "Information content measures of semantic similarity perform better without sense-tagged text",
abstract = "This paper presents an empirical comparison of similarity measures for pairs of concepts based on Information Content. It shows that using modest amounts of untagged text to derive Information Content results in higher correlation with human similarity judgments than using the largest available corpus of manually annotated sense-tagged text.",
author = "Ted Pedersen",
year = "2010",
language = "English (US)",
isbn = "1932432655",
series = "NAACL HLT 2010 - Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Proceedings of the Main Conference",
pages = "329--332",
booktitle = "NAACL HLT 2010 - Human Language Technologies",
note = "2010 Human Language Technologies Conference of the North American Chapter of the Association for Computational Linguistics, NAACL HLT 2010 ; Conference date: 02-06-2010 Through 04-06-2010",
}

TY  - GEN
T1  - Information content measures of semantic similarity perform better without sense-tagged text
AU  - Pedersen, Ted
PY  - 2010
Y1  - 2010
N2  - This paper presents an empirical comparison of similarity measures for pairs of concepts based on Information Content. It shows that using modest amounts of untagged text to derive Information Content results in higher correlation with human similarity judgments than using the largest available corpus of manually annotated sense-tagged text.
AB  - This paper presents an empirical comparison of similarity measures for pairs of concepts based on Information Content. It shows that using modest amounts of untagged text to derive Information Content results in higher correlation with human similarity judgments than using the largest available corpus of manually annotated sense-tagged text.
UR  - http://www.scopus.com/inward/record.url?scp=84858422324&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84858422324&partnerID=8YFLogxK
M3  - Conference contribution
AN  - SCOPUS:84858422324
SN  - 1932432655
SN  - 9781932432657
T3  - NAACL HLT 2010 - Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Proceedings of the Main Conference
SP  - 329
EP  - 332
BT  - NAACL HLT 2010 - Human Language Technologies
T2  - 2010 Human Language Technologies Conference of the North American Chapter of the Association for Computational Linguistics, NAACL HLT 2010
Y2  - 2 June 2010 through 4 June 2010
ER  -
