TY - GEN
T1 - The effect of different context representations on word sense discrimination in biomedical texts
AU - Pedersen, Ted
PY - 2010/12/1
Y1 - 2010/12/1
N2 - Unsupervised word sense discrimination relies on the idea that words that occur in similar contexts will have similar meanings. These techniques cluster multiple contexts in which an ambiguous word occurs, and the number of clusters discovered indicates the number of senses in which the ambiguous word is used. One important distinction among these methods is the underlying means of representing the contexts to be clustered. This paper compares the efficacy of first-order methods that directly represent the features that occur in a context with several second-order methods that use a more indirect representation. The experiments in this paper show that second-order methods that use word-by-word co-occurrence matrices result in the highest accuracy and most robust word sense discrimination. These experiments were conducted on MedLine abstracts that contained pseudo-words created by conflating pairs of MeSH preferred terms into new ambiguous words. The experiments were carried out with SenseClusters, a freely available open-source software package.
AB - Unsupervised word sense discrimination relies on the idea that words that occur in similar contexts will have similar meanings. These techniques cluster multiple contexts in which an ambiguous word occurs, and the number of clusters discovered indicates the number of senses in which the ambiguous word is used. One important distinction among these methods is the underlying means of representing the contexts to be clustered. This paper compares the efficacy of first-order methods that directly represent the features that occur in a context with several second-order methods that use a more indirect representation. The experiments in this paper show that second-order methods that use word-by-word co-occurrence matrices result in the highest accuracy and most robust word sense discrimination. These experiments were conducted on MedLine abstracts that contained pseudo-words created by conflating pairs of MeSH preferred terms into new ambiguous words. The experiments were carried out with SenseClusters, a freely available open-source software package.
KW - natural language processing
KW - semantic ambiguity
KW - word sense discrimination
UR - http://www.scopus.com/inward/record.url?scp=78650949174&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=78650949174&partnerID=8YFLogxK
U2 - 10.1145/1882992.1883003
DO - 10.1145/1882992.1883003
M3 - Conference contribution
AN - SCOPUS:78650949174
SN - 9781450300308
T3 - IHI'10 - Proceedings of the 1st ACM International Health Informatics Symposium
SP - 56
EP - 65
BT - IHI'10 - Proceedings of the 1st ACM International Health Informatics Symposium
T2 - 1st ACM International Health Informatics Symposium, IHI'10
Y2 - 11 November 2010 through 12 November 2010
ER -