==Difficulties==

===Differences between dictionaries===
One problem with word sense disambiguation is deciding what the senses are, since different [[dictionary|dictionaries]] and [[thesaurus]]es divide words into senses differently. Some researchers have suggested addressing this by choosing a particular dictionary and using its set of senses. Generally, however, research results using broad distinctions in senses have been much better than those using narrow ones.{{sfn|Navigli|Litkowski|Hargraves|2007|pp=30–35}}{{sfn|Pradhan|Loper|Dligach|Palmer|2007|pp=87–92}} Nevertheless, most researchers continue to work on [[fine-grained]] WSD.

Most research in the field of WSD is performed by using [[WordNet]] as a reference sense inventory for English. WordNet is a computational [[lexicon]] that encodes concepts as [[synonym]] sets (e.g. the concept of car is encoded as { car, auto, automobile, machine, motorcar }). Other resources used for disambiguation purposes include [[Roget's Thesaurus]]{{sfn|Yarowsky|1992|pp=454–460}} and [[Wikipedia]].{{sfn|Mihalcea|2007|pp=}} More recently, [[BabelNet]], a multilingual encyclopedic dictionary, has been used for multilingual WSD.<ref>A. Moro; A. Raganato; R. Navigli. [http://www.transacl.org/wp-content/uploads/2014/05/54.pdf Entity Linking meets Word Sense Disambiguation: a Unified Approach]. {{Webarchive|url=https://web.archive.org/web/20140808063116/http://www.transacl.org/wp-content/uploads/2014/05/54.pdf |date=2014-08-08 }}. Transactions of the [[Association for Computational Linguistics]] (TACL). 2. pp. 231–244. 2014.</ref>
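The idea of a sense inventory built from synonym sets can be illustrated with a minimal sketch. The inventory below is a hand-written toy, not real WordNet data: only the first synonym set comes from the example above, while the sense identifiers, the other entries, and the helper name <code>senses_of</code> are assumptions made for illustration.

```python
# Toy sense inventory: each sense is a set of synonymous lemmas.
# Only the first set mirrors the "car" example in the text; the sense
# identifiers and the remaining entries are hypothetical.
TOY_INVENTORY = {
    "car.n.01": frozenset({"car", "auto", "automobile", "machine", "motorcar"}),
    "car.n.02": frozenset({"car", "railcar", "railway car"}),
    "machine.n.01": frozenset({"machine"}),
}


def senses_of(lemma, inventory=TOY_INVENTORY):
    """Return the identifiers of all senses whose synonym set contains `lemma`."""
    return sorted(sense_id for sense_id, lemmas in inventory.items() if lemma in lemmas)


# A lemma listed in several synonym sets is ambiguous with respect to this inventory.
car_senses = senses_of("car")
machine_senses = senses_of("machine")
```

Under this representation, disambiguating an occurrence of a word amounts to choosing one of the identifiers returned for its lemma; a different inventory would offer a different set of choices for the same word.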

===Part-of-speech tagging===
In practical evaluations, [[part-of-speech tagging]] and sense tagging have proven to be very closely related, with each potentially imposing constraints upon the other. Whether these tasks should be kept together or decoupled is still not settled, but researchers have recently tended to evaluate them separately (e.g. in the Senseval/[[SemEval]] competitions, parts of speech are provided as input for the text to disambiguate).

Both WSD and part-of-speech tagging involve disambiguating or tagging words. However, algorithms used for one do not tend to work well for the other, mainly because the part of speech of a word is primarily determined by the immediately adjacent one to three words, whereas the sense of a word may be determined by words further away. The success rate for part-of-speech tagging algorithms is at present much higher than that for WSD, with the state of the art at around 96%<ref>{{Cite journal|last=Martinez|first=Angel R.|date=January 2012|title=Part-of-speech tagging: Part-of-speech tagging|url=http://doi.wiley.com/10.1002/wics.195|journal=Wiley Interdisciplinary Reviews: Computational Statistics|language=en|volume=4|issue=1|pages=107–113|doi=10.1002/wics.195|s2cid=62672734|access-date=2021-04-01|archive-date=2023-07-15|archive-url=https://web.archive.org/web/20230715100019/https://wires.onlinelibrary.wiley.com/doi/abs/10.1002/wics.195|url-status=live}}</ref> accuracy or better, compared with less than 75%{{Citation needed|date=March 2014}} accuracy in word sense disambiguation with [[supervised learning]]. These figures are typical for English and may be very different for other languages.
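The difference in how much context each task needs can be sketched with a simple window function. The example sentence and the function name <code>context_window</code> are assumptions chosen for illustration; the window of three words follows the "one to three words" figure given above, while the wider window of five is arbitrary.

```python
def context_window(tokens, index, size):
    """Return up to `size` tokens on each side of tokens[index], excluding the token itself."""
    left = tokens[max(0, index - size):index]
    right = tokens[index + 1:index + 1 + size]
    return left + right


tokens = "he sat on the bank and watched the river flow past".split()
target = tokens.index("bank")

# Narrow window: the kind of local context that largely determines part of speech.
narrow = context_window(tokens, target, 3)

# Wider window: reaches "river", the word that signals the intended sense of "bank".
wide = context_window(tokens, target, 5)
```

In this sentence the three-word window is enough to suggest that "bank" is a noun, but the cue for its sense lies four words away and is only captured by the wider window.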

===Inter-judge variance===
Another problem is [[Inter-rater reliability|inter-judge]] [[variance]]. WSD systems are normally tested by having their results on a task compared against those of a human. However, while it is relatively easy to assign parts of speech to text, training people to tag senses has proven far more difficult.{{sfn|Fellbaum|1997|pp=}} While people can memorize all of the possible parts of speech a word can take, it is often impossible for individuals to memorize all of the senses a word can take. Moreover, humans do not agree on the task at hand: given a list of senses and sentences, they will not always agree on which word belongs in which sense.{{sfn|Snyder|Palmer|2004|pp=41–43}}
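Disagreement between annotators is usually quantified. The sketch below computes raw observed agreement and Cohen's kappa, a common chance-corrected statistic for two annotators; the choice of kappa is an assumption made for illustration rather than a measure named by the sources cited here, and the sense labels are invented.

```python
from collections import Counter


def observed_agreement(labels_a, labels_b):
    """Fraction of items on which two annotators chose the same sense."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("annotations must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def cohens_kappa(labels_a, labels_b):
    """Agreement between two annotators, corrected for agreement expected by chance."""
    n = len(labels_a)
    p_observed = observed_agreement(labels_a, labels_b)
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability that both annotators pick the same label
    # if each labels independently according to their own label frequencies.
    p_chance = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_chance == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical sense tags assigned by two annotators to four occurrences of "bank".
annotator_a = ["bank/1", "bank/1", "bank/2", "bank/2"]
annotator_b = ["bank/1", "bank/2", "bank/2", "bank/2"]

raw = observed_agreement(annotator_a, annotator_b)
kappa = cohens_kappa(annotator_a, annotator_b)
```

Here the annotators agree on three of four items, but because half of that agreement would be expected by chance given their label frequencies, the chance-corrected value is noticeably lower than the raw figure.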

As human performance serves as the standard, it is an [[upper bound]] for computer performance. Human performance, however, is much better on [[coarse-grained]] than [[fine-grained]] distinctions, which is another reason why research on coarse-grained distinctions{{sfn|Navigli|2006|pp=105–112}}{{sfn|Snow|Prakash|Jurafsky|Ng|2007|pp=1005–1014}} has been put to the test in recent WSD evaluation exercises.{{sfn|Navigli|Litkowski|Hargraves|2007|pp=30–35}}{{sfn|Pradhan|Loper|Dligach|Palmer|2007|pp=87–92}}
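The effect of granularity on agreement can be shown with a small sketch in which fine-grained labels are merged into coarser clusters before annotations are compared. The sense labels, the clustering in <code>FINE_TO_COARSE</code>, and the annotations are all invented for illustration and do not come from any of the evaluation exercises cited above.

```python
# Hypothetical mapping from fine-grained senses to coarse-grained clusters.
FINE_TO_COARSE = {
    "bank/1a": "bank/1",  # e.g. the financial institution
    "bank/1b": "bank/1",  # e.g. the building housing it
    "bank/2": "bank/2",   # e.g. the side of a river
}


def agreement(labels_a, labels_b):
    """Fraction of items on which two annotators chose the same label."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("annotations must be non-empty and of equal length")
    return sum(a == b for a, b in zip(labels_a, labels_b)) / len(labels_a)


def coarsen(labels, mapping=FINE_TO_COARSE):
    """Replace each fine-grained label by the coarse cluster it belongs to."""
    return [mapping[label] for label in labels]


annotator_a = ["bank/1a", "bank/1b", "bank/2", "bank/1a"]
annotator_b = ["bank/1b", "bank/1b", "bank/2", "bank/2"]

fine_agreement = agreement(annotator_a, annotator_b)
coarse_agreement = agreement(coarsen(annotator_a), coarsen(annotator_b))
```

Disagreements between senses in the same cluster vanish after coarsening, so agreement on coarse labels can never be lower than on the fine labels they were derived from.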

===Sense inventory and algorithms' task-dependency===
A task-independent sense inventory is not a coherent concept:{{sfn|Palmer|Babko-Malaya|Dang|2004|pp=49–56}} each task requires its own division of word meaning into senses relevant to the task. Additionally, completely different algorithms might be required by different applications. In machine translation, the problem takes the form of target word selection. The "senses" are words in the target language, which often correspond to significant meaning distinctions in the source language ("bank" could translate to the French {{Lang|fr|banque}} – that is, 'financial bank' or {{Lang|fr|rive}} – that is, 'edge of river'). In information retrieval, a sense inventory is not necessarily required, because it is enough to know that a word is used in the same sense in the query and a retrieved document; what sense that is, is unimportant.
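Target word selection in machine translation can be sketched as choosing among candidate translations by their overlap with the source context. This is a deliberately naive illustration, not a description of how any real translation system works: the cue words in <code>TOY_LEXICON</code> and the function name are assumptions, and only the pairing of "bank" with ''banque'' and ''rive'' comes from the text above.

```python
# Hypothetical lexicon: for each source word, candidate target words and
# invented context cues that favour each candidate.
TOY_LEXICON = {
    "bank": {
        "banque": {"money", "loan", "account"},
        "rive": {"river", "water", "shore"},
    },
}


def select_target_word(source_word, context, lexicon=TOY_LEXICON):
    """Pick the candidate translation whose cue words overlap most with the context.

    Ties, including the case of no overlap at all, fall back to the
    alphabetically first candidate.
    """
    candidates = lexicon[source_word]
    context_words = {word.lower() for word in context}
    return max(sorted(candidates), key=lambda target: len(candidates[target] & context_words))


financial = select_target_word("bank", "she opened an account at the bank".split())
riverside = select_target_word("bank", "the boat drifted toward the river bank".split())
```

Note that the "senses" in this setting are simply the target-language words; no separate sense inventory is consulted.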

===Discreteness of senses===
Finally, the very notion of "[[word sense]]" is slippery and controversial. Most people can agree on distinctions at the [[coarse-grained]] [[homograph]] level (e.g., pen as writing instrument or enclosure), but go down one level to [[fine-grained]] [[polysemy]], and disagreements arise. For example, in Senseval-2, which used fine-grained sense distinctions, human annotators agreed on only 85% of word occurrences.{{sfn|Edmonds|2000|pp=}} Word meaning is in principle infinitely variable and context-sensitive. It does not divide up easily into distinct or discrete sub-meanings.{{sfn|Kilgarrif|1997|pp=91–113}} [[Lexicography|Lexicographers]] frequently discover in corpora loose and overlapping word meanings, and standard or conventional meanings extended, modulated, and exploited in a bewildering variety of ways. The art of lexicography is to generalize from the corpus to definitions that evoke and explain the full range of meaning of a word, making it seem as though words are well-behaved semantically. However, it is not at all clear whether these same meaning distinctions are applicable in [[Computational science#Applications of computational science|computational applications]], as the decisions of lexicographers are usually driven by other considerations. In 2009, a task named [[lexical substitution]] was proposed as a possible solution to the sense discreteness problem.{{sfn|McCarthy|Navigli|2009|pp=139–159}} The task consists of providing a substitute for a word in context that preserves the meaning of the original word (potentially, substitutes can be chosen from the full lexicon of the target language, thus overcoming discreteness).