Word-sense disambiguation
==Difficulties==

===Differences between dictionaries===

One problem with word sense disambiguation is deciding what the senses are, as different [[dictionary|dictionaries]] and [[thesaurus]]es will provide different divisions of words into senses. Some researchers have suggested choosing a particular dictionary and using its set of senses to deal with this issue. Generally, however, research results using broad distinctions in senses have been much better than those using narrow ones.{{sfn|Navigli|Litkowski|Hargraves|2007|pp=30–35}}{{sfn|Pradhan|Loper|Dligach|Palmer|2007|pp=87–92}} Even so, most researchers continue to work on [[fine-grained]] WSD.

Most research in the field of WSD is performed by using [[WordNet]] as a reference sense inventory for English. WordNet is a computational [[lexicon]] that encodes concepts as [[synonym]] sets (e.g. the concept of car is encoded as { car, auto, automobile, machine, motorcar }). Other resources used for disambiguation purposes include [[Roget's Thesaurus]]{{sfn|Yarowsky|1992|pp=454–460}} and [[Wikipedia]].{{sfn|Mihalcea|2007|pp=}} More recently, [[BabelNet]], a multilingual encyclopedic dictionary, has been used for multilingual WSD.<ref>A. Moro; A. Raganato; R. Navigli. [http://www.transacl.org/wp-content/uploads/2014/05/54.pdf Entity Linking meets Word Sense Disambiguation: a Unified Approach]. {{Webarchive|url=https://web.archive.org/web/20140808063116/http://www.transacl.org/wp-content/uploads/2014/05/54.pdf |date=2014-08-08 }}. Transactions of the [[Association for Computational Linguistics]] (TACL). 2. pp. 231–244. 2014.</ref>

===Part-of-speech tagging===

In any real test, [[part-of-speech tagging]] and sense tagging have proven to be very closely related, with each potentially imposing constraints upon the other. The question of whether these tasks should be kept together or decoupled is still not unanimously resolved, but researchers have recently tended to evaluate them separately (e.g. in the Senseval/[[SemEval]] competitions, parts of speech are provided as input for the text to disambiguate).

Both WSD and part-of-speech tagging involve disambiguating or tagging words. However, algorithms used for one do not tend to work well for the other, mainly because the part of speech of a word is primarily determined by the immediately adjacent one to three words, whereas the sense of a word may be determined by words further away. The success rate for part-of-speech tagging algorithms is at present much higher than that for WSD, with the state of the art being around 96%<ref>{{Cite journal|last=Martinez|first=Angel R.|date=January 2012|title=Part-of-speech tagging: Part-of-speech tagging|url=http://doi.wiley.com/10.1002/wics.195|journal=Wiley Interdisciplinary Reviews: Computational Statistics|language=en|volume=4|issue=1|pages=107–113|doi=10.1002/wics.195|s2cid=62672734|access-date=2021-04-01|archive-date=2023-07-15|archive-url=https://web.archive.org/web/20230715100019/https://wires.onlinelibrary.wiley.com/doi/abs/10.1002/wics.195|url-status=live}}</ref> accuracy or better, as compared to less than 75%{{Citation needed|date=March 2014}} accuracy in word sense disambiguation with [[supervised learning]]. These figures are typical for English, and may be very different from those for other languages.

===Inter-judge variance===

Another problem is [[Inter-rater reliability|inter-judge]] [[variance]]. WSD systems are normally tested by having their results on a task compared against those of a human. However, while it is relatively easy to assign parts of speech to text, training people to tag senses has proven far more difficult.{{sfn|Fellbaum|1997|pp=}} While users can memorize all of the possible parts of speech a word can take, it is often impossible for individuals to memorize all of the senses a word can take.
Moreover, humans do not agree on the task at hand: given a list of senses and sentences, humans will not always agree on which word belongs in which sense.{{sfn|Snyder|Palmer|2004|pp=41–43}}

As human performance serves as the standard, it is an [[upper bound]] for computer performance. Human performance, however, is much better on [[coarse-grained]] than [[fine-grained]] distinctions, which again is why research on coarse-grained distinctions{{sfn|Navigli|2006|pp=105–112}}{{sfn|Snow|Prakash|Jurafsky|Ng|2007|pp=1005–1014}} has been put to the test in recent WSD evaluation exercises.{{sfn|Navigli|Litkowski|Hargraves|2007|pp=30–35}}{{sfn|Pradhan|Loper|Dligach|Palmer|2007|pp=87–92}}

===Sense inventory and algorithms' task-dependency===

A task-independent sense inventory is not a coherent concept:{{sfn|Palmer|Babko-Malaya|Dang|2004|pp=49–56}} each task requires its own division of word meaning into senses relevant to the task. Additionally, completely different algorithms might be required by different applications. In machine translation, the problem takes the form of target word selection. The "senses" are words in the target language, which often correspond to significant meaning distinctions in the source language ("bank" could translate to the French {{Lang|fr|banque}}, that is, 'financial bank', or {{Lang|fr|rive}}, that is, 'edge of river'). In information retrieval, a sense inventory is not necessarily required, because it is enough to know that a word is used in the same sense in the query and a retrieved document; what sense that is, is unimportant.

===Discreteness of senses===

Finally, the very notion of "[[word sense]]" is slippery and controversial. Most people can agree on distinctions at the [[coarse-grained]] [[homograph]] level (e.g., pen as writing instrument or enclosure), but go down one level to [[fine-grained]] [[polysemy]], and disagreements arise.
For example, in Senseval-2, which used fine-grained sense distinctions, human annotators agreed in only 85% of word occurrences.{{sfn|Edmonds|2000|pp=}} Word meaning is in principle infinitely variable and context-sensitive. It does not divide up easily into distinct or discrete sub-meanings.{{sfn|Kilgarrif|1997|pp=91–113}}

[[Lexicography|Lexicographers]] frequently discover in corpora loose and overlapping word meanings, and standard or conventional meanings extended, modulated, and exploited in a bewildering variety of ways. The art of lexicography is to generalize from the corpus to definitions that evoke and explain the full range of meaning of a word, making it seem as though words are semantically well-behaved. However, it is not at all clear whether these same meaning distinctions are applicable in [[Computational science#Applications of computational science|computational applications]], as the decisions of lexicographers are usually driven by other considerations.

In 2009, a task named [[lexical substitution]] was proposed as a possible solution to the sense discreteness problem.{{sfn|McCarthy|Navigli|2009|pp=139–159}} The task consists of providing a substitute for a word in context that preserves the meaning of the original word (potentially, substitutes can be chosen from the full lexicon of the target language, thus overcoming discreteness).
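
The effect of sense granularity on annotator agreement, discussed in this section, can be illustrated with a small sketch. The Python code below computes raw agreement and Cohen's kappa (a common chance-corrected agreement measure, chosen here for illustration and not taken from the cited evaluations) for two annotators. The sense labels, the annotations, and the fine-to-coarse mapping <code>COARSE_OF</code> are all invented for the example and do not come from WordNet, Senseval, or any real dataset.

```python
from collections import Counter


def observed_agreement(labels_a, labels_b):
    """Fraction of items on which two annotators chose the same label."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(labels_a)
    p_observed = observed_agreement(labels_a, labels_b)
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: both annotators pick the same label independently.
    p_chance = sum(counts_a[lab] * counts_b[lab] for lab in counts_a) / (n * n)
    if p_chance == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical fine-grained senses of "bank", grouped into two coarse senses.
COARSE_OF = {
    "bank.institution": "FINANCE",
    "bank.building": "FINANCE",
    "bank.river": "TERRAIN",
    "bank.slope": "TERRAIN",
}

# Invented annotations for ten occurrences of "bank" (not real data).
fine_1 = ["bank.institution", "bank.institution", "bank.building",
          "bank.river", "bank.river", "bank.slope", "bank.institution",
          "bank.building", "bank.river", "bank.institution"]
fine_2 = ["bank.institution", "bank.building", "bank.building",
          "bank.river", "bank.slope", "bank.slope", "bank.institution",
          "bank.institution", "bank.river", "bank.river"]

# Collapse each fine-grained label to its coarse-grained cluster.
coarse_1 = [COARSE_OF[s] for s in fine_1]
coarse_2 = [COARSE_OF[s] for s in fine_2]

if __name__ == "__main__":
    for name, a, b in [("fine", fine_1, fine_2), ("coarse", coarse_1, coarse_2)]:
        print(name, round(observed_agreement(a, b), 3), round(cohen_kappa(a, b), 3))
```

On this toy data, raw agreement rises from 0.6 at the fine-grained level to 0.9 once the senses are merged into coarse clusters, because most disagreements are between closely related senses. Real inventories and annotation studies will behave differently; the sketch only shows the mechanism.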