Word sense disambiguation for 158 languages using word embeddings only

Varvara Logacheva, Denis Teslenko, Artem Shelmanov, Steffen Remus, Dmitry Ustalov, Andrey Kutuzov, Ekaterina Artemova, Chris Biemann, Simone Paolo Ponzetto, Alexander Panchenko

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

1 Citation (Scopus)

Abstract

Disambiguation of word senses in context is easy for humans, but is a major challenge for automatic approaches. Sophisticated supervised and knowledge-based models were developed to solve this task. However, (i) the inherent Zipfian distribution of supervised training instances for a given word and/or (ii) the quality of linguistic knowledge representations motivate the development of completely unsupervised and knowledge-free approaches to word sense disambiguation (WSD). They are particularly useful for under-resourced languages which do not have any resources for building either supervised and/or knowledge-based models. In this paper, we present a method that takes as input a standard pre-trained word embedding model and induces a fully-fledged word sense inventory, which can be used for disambiguation in context. We use this method to induce a collection of sense inventories for 158 languages on the basis of the original pre-trained fastText word embeddings by Grave et al. (2018), enabling WSD in these languages. Models and system are available online.

Original languageEnglish
Title of host publicationLREC 2020 - 12th International Conference on Language Resources and Evaluation, Conference Proceedings
EditorsNicoletta Calzolari, Frederic Bechet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
PublisherEuropean Language Resources Association (ELRA)
Pages5943-5952
Number of pages10
ISBN (Electronic)9791095546344
Publication statusPublished - 2020
Event12th International Conference on Language Resources and Evaluation, LREC 2020 - Marseille, France
Duration: 11 May 202016 May 2020

Publication series

NameLREC 2020 - 12th International Conference on Language Resources and Evaluation, Conference Proceedings

Conference

Conference12th International Conference on Language Resources and Evaluation, LREC 2020
Country/TerritoryFrance
CityMarseille
Period11/05/2016/05/20

Keywords

  • Graph clustering
  • Sense embeddings
  • Word embeddings
  • Word sense disambiguation
  • Word sense induction

Fingerprint

Dive into the research topics of 'Word sense disambiguation for 158 languages using word embeddings only'. Together they form a unique fingerprint.

Cite this