Noun compositionality detection using distributional semantics for the Russian language

Dmitry Puzyrev, Artem Shelmanov, Alexander Panchenko, Ekaterina Artemova

    Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

    Abstract

    In this paper, we present the first gold-standard corpus of Russian noun compounds annotated with compositionality information. We used Universal Dependencies treebanks to collect noun compounds according to part-of-speech patterns, such as ADJ-NOUN or NOUN-NOUN, and annotated them according to the following schema: a phrase can be compositional, non-compositional, or ambiguous (i.e., depending on the context, it can be interpreted as either compositional or non-compositional). Next, we conduct a series of experiments to evaluate both unsupervised and supervised methods for predicting compositionality. To expand this manually annotated dataset with more non-compositional compounds and to streamline the annotation process, we use active learning. We show that the methods previously proposed for English not only adapt easily to Russian but can also be exploited in an active learning paradigm, which increases the efficiency of the annotation process.
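To make the idea of an unsupervised compositionality score concrete, here is a minimal sketch of one approach that is common in the distributional-semantics literature: compare the vector of the whole phrase with the sum of its component word vectors. This is an illustration under stated assumptions, not necessarily the method evaluated in the paper. The function names and the toy three-dimensional vectors are invented for the example; in practice the vectors would come from a trained embedding model.

```python
import numpy as np


def cosine(u, v):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def compositionality_score(phrase_vec, component_vecs):
    """Cosine between the phrase vector and the sum of its
    length-normalized component vectors (hypothetical helper)."""
    composed = np.sum([v / np.linalg.norm(v) for v in component_vecs], axis=0)
    return cosine(phrase_vec, composed)


# Toy vectors, invented for illustration only (not real embeddings).
adj = np.array([1.0, 0.0, 0.0])
noun = np.array([0.0, 1.0, 0.0])
literal_phrase = np.array([0.9, 1.1, 0.1])    # lies close to adj + noun
idiomatic_phrase = np.array([0.1, 0.0, 1.0])  # lies far from adj + noun

literal_score = compositionality_score(literal_phrase, [adj, noun])
idiomatic_score = compositionality_score(idiomatic_phrase, [adj, noun])
print(f"literal: {literal_score:.2f}, idiomatic: {idiomatic_score:.2f}")
```

Under this scheme a higher score suggests that the phrase meaning is close to the combined meaning of its parts, while a low score flags a candidate non-compositional compound.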

    Original language: English
    Title of host publication: Analysis of Images, Social Networks and Texts - 8th International Conference, AIST 2019, Revised Selected Papers
    Editors: Wil M.P. van der Aalst, Vladimir Batagelj, Dmitry I. Ignatov, Valentina Kuskova, Sergei O. Kuznetsov, Irina A. Lomazova, Michael Khachay, Andrey Kutuzov, Natalia Loukachevitch, Amedeo Napoli, Panos M. Pardalos, Marcello Pelillo, Andrey V. Savchenko, Elena Tutubalina
    Publisher: Springer
    Pages: 218-229
    Number of pages: 12
    ISBN (Print): 9783030373337
    Publication status: Published - 2019
    Event: 8th International Conference on Analysis of Images, Social Networks and Texts, AIST 2019 - Kazan, Russian Federation
    Duration: 17 Jul 2019 to 19 Jul 2019

    Publication series

    Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    Volume: 11832 LNCS
    ISSN (Print): 0302-9743
    ISSN (Electronic): 1611-3349

    Conference

    Conference: 8th International Conference on Analysis of Images, Social Networks and Texts, AIST 2019
    Country/Territory: Russian Federation
    City: Kazan
    Period: 17/07/19 to 19/07/19
