The paper considers the task of automatic discourse parsing of Russian-language texts. Discourse parsing is a well-known approach to capturing text semantics beyond the boundaries of single sentences. Discourse annotation has been found useful for various tasks, including summarization, sentiment analysis, and question answering. The recent release of the manually annotated Ru-RSTreebank corpus made it possible to apply supervised machine learning techniques to building such parsers for Russian. The corpus provides discourse annotation in a widely adopted formalism, Rhetorical Structure Theory (RST). In this work, we develop feature sets for rhetorical relation classification in Russian-language texts, investigate the importance of various types of features, and report the results of the first experimental evaluation of machine learning models trained on the Ru-RSTreebank corpus. We consider various machine learning methods, including gradient boosting, neural networks, and ensembling of several models by soft voting.
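As an illustration of the soft voting mentioned above, the following minimal sketch averages the class probabilities produced by several classifiers and picks the label with the highest mean. It is not the authors' implementation: the function names `soft_vote` and `predict`, the relation labels, and the probability values are all hypothetical and chosen only for the example.

```python
from typing import Dict, Optional, Sequence


def soft_vote(
    model_probas: Sequence[Dict[str, float]],
    weights: Optional[Sequence[float]] = None,
) -> Dict[str, float]:
    """Return the (optionally weighted) mean probability of each class."""
    if not model_probas:
        raise ValueError("need probabilities from at least one model")
    if weights is None:
        weights = [1.0] * len(model_probas)  # equal weight for every model
    if len(weights) != len(model_probas):
        raise ValueError("one weight per model is required")
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Union of labels; a model that omits a label contributes probability 0.
    labels = sorted({label for probas in model_probas for label in probas})
    return {
        label: sum(w * p.get(label, 0.0) for w, p in zip(weights, model_probas)) / total
        for label in labels
    }


def predict(
    model_probas: Sequence[Dict[str, float]],
    weights: Optional[Sequence[float]] = None,
) -> str:
    """Pick the label with the highest averaged probability."""
    averaged = soft_vote(model_probas, weights)
    return max(averaged, key=averaged.get)


# Hypothetical outputs of two classifiers for one pair of discourse units.
# The labels are illustrative and not taken from the corpus label set.
boosting = {"Elaboration": 0.5, "Contrast": 0.3, "Cause": 0.2}
network = {"Elaboration": 0.2, "Contrast": 0.6, "Cause": 0.2}

print(predict([boosting, network]))
```

Unlike hard (majority) voting, soft voting uses each model's confidence, so a confident model can outweigh a hesitant one; the optional `weights` argument lets one model count more than another.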
|Translated title of the contribution||Classification models for RST discourse parsing of texts in Russian|
|Number of pages||14|
|Journal||Komp'juternaja Lingvistika i Intellektual'nye Tehnologii|
|Publication status||Published - 2019|
|Event||2019 Annual International Conference on Computational Linguistics and Intellectual Technologies, Dialogue 2019 - Moscow, Russian Federation|
Duration: 29 May 2019 → 1 Jun 2019