Information extraction from clinical texts in Russian

A. O. Shelmanov, I. V. Smirnov, E. A. Vishneva

Research output: Contribution to journal › Conference article › peer-review

12 Citations (Scopus)


We present and evaluate a pipeline for processing clinical notes in Russian. The paper addresses the tasks of drug identification and disease template filling, which are related to entity recognition and relation extraction. Disease template filling consists of recognizing disease mentions in text, mapping them to concepts of a thesaurus, and discovering their attributes. Discovering attributes means identifying the corresponding spans in text, linking them to diseases, and normalizing them, i.e., determining their generalized meaning from a predefined set. We implemented tools for determining the following attributes of disease mentions: negation; a flag indicating that the disease mention is not related to the patient; severity; course; and body site. For different tasks, we used different techniques: rule-based patterns and several supervised machine-learning methods. Since no annotated corpora of clinical notes in Russian were available for research purposes, we annotated a dataset, which we used for training and evaluating the developed tools. The created corpus is available to researchers under a data use agreement.
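
As an illustration only, the disease template described in the abstract can be pictured as a record with one slot per attribute. The sketch below is not the authors' implementation: the class name `DiseaseTemplate`, the function `fill_negation`, the cue list, and the context window size are all hypothetical, and the toy rule only looks for a negation cue to the left of the mention.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class DiseaseTemplate:
    """One disease mention with the attributes listed in the abstract."""
    mention: str                       # disease span as it appears in the text
    concept_id: Optional[str] = None   # thesaurus concept (hypothetical ID format)
    negated: bool = False              # negation attribute
    not_patient: bool = False          # mention is not related to the patient
    severity: Optional[str] = None     # normalized severity value
    course: Optional[str] = None       # normalized course value
    body_site: Optional[str] = None    # normalized body site


# Toy negation cues (assumed for illustration): "no", "without", "not detected", "denies".
NEGATION_CUES = re.compile(r"\b(нет|без|не выявлен\w*|отрицает)\b", re.IGNORECASE)


def fill_negation(sentence: str, template: DiseaseTemplate, window: int = 30) -> DiseaseTemplate:
    """Set the negation flag if a cue occurs within `window` characters before the mention."""
    start = sentence.find(template.mention)
    if start == -1:
        return template  # mention not found; leave the template unchanged
    left_context = sentence[max(0, start - window):start]
    template.negated = bool(NEGATION_CUES.search(left_context))
    return template


# "The patient denies bronchial asthma."
denied = fill_negation("Пациент отрицает бронхиальную астму.",
                       DiseaseTemplate(mention="бронхиальную астму"))
# "History of bronchial asthma."
affirmed = fill_negation("В анамнезе бронхиальная астма.",
                         DiseaseTemplate(mention="бронхиальная астма"))
```

A hand-written pattern like this corresponds only loosely to the "rule-based patterns" mentioned in the abstract; the supervised methods the paper also mentions would be trained on the annotated corpus, which this sketch does not attempt.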

Original language: English
Pages (from-to): 560-572
Number of pages: 13
Journal: Komp'juternaja Lingvistika i Intellektual'nye Tehnologii
Issue number: 14
Publication status: Published - 2015
Externally published: Yes
Event: International Conference on Computational Linguistics and Intellectual Technologies, Dialogue 2015 - Moscow, Russian Federation
Duration: 27 May 2015 → 30 May 2015


  • Annotated corpus
  • Clinical narrative
  • Clinical text processing
  • Disease template filling
  • EHR
  • Information extraction
  • Medical text

