Active learning with adaptive density weighted sampling for information extraction from scientific papers

Roman Suvorov, Artem Shelmanov, Ivan Smirnov

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

3 Citations (Scopus)

Abstract

The paper addresses the task of information extraction from scientific literature with machine learning methods. In particular, the tasks of definition extraction and result extraction from scientific publications in Russian are considered. We note that annotating scientific texts to create a training dataset is a very labor-intensive and expensive process. To tackle this problem, we propose methods and tools based on active learning. We describe and evaluate a novel adaptive density-weighted sampling (ADWeS) meta-strategy for active learning. The experiments demonstrate that active learning can be a very efficient technique for scientific text mining, and that the proposed meta-strategy can be beneficial for annotating corpora with a strongly skewed class distribution. We also investigate informative task-independent features for information extraction from scientific texts and present an openly available tool for corpus annotation, which is equipped with ADWeS and compatible with well-known sampling strategies.

Original language: English
Title of host publication: Artificial Intelligence and Natural Language - 6th Conference, AINL 2017, Revised Selected Papers
Editors: Jan Zizka, Andrey Filchenkov, Lidia Pivovarova
Publisher: Springer Verlag
Pages: 77-90
Number of pages: 14
ISBN (Print): 9783319717456
Publication status: Published - 2018
Externally published: Yes
Event: 6th Conference on Artificial Intelligence and Natural Language, AINL 2017 - St. Petersburg, Russian Federation
Duration: 20 Sep 2017 - 23 Sep 2017

Publication series

Name: Communications in Computer and Information Science
Volume: 789
ISSN (Print): 1865-0929

Conference

Conference: 6th Conference on Artificial Intelligence and Natural Language, AINL 2017
Country/Territory: Russian Federation
City: St. Petersburg
Period: 20/09/17 - 23/09/17

Keywords

  • Active machine learning
  • Deep linguistic analysis
  • Information extraction
  • Scientific texts analysis
