SkoltechNLP at SemEval-2020 Task 11: Exploring Unsupervised Text Augmentation for Propaganda Detection

Daryna Dementieva, Igor Markov, Alexander Panchenko

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

1 Citation (Scopus)

Abstract

This paper presents a solution for the Span Identification (SI) task in the “Detection of Propaganda Techniques in News Articles” competition at SemEval-2020. The goal of the SI task is to identify the specific fragments of each article that use at least one propaganda technique. This is a binary sequence tagging task. We tested several approaches, finally selecting a fine-tuned BERT model as our baseline. Our main contribution is an investigation of several unsupervised data augmentation techniques, based on distributional semantics, that expand the original small training dataset for this BERT-based sequence tagger. We explore various expansion strategies and show that they can substantially shift the balance between precision and recall while maintaining comparable F1 scores.
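To illustrate what "binary sequence tagging" means here, the sketch below converts character-level span annotations into one 0/1 label per token. It is not the authors' preprocessing: the function name `spans_to_token_labels`, the whitespace tokenization (a BERT tagger would operate on WordPiece tokens instead), and the example sentence with its annotated span are all assumptions made for illustration.

```python
import re

def spans_to_token_labels(text, spans):
    """Label each whitespace-delimited token 1 if it overlaps any span, else 0.

    `spans` is a list of (start, end) character offsets, end exclusive.
    Whitespace tokenization is an assumption made to keep the sketch short.
    """
    labeled = []
    for match in re.finditer(r"\S+", text):
        tok_start, tok_end = match.span()
        # Two half-open intervals overlap when each starts before the other ends.
        overlaps = any(tok_start < end and start < tok_end for start, end in spans)
        labeled.append((match.group(), int(overlaps)))
    return labeled

# Hypothetical sentence and annotation, not taken from the task data.
example = "They are the worst leaders in history"
fragment = "the worst leaders"
start = example.index(fragment)
labels = spans_to_token_labels(example, [(start, start + len(fragment))])
```

With these inputs, the three tokens inside the annotated fragment receive label 1 and every other token receives 0. A tagger trained on such labels predicts one binary decision per token, and contiguous runs of 1s are mapped back to character spans for scoring.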

Original language: English
Title of host publication: 14th International Workshops on Semantic Evaluation, SemEval 2020 - co-located 28th International Conference on Computational Linguistics, COLING 2020, Proceedings
Editors: Aurelie Herbelot, Xiaodan Zhu, Alexis Palmer, Nathan Schneider, Jonathan May, Ekaterina Shutova
Publisher: International Committee for Computational Linguistics
Pages: 1786-1792
Number of pages: 7
ISBN (Electronic): 9781952148316
Publication status: Published - 2020
Event: 14th International Workshops on Semantic Evaluation, SemEval 2020 - Barcelona, Spain
Duration: 12 Dec 2020 to 13 Dec 2020

Publication series

Name: 14th International Workshops on Semantic Evaluation, SemEval 2020 - co-located 28th International Conference on Computational Linguistics, COLING 2020, Proceedings

Conference

Conference: 14th International Workshops on Semantic Evaluation, SemEval 2020
Country/Territory: Spain
City: Barcelona
Period: 12/12/20 to 13/12/20
