Gradient-Based Adversarial Attacks on Categorical Sequence Models via Traversing an Embedded World

Ivan Fursov, Alexey Zaytsev, Nikita Kluchnikov, Andrey Kravchenko, Evgeny Burnaev

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Deep learning models suffer from a phenomenon called adversarial attacks: minor changes to a model's input can fool a classifier on a particular example. The literature mostly considers adversarial attacks on models with images and other structured inputs. However, adversarial attacks on categorical sequences can be harmful as well. Successful attacks on inputs in the form of categorical sequences must address the following challenges: (1) non-differentiability of the target function, (2) constraints on transformations of the initial sequence, and (3) the diversity of possible problems. We handle these challenges with two black-box adversarial attacks. The first approach adopts a Monte Carlo method and can be used in any scenario; the second uses a continuous relaxation of models and target metrics, which makes state-of-the-art gradient-based attack methods applicable with little additional effort. Results on money-transaction, medical-fraud, and NLP datasets suggest that the proposed methods generate reasonable adversarial sequences that stay close to the original ones yet fool machine learning models.
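To make the gradient-based idea concrete, the following is a minimal sketch assuming a PyTorch setup: token indices are mapped into a continuous embedding space, the classification loss gradient is followed there, and the perturbed embeddings are projected back onto the nearest vocabulary tokens. The model (`SeqClassifier`), the attack routine (`embedded_gradient_attack`), and all hyperparameters are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical illustration of a gradient attack "in the embedded world";
# not the paper's implementation.
import torch
import torch.nn as nn

class SeqClassifier(nn.Module):
    """Toy GRU classifier over a categorical vocabulary (stand-in for a target model)."""
    def __init__(self, vocab_size=100, emb_dim=32, hidden=64, n_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward_from_embeddings(self, emb):
        # emb: (batch, seq_len, emb_dim); lets us differentiate w.r.t. embeddings
        _, h = self.rnn(emb)
        return self.head(h[-1])

    def forward(self, tokens):
        return self.forward_from_embeddings(self.embedding(tokens))

def embedded_gradient_attack(model, tokens, label, step=0.5, n_steps=10):
    """Ascend the loss in embedding space, then snap each position back to the
    nearest token in the embedding table (nearest-neighbour projection)."""
    emb_table = model.embedding.weight.detach()            # (vocab_size, emb_dim)
    emb = model.embedding(tokens).detach().clone()
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(n_steps):
        emb.requires_grad_(True)
        loss = loss_fn(model.forward_from_embeddings(emb), label)
        grad, = torch.autograd.grad(loss, emb)
        emb = (emb + step * grad.sign()).detach()           # FGSM-style ascent step
    # Project every perturbed embedding onto its nearest vocabulary entry.
    dists = torch.cdist(emb.reshape(-1, emb.size(-1)), emb_table)
    return dists.argmin(dim=-1).reshape(tokens.shape)

# Usage: try to flip the predicted class of a random sequence.
model = SeqClassifier()
tokens = torch.randint(0, 100, (1, 20))
label = model(tokens).argmax(dim=-1)
adv_tokens = embedded_gradient_attack(model, tokens, label)
```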

Original language: English
Title of host publication: Analysis of Images, Social Networks and Texts – 9th International Conference, AIST 2020, Revised Selected Papers
Editors: Wil M. van der Aalst, Vladimir Batagelj, Dmitry I. Ignatov, Michael Khachay, Olessia Koltsova, Andrey Kutuzov, Sergei O. Kuznetsov, Irina A. Lomazova, Natalia Loukachevitch, Amedeo Napoli, Alexander Panchenko, Panos M. Pardalos, Marcello Pelillo, Andrey V. Savchenko, Elena Tutubalina
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 356–368
Number of pages: 13
ISBN (Print): 9783030726096
Publication status: Published – 2021
Event: 9th International Conference on Analysis of Images, Social Networks and Texts, AIST 2020 – Moscow, Russian Federation
Duration: 15 Oct 2020 – 16 Oct 2020

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12602 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 9th International Conference on Analysis of Images, Social Networks and Texts, AIST 2020
Country/Territory: Russian Federation
City: Moscow
Period: 15/10/20 – 16/10/20

Keywords

  • Adversarial attack
  • Discrete sequential data
  • Natural language processing
