Dataset Reduction via Bias-Variance Minimization

Georgii Novikov, Maxim Panov, Ivan Oseledets

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

The amount of generated, collected, and labelled data is growing rapidly. This raises the question of how to extract a subset of a dataset such that a model trained on the subset achieves the same level of generalization as a model trained on all available data. Methods from the optimal design family address this problem, but they are poorly suited to modern high-dimensional data and complex models. In this paper, we construct a method based on an intuitive measure of the quality of a dataset subset, propose ways to approximate this functional, and develop an efficient optimization method for reducing the dataset size. We demonstrate the capabilities of our approach on the MNIST dataset, applied to logistic regression.
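The question posed in the abstract, whether a model trained on a subset can match a model trained on the full set, can be illustrated with a minimal sketch. This is not the paper's selection method: the subset here is drawn uniformly at random as a naive baseline, the data are synthetic two-dimensional Gaussian blobs standing in for MNIST, and all function names (`make_data`, `fit_logreg`, `compare_subset_to_full`) are hypothetical names chosen for this illustration.

```python
import math
import random


def make_data(n, rng):
    """Two Gaussian blobs in 2-D with labels 0 and 1 (synthetic stand-in, not MNIST)."""
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        center = 1.5 if y == 1 else -1.5
        x = (rng.gauss(center, 1.0), rng.gauss(center, 1.0))
        data.append((x, y))
    return data


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logreg(data, epochs=200, lr=0.1):
    """Fit logistic regression by full-batch gradient descent on the mean log loss."""
    w = [0.0, 0.0]
    b = 0.0
    n = len(data)
    for _ in range(epochs):
        gw = [0.0, 0.0]
        gb = 0.0
        for x, y in data:
            err = sigmoid(w[0] * x[0] + w[1] * x[1] + b) - y
            gw[0] += err * x[0]
            gw[1] += err * x[1]
            gb += err
        w[0] -= lr * gw[0] / n
        w[1] -= lr * gw[1] / n
        b -= lr * gb / n
    return w, b


def accuracy(model, data):
    """Fraction of points whose predicted class matches the label."""
    w, b = model
    hits = sum((w[0] * x[0] + w[1] * x[1] + b > 0) == (y == 1) for x, y in data)
    return hits / len(data)


def compare_subset_to_full(n_train=400, n_test=400, subset_size=40, seed=0):
    """Train on the full set and on a random subset; report test accuracy of both."""
    rng = random.Random(seed)
    train = make_data(n_train, rng)
    test = make_data(n_test, rng)
    # Assumption: uniform random sampling, used only as a baseline.
    # The paper proposes an optimized selection instead.
    subset = rng.sample(train, subset_size)
    full_acc = accuracy(fit_logreg(train), test)
    sub_acc = accuracy(fit_logreg(subset), test)
    return len(subset), full_acc, sub_acc


if __name__ == "__main__":
    size, full_acc, sub_acc = compare_subset_to_full()
    print(f"subset size: {size}, full-data accuracy: {full_acc:.3f}, "
          f"subset accuracy: {sub_acc:.3f}")
```

The gap between the two accuracies is the quantity a dataset reduction method tries to keep small while shrinking the subset. A selection rule such as the one developed in the paper would replace the `rng.sample` call.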

Original language: English
Title of host publication: Conference Proceedings - 5th Scientific School Dynamics of Complex Networks and their Applications, DCNA 2021
Editors: Alexander Hramov, Semen Kurkin, Andrey Andreev, Natalia Shusharina
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 143-146
Number of pages: 4
ISBN (Electronic): 9781665442824
DOIs
Publication status: Published - 2021
Event: 5th Scientific School on Dynamics of Complex Networks and their Applications, DCNA 2021 - Kaliningrad, Russian Federation
Duration: 13 Sep 2021 to 15 Sep 2021

Publication series

Name: Conference Proceedings - 5th Scientific School Dynamics of Complex Networks and their Applications, DCNA 2021

Conference

Conference: 5th Scientific School on Dynamics of Complex Networks and their Applications, DCNA 2021
Country/Territory: Russian Federation
City: Kaliningrad
Period: 13/09/21 to 15/09/21

Keywords

  • Data distillation
  • Optimal experiment design
  • Statistical learning
