Spaced seeds improve k-mer-based metagenomic classification

Karel Břinda, Maciej Sykulski, Gregory Kucherov

Research output: Contribution to journal › Article › peer-review

44 Citations (Scopus)

Abstract

Motivation: Metagenomics is a powerful approach to study genetic content of environmental samples, which has been strongly promoted by next-generation sequencing technologies. To cope with massive data involved in modern metagenomic projects, recent tools rely on the analysis of k-mers shared between the read to be classified and sampled reference genomes.

Results: Within this general framework, we show that spaced seeds provide a significant improvement of classification accuracy, as opposed to traditional contiguous k-mers. We support this thesis through a series of different computational experiments, including simulations of large-scale metagenomic projects.

Availability and implementation, Supplementary information: Scripts and programs used in this study, as well as supplementary material, are available from http://github.com/gregorykucherov/spaced-seeds-for-metagenomics.
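
As an illustration of the terms used in the abstract, the following minimal Python sketch shows how keys can be extracted from a sequence with a spaced seed and how many keys a read shares with a reference. It is not the authors' code: the function names (`seed_keys`, `shared_keys`), the seed patterns and the toy sequences are all assumptions chosen for illustration, and the example does not reproduce the classification experiments of the paper. A seed is written here as a string of `1` (position kept) and `0` (position ignored); a contiguous k-mer is the special case with no `0`.

```python
def seed_keys(seq, seed):
    """Return the set of keys obtained by sliding `seed` along `seq`.

    `seed` is a string of '1' (position kept) and '0' (position ignored).
    A seed made only of '1's yields ordinary contiguous k-mers.
    """
    span = len(seed)
    kept = [i for i, c in enumerate(seed) if c == "1"]
    return {
        "".join(seq[start + i] for i in kept)
        for start in range(len(seq) - span + 1)
    }


def shared_keys(read, reference, seed):
    """Count distinct keys that occur in both the read and the reference."""
    return len(seed_keys(read, seed) & seed_keys(reference, seed))


# Hypothetical toy data: the read differs from the reference at one position.
reference = "ACGTACGGTCAAGT"
read = "ACGTACGATCAAGT"

contiguous = "111111"   # contiguous 6-mer (weight 6, span 6)
spaced = "11011011"     # spaced seed of the same weight 6, span 8

print(shared_keys(read, reference, contiguous))
print(shared_keys(read, reference, spaced))
```

Both seeds above have the same weight (number of kept positions), so the keys have the same length and could be stored in the same kind of index. The counts on a single toy read say nothing about classification accuracy; they only show the mechanics of matching through ignored positions.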

Original language: English
Pages (from-to): 3584-3592
Number of pages: 9
Journal: Bioinformatics
Volume: 31
Issue number: 22
Publication status: Published - 15 Nov 2015
Externally published: Yes
