Simplitigs as an efficient and scalable representation of de Bruijn graphs

Karel Břinda, Michael Baym, Gregory Kucherov

Research output: Contribution to journal › Article › peer-review

4 Citations (Scopus)

Abstract

de Bruijn graphs play an essential role in bioinformatics, yet they lack a universal scalable representation. Here, we introduce simplitigs as a compact, efficient, and scalable representation, and ProphAsm, a fast algorithm for their computation. Using assemblies of model organisms and two bacterial pan-genomes as examples, we compare simplitigs to unitigs, the best existing representation, and demonstrate that simplitigs substantially reduce both the cumulative sequence length and the number of sequences. When combined with the commonly used Burrows-Wheeler Transform index, simplitigs reduce memory usage, index loading time, and query time, as demonstrated with large-scale examples of GenBank bacterial pan-genomes.
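The abstract does not describe how simplitigs are computed. As a rough illustration only, the sketch below shows a greedy approach in the spirit of ProphAsm, assuming canonical k-mers over the uppercase ACGT alphabet and k ≥ 2: take any unused k-mer, extend it one base at a time to the right and then to the left while the next k-mer is still unused, and emit the resulting string. The function names (`kmer_set`, `simplitigs`) are hypothetical, and this is not the ProphAsm implementation.

```python
# Hypothetical sketch of greedy simplitig construction (ProphAsm-style idea).
# Assumes uppercase ACGT input and canonical k-mers (min of k-mer and its
# reverse complement). Not the authors' implementation.

_COMP = str.maketrans("ACGT", "TGCA")


def revcomp(s: str) -> str:
    """Reverse complement of a DNA string."""
    return s.translate(_COMP)[::-1]


def canonical(kmer: str) -> str:
    """Lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, revcomp(kmer))


def kmer_set(seqs, k):
    """Set of canonical k-mers occurring in the given sequences."""
    kmers = set()
    for s in seqs:
        for i in range(len(s) - k + 1):
            kmers.add(canonical(s[i:i + k]))
    return kmers


def _extend_right(seq, kmers, k):
    """Append bases while the newly formed k-mer is still unused; consume it."""
    while True:
        suffix = seq[-(k - 1):]
        for base in "ACGT":
            cand = canonical(suffix + base)
            if cand in kmers:
                kmers.remove(cand)
                seq += base
                break
        else:  # no unused extension exists
            return seq


def simplitigs(kmers, k):
    """Greedily cover every canonical k-mer exactly once with strings."""
    assert k >= 2, "sketch assumes k >= 2"
    remaining = set(kmers)  # work on a copy; each k-mer is consumed once
    result = []
    while remaining:
        seq = _extend_right(remaining.pop(), remaining, k)
        # Left extension = right extension of the reverse complement.
        seq = revcomp(_extend_right(revcomp(seq), remaining, k))
        result.append(seq)
    return result
```

For example, `tigs = simplitigs(kmer_set(seqs, k), k)` yields strings whose k-mers are exactly the input set, each appearing once; `len(tigs)` and `sum(len(t) for t in tigs)` then give the number of sequences and the cumulative length that the abstract compares against unitigs. The greedy choice of seed and extension base makes the output order-dependent, so different runs may return different (equally valid) simplitig sets.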

Original language: English
Article number: 96
Journal: Genome Biology
Volume: 22
Issue number: 1
Publication status: Published, Dec 2021

Keywords

  • Data compression
  • de Bruijn graph representation
  • de Bruijn graphs
  • Indexing
  • k-mers
  • Pan-genomes
  • Scalability
  • Sequence analysis
  • Simplitigs
  • Storage

