Reconsidering the significance of genomic word frequencies

Miklós Csurös, Laurent Noé, Gregory Kucherov

Research output: Contribution to journal; Short survey; peer-review

23 Citations (Scopus)


By conventional wisdom, a feature that occurs too often or too rarely in a genome can indicate a functional element. To infer functionality from frequency, it is crucial to precisely characterize occurrences in randomly evolving DNA. We find that the frequency of oligonucleotides in a genomic sequence follows primarily a Pareto-lognormal distribution, which encapsulates lognormal and power-law features found across all known genomes. Such a distribution could be the result of completely random evolution by a copying process. Our characterization of the entire frequency distribution of genomic words opens a way to a more accurate reasoning about their over- and underrepresentation in genomic sequences.
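As an illustration of the quantity the abstract refers to, the sketch below counts overlapping oligonucleotides (words of length k) in a sequence and tabulates how many distinct words occur once, twice, and so on. It is a minimal sketch, not the authors' method: the function names `word_counts` and `frequency_spectrum`, the toy sequence, and the choice to skip words containing non-ACGT characters are all assumptions made for illustration, and fitting a Pareto-lognormal distribution to the resulting spectrum is not shown.

```python
from collections import Counter


def word_counts(sequence, k):
    """Count every overlapping length-k word (k-mer) in a DNA sequence.

    Hypothetical helper: words containing anything other than A, C, G, T
    (for example the ambiguity code N) are skipped.
    """
    sequence = sequence.upper()
    alphabet = set("ACGT")
    counts = Counter()
    for i in range(len(sequence) - k + 1):
        word = sequence[i:i + k]
        if set(word) <= alphabet:
            counts[word] += 1
    return counts


def frequency_spectrum(counts):
    """Map each occurrence count to the number of distinct words with that count."""
    return Counter(counts.values())


# Toy input (assumption); a real analysis would read a whole genome.
toy_sequence = "ACGTACGT"
counts = word_counts(toy_sequence, 2)
spectrum = frequency_spectrum(counts)
```

On genome-scale input, the spectrum returned by `frequency_spectrum` is the empirical word-frequency distribution whose shape the abstract describes.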

Original language: English
Pages (from-to): 543-546
Number of pages: 4
Journal: Trends in Genetics
Issue number: 11
Publication status: Published - Nov 2007
Externally published: Yes

