TY - JOUR

T1 - DNA codes for nonadditive stem similarity

AU - D’yachkov, A. G.

AU - Kuzina, A. N.

AU - Polyansky, N. A.

AU - Macula, A.

AU - Rykov, V. V.

PY - 2014/10/16

Y1 - 2014/10/16

N2 - DNA sequences are sequences with elements from the quaternary DNA alphabet {A, C, G, T}. Two important properties of such sequences are their directedness and their ability to form duplexes through hybridization, i.e., the coalescing of two oppositely directed sequences. Biological experiments that exploit this property require an ensemble of such sequences (a DNA code) consisting of pairs of DNA sequences referred to as Watson-Crick duplexes. Furthermore, for any two words of the DNA code that do not form a Watson-Crick duplex, the hybridization energy, a stability measure of a potential DNA duplex, must be upper bounded by a constant specified by the conditions of the experiment. This problem can naturally be interpreted in terms of coding theory. Continuing our previous work, we consider a nonadditive similarity function for two DNA sequences, which most adequately models their hybridization energy. For the maximum cardinality of DNA codes based on this similarity, we establish a Singleton upper bound and present an example of an optimal construction. Using ensembles of DNA codes with special constraints on codewords, which we call Fibonacci ensembles, we obtain a random-coding lower bound on the maximum cardinality of DNA codes under this similarity function.

AB - DNA sequences are sequences with elements from the quaternary DNA alphabet {A, C, G, T}. Two important properties of such sequences are their directedness and their ability to form duplexes through hybridization, i.e., the coalescing of two oppositely directed sequences. Biological experiments that exploit this property require an ensemble of such sequences (a DNA code) consisting of pairs of DNA sequences referred to as Watson-Crick duplexes. Furthermore, for any two words of the DNA code that do not form a Watson-Crick duplex, the hybridization energy, a stability measure of a potential DNA duplex, must be upper bounded by a constant specified by the conditions of the experiment. This problem can naturally be interpreted in terms of coding theory. Continuing our previous work, we consider a nonadditive similarity function for two DNA sequences, which most adequately models their hybridization energy. For the maximum cardinality of DNA codes based on this similarity, we establish a Singleton upper bound and present an example of an optimal construction. Using ensembles of DNA codes with special constraints on codewords, which we call Fibonacci ensembles, we obtain a random-coding lower bound on the maximum cardinality of DNA codes under this similarity function.

UR - http://www.scopus.com/inward/record.url?scp=84910006769&partnerID=8YFLogxK

U2 - 10.1134/S0032946014030041

DO - 10.1134/S0032946014030041

M3 - Article

AN - SCOPUS:84910006769

VL - 50

SP - 247

EP - 269

JO - Problems of Information Transmission

JF - Problems of Information Transmission

SN - 0032-9460

IS - 3

ER -