TY - GEN

T1 - Duplication with transposition distance to the root for q-ary strings

AU - Polyanskii, Nikita

AU - Vorobyev, Ilya

N1 - Funding Information:
I. Vorobyev was supported in part by the Russian Foundation for Basic Research (RFBR) through grant no. 20-01-00559 and by RFBR and the Japan Society for the Promotion of Science under grant no. 20-51-50007. N. Polyanskii was supported in part by the European Research Council under the EU’s Horizon 2020 research and innovation programme (grant No. 801434).
Publisher Copyright:
© 2020 IEEE.
Copyright:
Copyright 2020 Elsevier B.V., All rights reserved.

PY - 2020/6

Y1 - 2020/6

N2 - We study the duplication with transposition distance between strings of length n over a q-ary alphabet and their roots. In other words, we investigate the number of duplication operations of the form x = (abcd) →y = (abcbd), where x and y are strings and a, b, c and d are their substrings, needed to get a q-ary string of length n starting from the set of strings without duplications. For exact duplication, we prove that the maximal distance between a string of length at most n and its root has the asymptotic order n/logn. For approximate duplication, where a β-fraction of symbols may be duplicated incorrectly, we show that the maximal distance has a sharp transition from the order n/logn to logn at β = (q - 1)/q. The motivation for this problem comes from genomics, where such duplications represent a special kind of mutation and the distance between a given biological sequence and its root is the smallest number of transposition mutations required to generate the sequence.

AB - We study the duplication with transposition distance between strings of length n over a q-ary alphabet and their roots. In other words, we investigate the number of duplication operations of the form x = (abcd) →y = (abcbd), where x and y are strings and a, b, c and d are their substrings, needed to get a q-ary string of length n starting from the set of strings without duplications. For exact duplication, we prove that the maximal distance between a string of length at most n and its root has the asymptotic order n/logn. For approximate duplication, where a β-fraction of symbols may be duplicated incorrectly, we show that the maximal distance has a sharp transition from the order n/logn to logn at β = (q - 1)/q. The motivation for this problem comes from genomics, where such duplications represent a special kind of mutation and the distance between a given biological sequence and its root is the smallest number of transposition mutations required to generate the sequence.

KW - combinatorics on words

KW - de Bruijn sequences

KW - DNA codes

KW - Duplication with transposition

UR - http://www.scopus.com/inward/record.url?scp=85090414755&partnerID=8YFLogxK

U2 - 10.1109/ISIT44484.2020.9174324

DO - 10.1109/ISIT44484.2020.9174324

M3 - Conference contribution

AN - SCOPUS:85090414755

T3 - IEEE International Symposium on Information Theory - Proceedings

SP - 2903

EP - 2908

BT - 2020 IEEE International Symposium on Information Theory, ISIT 2020 - Proceedings

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 2020 IEEE International Symposium on Information Theory, ISIT 2020

Y2 - 21 July 2020 through 26 July 2020

ER -