Prediction of 3D Chromatin Structure Using Recurrent Neural Networks

Michal Rozenwald, Ekaterina Khrameeva, Grigory Sapunov, Mikhail Gelfand

    Research output: Chapter in book, report, or conference proceedings; conference contribution; peer-reviewed

    1 Citation (Scopus)


Hi-C technology makes it possible to obtain data on chromatin interactions. This technique has revealed many principles of chromosomal folding, including the subdivision of the genome into Topologically Associating Domains (TADs). Moreover, correlations have been discovered between the structure of chromatin and various factors, such as binding sites of the transcriptional repressor CTCF, replication timing, and many epigenetic features [1-3].

Our study focuses on the application of Machine Learning methods to explore the 3D structure of chromatin. We predicted the TAD annotation from a comprehensive set of predictors that includes chromatin marks and histone modifications. Data from the following ChIP-seq experiments were selected: Chriz, CTCF, Su(Hw), BEAF-32, CP190, Smc3, GAF, H3K27me3, H3K27a, H3K36me1, H3K36me3, H3K4me1, H3K9ac, H3K9me1, H3K9me2, H3K9me3, H4K16ac.

The target value is a characteristic that corresponds to the TAD annotation produced by the Armatus software [4]. The objects are 20,000 bp DNA sequence fragments (bins) of the fruit fly Drosophila melanogaster.

We consider linear regression models with three types of regularization (Lasso, Ridge, Elastic Net) and Neural Networks. The sequential relationship of the DNA bins in terms of physical distance justifies the use of Recurrent Neural Networks. We built RNN architectures with different numbers of LSTM units and input sizes from 1 to 10 DNA bins. The predictive models were trained and evaluated using a weighted MSE score. As a baseline for judging model performance, the mean target value of the training dataset was used as a constant prediction. The best weighted MSE was achieved by a bidirectional LSTM RNN with 64 units. The input size of this model is six DNA bins, which is also equal to the average size of TADs. The most accurate RNN strongly outperforms the constant prediction and all four linear models.
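The evaluation scheme described above (a weighted MSE, a constant prediction equal to the training mean, and inputs made of several consecutive DNA bins) can be sketched in plain Python. This is a minimal illustration, not the authors' code: the abstract does not say how the weights are chosen or how a target value is aligned to a window of bins, so the `weights` values, the toy data, and the helper names `weighted_mse`, `constant_baseline`, and `make_windows` are all assumptions.

```python
def weighted_mse(y_true, y_pred, weights):
    """Weighted mean squared error: sum(w * (y - p)^2) / sum(w)."""
    if not (len(y_true) == len(y_pred) == len(weights)):
        raise ValueError("y_true, y_pred and weights must have equal length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    squared = (w * (t - p) ** 2 for t, p, w in zip(y_true, y_pred, weights))
    return sum(squared) / total


def constant_baseline(y_train, n_objects):
    """Predict the training-set mean for each of n_objects test objects."""
    mean = sum(y_train) / len(y_train)
    return [mean] * n_objects


def make_windows(bins, window):
    """Group consecutive bins into overlapping windows of `window` bins.

    `bins` is a list of per-bin feature vectors ordered along the chromosome.
    """
    if window < 1 or window > len(bins):
        raise ValueError("window must be between 1 and the number of bins")
    return [bins[i:i + window] for i in range(len(bins) - window + 1)]


# Toy data (hypothetical values, for illustration only).
y_train = [0.0, 1.0, 2.0, 3.0]
y_test = [1.0, 2.0]
weights = [1.0, 3.0]  # assumed per-object weights

baseline = constant_baseline(y_train, len(y_test))
score = weighted_mse(y_test, baseline, weights)
print("baseline weighted MSE:", score)

# Eight bins with two features each, grouped into windows of six bins.
bins = [[float(i), 0.5 * float(i)] for i in range(8)]
windows = make_windows(bins, window=6)
print("number of windows:", len(windows))
```

Any model's weighted MSE can then be compared with `score`; a model is useful only if it scores lower than this constant baseline on the same test objects.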
The protein Chriz is known to be associated with the formation of chromatin domains in Drosophila melanogaster [5]. The linear models with L1 regularization selected the feature corresponding to Chriz as the most informative one. A prioritization of feature importance was also obtained.
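One common way to obtain such a ranking is to fit an L1-regularized (Lasso) linear model and order the features by the absolute value of their coefficients, since the L1 penalty shrinks uninformative coefficients to exactly zero. The sketch below uses a simple coordinate-descent solver on synthetic data. It is an illustration under assumptions, not the authors' pipeline: the feature names `f0` to `f3`, the penalty `alpha`, and the data are hypothetical, and features and target are assumed to be roughly centered so that no intercept is fitted.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Minimize (1 / (2n)) * ||y - Xw||^2 + alpha * ||w||_1 over w."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw while w is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # a constant-zero feature carries no information
            # Covariance of feature j with the residual that excludes feature j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_w = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_w
    return w


# Synthetic data: only f1 influences the target (hypothetical setup).
rng = random.Random(0)
names = ["f0", "f1", "f2", "f3"]
X = [[rng.gauss(0.0, 1.0) for _ in names] for _ in range(200)]
y = [2.0 * row[1] + rng.gauss(0.0, 0.1) for row in X]

coefficients = lasso_coordinate_descent(X, y, alpha=0.1)
ranking = sorted(zip(names, coefficients), key=lambda item: abs(item[1]), reverse=True)
for name, coefficient in ranking:
    print(f"{name}: {coefficient:+.3f}")
```

The size of `alpha` controls how aggressively coefficients are zeroed out, so a ranking obtained this way should be checked across several penalty values.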

    Original language: English
    Title of host publication: Proceedings - 2018 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2018
    Editors: Harald Schmidt, David Griol, Haiying Wang, Jan Baumbach, Huiru Zheng, Zoraida Callejas, Xiaohua Hu, Julie Dickerson, Le Zhang
    Publisher: Institute of Electrical and Electronics Engineers Inc.
    Number of pages: 1
    ISBN (electronic): 9781538654880
    Publication status: Published - 21 Jan 2019
    Event: 2018 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2018 - Madrid, Spain
    Duration: 3 Dec 2018 to 6 Dec 2018


