Prediction of 3D Chromatin Structure Using Recurrent Neural Networks

Michal Rozenwald, Ekaterina Khrameeva, Grigory Sapunov, Mikhail Gelfand

    Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

    1 Citation (Scopus)

    Abstract

    The Hi-C technology provides data on chromatin interactions. This technique has revealed many principles of chromosomal folding, including the subdivision of the genome into Topologically Associating Domains (TADs). Moreover, correlations have been discovered between chromatin structure and various factors, such as binding sites of the transcriptional repressor CTCF, replication timing, and many epigenetic features [1-3].

    Our study focuses on the application of Machine Learning methods to explore the 3D structure of chromatin. We predicted the TAD annotation from a comprehensive set of predictors that includes chromatin marks and histone modifications. Data from the following ChIP-seq experiments were selected: Chriz, CTCF, Su(Hw), BEAF-32, CP190, Smc3, GAF, H3K27me3, H3K27a, H3K36me1, H3K36me3, H3K4me1, H3K9ac, H3K9me1, H3K9me2, H3K9me3, H4K16ac.

    The target value is a characteristic that corresponds to the TAD annotation produced by the Armatus software [4]. The objects are 20,000 bp fragments (bins) of the DNA sequence of the fruit fly Drosophila melanogaster.

    We consider linear regression models with three types of regularization (Lasso, Ridge, Elastic Net) and Neural Networks. The sequential relationship of DNA bins in terms of physical distance justifies the use of Recurrent Neural Networks (RNNs). We built RNN architectures with different numbers of LSTM units and input sizes from 1 to 10 DNA bins. The predictive models were trained and evaluated using a weighted MSE score. The mean target value of the training dataset was used as a constant prediction, serving as a baseline for the performance of the models. The best weighted MSE was achieved by a bidirectional LSTM RNN with 64 units. The input size of this model is six DNA bins, which equals the average size of TADs. The most accurate RNN strongly outperforms the constant prediction and all four linear models.
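The evaluation scheme described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the abstract does not say how the weights of the weighted MSE are defined, so they are passed in by the caller here, and the function names `weighted_mse`, `constant_baseline` and `make_windows`, as well as all toy values, are hypothetical rather than taken from the study.

```python
from typing import List, Sequence


def weighted_mse(y_true: Sequence[float], y_pred: Sequence[float],
                 weights: Sequence[float]) -> float:
    """Weighted mean squared error: sum(w * (y - p)^2) / sum(w)."""
    if not (len(y_true) == len(y_pred) == len(weights)):
        raise ValueError("inputs must have equal length")
    total_w = sum(weights)
    if total_w <= 0:
        raise ValueError("weights must sum to a positive value")
    squared = (w * (t - p) ** 2 for t, p, w in zip(y_true, y_pred, weights))
    return sum(squared) / total_w


def constant_baseline(y_train: Sequence[float], n: int) -> List[float]:
    """Predict the mean of the training targets for each of n bins."""
    mean = sum(y_train) / len(y_train)
    return [mean] * n


def make_windows(features: Sequence[Sequence[float]],
                 size: int) -> List[List[List[float]]]:
    """Group consecutive bins into overlapping windows of `size` bins,
    the kind of sequential input a recurrent network consumes."""
    if size < 1 or size > len(features):
        raise ValueError("window size must be between 1 and the number of bins")
    return [[list(row) for row in features[i:i + size]]
            for i in range(len(features) - size + 1)]


# Toy, made-up numbers purely for illustration (not data from the study).
y_train = [0.0, 1.0, 2.0, 1.0]
y_test = [1.0, 3.0]
w_test = [1.0, 3.0]  # assumed weights; the abstract does not define them

baseline = constant_baseline(y_train, len(y_test))
baseline_score = weighted_mse(y_test, baseline, w_test)

# 8 bins x 2 features, grouped into windows of 6 bins
# (six bins is the input size reported for the best model).
bins = [[float(i), float(i) * 0.5] for i in range(8)]
windows = make_windows(bins, 6)
```

Any trained model can then be compared against `baseline_score` with the same `weighted_mse` call. How each window is paired with a target value is not specified in the abstract, so that step is left out of the sketch.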
    The protein Chriz is known to be associated with the formation of chromatin domains in Drosophila melanogaster [5]. The feature corresponding to Chriz was selected by the linear models with L1 regularization as the most informative one. A prioritization of feature importance was also obtained.
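To illustrate how L1 regularization can single out an informative feature, the following minimal sketch fits a Lasso model by coordinate descent on a tiny synthetic dataset. Everything here is an assumption made for illustration: the abstract does not name a solver or a regularization strength, the data are made up, and the feature names `feature_A` to `feature_C` are placeholders rather than the ChIP-seq tracks of the study.

```python
from typing import List, Sequence


def soft_threshold(value: float, threshold: float) -> float:
    """Shrink value toward zero by threshold; values inside the band become 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X: Sequence[Sequence[float]], y: Sequence[float],
                             alpha: float, n_iter: int = 200) -> List[float]:
    """Minimize (1 / (2n)) * ||y - Xw||^2 + alpha * ||w||_1.

    Assumes centered features and targets, so no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw while w is all zeros
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            z = sum(v * v for v in col) / n
            if z == 0.0:
                continue  # constant-zero feature carries no information
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(c * (r + w[j] * c) for c, r in zip(col, residual)) / n
            new_w = soft_threshold(rho, alpha) / z
            delta = new_w - w[j]
            if delta != 0.0:
                residual = [r - delta * c for r, c in zip(residual, col)]
                w[j] = new_w
    return w


# Synthetic, centered data (not from the study): three mutually orthogonal
# features; the target depends strongly on the first and weakly on the second.
names = ["feature_A", "feature_B", "feature_C"]  # hypothetical names
X = [[1.0, 1.0, 1.0],
     [-1.0, 1.0, -1.0],
     [1.0, -1.0, -1.0],
     [-1.0, -1.0, 1.0]]
y = [2.0 * a + 0.1 * b for a, b, _ in X]

w = lasso_coordinate_descent(X, y, alpha=0.5)
selected = [name for name, coef in zip(names, w) if coef != 0.0]
ranked = sorted(zip(names, w), key=lambda item: abs(item[1]), reverse=True)
```

With a sufficiently large `alpha`, the weak and irrelevant features receive coefficients of exactly zero, so `selected` retains only the dominant feature and `ranked` gives a simple importance ordering by absolute coefficient.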

    Original language: English
    Title of host publication: Proceedings - 2018 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2018
    Editors: Harald Schmidt, David Griol, Haiying Wang, Jan Baumbach, Huiru Zheng, Zoraida Callejas, Xiaohua Hu, Julie Dickerson, Le Zhang
    Publisher: Institute of Electrical and Electronics Engineers Inc.
    Number of pages: 1
    ISBN (Electronic): 9781538654880
    DOIs
    Publication status: Published - 21 Jan 2019
    Event: 2018 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2018 - Madrid, Spain
    Duration: 3 Dec 2018 to 6 Dec 2018

    Publication series

    Name: Proceedings - 2018 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2018

    Conference

    Conference: 2018 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2018
    Country/Territory: Spain
    City: Madrid
    Period: 3/12/18 to 6/12/18

    Keywords

    • 3D chromatin structure
    • Machine Learning
    • Recurrent Neural Networks
    • Topologically Associating Domains
