Spliced alignment: A new approach to gene recognition

Mikhail S. Gelfand, Andrey A. Mironov, Pavel A. Pevzner

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

4 Citations (Scopus)


Gene structure prediction is one of the most important problems in computational molecular biology. Previous attempts to solve this problem were based on statistics and artificial intelligence and, surprisingly enough, applications of theoretical computer science methods for gene recognition were almost unexplored. Recent advances in large-scale cDNA sequencing open a way towards a new combinatorial approach to gene recognition. This paper describes a spliced alignment algorithm and a software tool which explores all possible exon assemblies in polynomial time and finds the multi-exon structure with the best fit to a related protein. Unlike other existing methods, the algorithm successfully recognizes genes even in the case of short exons or exons with unusual codon usage; we also report correct assemblies for genes with more than 10 exons. On a test sample of human genes with known mammalian relatives the average correlation between the predicted and the actual genes was 99%, which is a very high accuracy as compared with other existing methods. The algorithm correctly reconstructed 87% of genes and the rare discrepancies between the predicted and real exon-intron structures were caused by either (i) extremely short (less than 5 amino acids) initial or terminal exons, or (ii) alternative splicing, or (iii) errors in database feature tables. Moreover, the algorithm predicts human genes reasonably well when the homologous protein is non-vertebrate or even prokaryotic. The surprisingly good performance of the method was confirmed by extensive simulations: in particular, with target proteins showing just 25% similarity, the correlation between the predicted and actual genes was still as high as 95%.
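To make the idea of scoring every exon assembly in polynomial time concrete, here is a minimal Python sketch of a spliced-alignment style dynamic program. It is an illustration under simplifying assumptions, not the authors' implementation: the candidate blocks are supplied by the caller as half-open `(start, end)` intervals, the target is a nucleotide string rather than a protein, scoring uses a plain match/mismatch/linear-gap scheme, and the function name `spliced_alignment` and all parameter names are hypothetical. The key step is that, on entering a block, each column of the alignment row takes the best score among all blocks that end before the current block starts, so chains of blocks are never enumerated explicitly.

```python
def spliced_alignment(genome, blocks, target, match=1, mismatch=-1, gap=-1):
    """Return (score, chain) for the best chain of non-overlapping blocks.

    The score is the global alignment score between the concatenated
    blocks of the chain and `target`. All names and the scoring scheme
    are illustrative assumptions, not taken from the paper.
    """
    m = len(target)
    blocks = sorted(blocks, key=lambda b: b[1])  # order by end position

    def score_of(cell):
        return cell[0]

    # Row for the empty chain: a target prefix aligned against nothing.
    empty = [(j * gap, ()) for j in range(m + 1)]
    final_rows = []   # final_rows[k] = last alignment row of block k
    best = empty[m]

    for start, end in blocks:
        # Entry row: per column, the best chain that ends before `start`.
        row = list(empty)
        for (_, p_end), p_row in zip(blocks, final_rows):
            if p_end <= start:
                row = [max(a, b, key=score_of) for a, b in zip(row, p_row)]

        # Ordinary global-alignment rows, one per base of the block.
        # Each cell carries the chain of the cell it was derived from.
        for base in genome[start:end]:
            new = [(row[0][0] + gap, row[0][1])]
            for j in range(1, m + 1):
                sub = match if base == target[j - 1] else mismatch
                candidates = (
                    (row[j - 1][0] + sub, row[j - 1][1]),  # align base to target[j-1]
                    (row[j][0] + gap, row[j][1]),          # base against a gap
                    (new[j - 1][0] + gap, new[j - 1][1]),  # target[j-1] against a gap
                )
                new.append(max(candidates, key=score_of))
            row = new

        # The block is complete: record it in every chain of the final row.
        row = [(s, chain + ((start, end),)) for s, chain in row]
        final_rows.append(row)
        best = max(best, row[m], key=score_of)

    return best


# Toy example: two true "exons" separated by an intron-like run of C's,
# plus a decoy block that overlaps the first exon.
genome = "ATGAAACCCCCCGGGTAA"
blocks = [(0, 6), (3, 9), (6, 12), (12, 18)]
target = "ATGAAAGGGTAA"
print(spliced_alignment(genome, blocks, target))
# prints (12, ((0, 6), (12, 18)))
```

With b candidate blocks of total length L and a target of length m, this sketch runs in O(L·m + b²·m) time, which is polynomial even though the number of possible block chains can grow exponentially with b.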

Original language: English
Title of host publication: Combinatorial Pattern Matching - 7th Annual Symposium, CPM 1996, Proceedings
Editors: Gene Myers, Dan Hirschberg
Publisher: Springer Verlag
Number of pages: 18
ISBN (Print): 3540612580, 9783540612582
Publication status: Published - 1996
Externally published: Yes
Event: 7th Annual Symposium on Combinatorial Pattern Matching, CPM 1996 - Laguna Beach, United States
Duration: 10 Jun 1996 to 12 Jun 1996

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349


Conference: 7th Annual Symposium on Combinatorial Pattern Matching, CPM 1996
Country/Territory: United States
City: Laguna Beach

