Abstract
An important topic in genomic sequence analysis is the identification of
protein coding regions. In this context, several model-independent methods,
based on the occurrence of specific nucleotide patterns in coding regions,
have been proposed. However, these methods are not fully satisfactory because
they depend on an empirically predefined window length for the local analysis
of a DNA region.
We introduce a method, based on a modified Gabor-wavelet transform (MGWT), for
the identification of protein coding regions. This novel transform is tuned to
analyze periodic signal components and presents the advantage of being
independent of the window length. We compared the performance of the MGWT with
other methods using eukaryote datasets. The results show that the MGWT
outperforms all assessed model-independent methods with respect to
identification accuracy. These results indicate that the source of at least
part of the identification errors produced by the previous methods is the fixed
working scale. The new method not only avoids this source of errors but also
provides a tool for detailed exploration of nucleotide occurrence.
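
The multi-scale idea summarized above can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a binary indicator mapping of the four nucleotides, an analysis frequency fixed at 1/3 (the period-3 component commonly associated with coding regions), and an arbitrary set of Gaussian scales. The function names and the default scale values are hypothetical.

```python
import numpy as np


def indicator_sequences(dna):
    """Map a DNA string to four binary indicator sequences (assumed mapping)."""
    seq = np.array(list(dna.upper()))
    return {base: (seq == base).astype(float) for base in "ACGT"}


def period3_gabor_spectrum(dna, scales=(20, 40, 80, 160)):
    """Sum of squared Gabor responses at frequency 1/3 over bases and scales.

    Only the width of the Gaussian envelope changes with the scale; the
    complex exponential stays tuned to period 3, so no single window
    length has to be chosen in advance.
    """
    n = len(dna)
    total = np.zeros(n)
    for u in indicator_sequences(dna).values():
        u = u - u.mean()  # remove the constant offset of the indicator
        for a in scales:
            half = int(3 * a)  # truncate the Gaussian at three standard deviations
            t = np.arange(-half, half + 1)
            envelope = np.exp(-(t ** 2) / (2.0 * a ** 2))
            kernel = envelope * np.exp(2j * np.pi * t / 3.0)
            kernel = kernel / envelope.sum()  # comparable amplitude across scales
            full = np.convolve(u, kernel, mode="full")
            response = full[half:half + n]  # keep the part aligned with the input
            total += np.abs(response) ** 2
    return total
```

Calling `period3_gabor_spectrum(sequence)` returns one value per nucleotide position. Higher values indicate a stronger period-3 component near that position, which a threshold could then turn into candidate coding regions.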
@ARTICLE{MCZC08,
  author = {J. P. Mena-Chalco and H. Carrer and Y. Zana and R. M. Cesar-Jr.},
  title = {Identification of protein coding regions using the modified {G}abor-wavelet transform},
  journal = {IEEE/ACM Transactions on Computational Biology and Bioinformatics},
  volume = {5},
  pages = {198--207},
  year = {2008},
  http = {http://doi.ieeecomputersociety.org/10.1109/TCBB.2007.70259},
}