Creativision Posters and Technical Reports

[1] J. J. G. Leandro, R. M. Cesar-Jr, and L. da F. Costa. Technical Report - Automatic Contour Extraction from 2D Neuron Images. Technical report, Instituto de Matemática e Estatística da Universidade de São Paulo and Instituto de Física de São Carlos - USP - Brazil, April 2008.
[2] F. M. Lopes, D. C. Martins-Jr, and R. M. Cesar-Jr. DimReduction - Interactive Graphic Environment for Dimensionality Reduction. Technical report, Instituto de Matemática e Estatística da Universidade de São Paulo and Universidade Tecnológica Federal do Paraná, 2008.
[3] J. Barrera, R. M. Cesar-Jr, D. C. Martins-Jr, R. Z. N. Vêncio, C. F. Becerra, C. A. B. Pereira, and H. A. del Portillo. Probabilistic Genetic Networks analysis of three Plasmodium falciparum strains from Dynamical Expression Signals. Poster Session, August 2006. 14th Annual International Conference On Intelligent Systems For Molecular Biology (ISMB).
The advent of genomics in malaria research is significantly accelerating the discovery of control strategies. Dynamical global gene expression measurements of the parasite's intraerythrocytic developmental cycle (IDC) at 1-hour resolution were recently reported [1]. Moreover, Discrete Fourier Transform based techniques showed that many genes are regulated in a single periodic manner, which made it possible to order genes by the phase of their expression. In this work we present a framework to construct genetic networks from dynamical expression signals [2]. These networks are represented by the Probabilistic Genetic Network (PGN) model, a Markov chain with some additional properties. The model treats each gene as a non-linear stochastic gate, and the system is built by coupling these gates. The PGN is estimated by mean conditional entropy minimization, which discovers the subsets of genes that best predict the target gene at the next time instant. Moreover, a tool has been developed that integrates PGN-based mining of dynamical expression signals with different databases and biological knowledge. The applicability of this tool for discovering gene networks of the malaria expression regulation system has been validated on simulated data and on real microarray data, using the glycolytic pathway as a “gold standard” and by creating an apicoplast PGN network [2]. A negative control between these two modules was also confirmed by constructing PGN networks that used four glycolysis genes and four apicoplast-organelle genes as seed genes [2]. Together, these data demonstrate the value of the PGN model in generating biologically meaningful networks that include genes not identified by the Fourier approach [1].
Currently, we are applying the same technique to three malaria strains (3D7, Dd2, HB3) in order to analyze their similarities and differences and to determine whether the three data sets can be pooled, which would improve the PGN estimation.
References:
[1] M. Llinas, Z. Bozdech, E. D. Wong, A. T. Adai and J. L. DeRisi. Comparative whole genome transcriptome analysis of three Plasmodium falciparum strains. Nucleic Acids Research, 34(4):1166-1173 (2006).
[2] J. Barrera, R. M. Cesar-Jr, D. C. Martins-Jr, R. Z. N. Vêncio, E. F. Merino, M. M. Yamamoto, F. G. Leonardi, C. A. B. Pereira and H. A. del Portillo. Constructing probabilistic genetic networks of Plasmodium falciparum from dynamical expression signals of the intraerythrocytic development cycle. In McConnell, Lin and Hurban, editors, Methods of Microarray Data Analysis V. Springer, 2005. (in press).
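The PGN estimation step (choosing the gene subset that best predicts a target gene at the next time instant by minimizing the mean conditional entropy) can be sketched as follows. This is a minimal illustration, not the authors' tool: it assumes expression values are already discretized, estimates probabilities by plain relative frequencies, and exhaustively searches subsets of at most two genes, mirroring the single-gene and two-gene predictions mentioned in the poster abstracts. The function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations
from math import log2


def mean_conditional_entropy(pairs):
    """Estimate H(Y | X) = sum_x P(x) * H(Y | X = x) from (x, y) samples,
    using relative frequencies as probability estimates."""
    by_x = defaultdict(Counter)
    for x, y in pairs:
        by_x[x][y] += 1
    n = len(pairs)
    total = 0.0
    for counts in by_x.values():
        n_x = sum(counts.values())
        h = -sum((c / n_x) * log2(c / n_x) for c in counts.values())
        total += (n_x / n) * h
    return total


def best_predictors(series, target, max_size=2):
    """Return (entropy, subset) for the predictor subset (up to max_size genes)
    that minimizes H(target at t+1 | subset at t).

    `series` maps gene name -> list of discretized values, one per time point.
    Among equally good subsets, the first (smallest) one found is kept.
    """
    genes = sorted(series)
    length = len(series[target])
    best = None
    for k in range(1, max_size + 1):
        for subset in combinations(genes, k):
            # Pair the predictors' joint state at t with the target's value at t+1.
            pairs = [
                (tuple(series[g][t] for g in subset), series[target][t + 1])
                for t in range(length - 1)
            ]
            h = mean_conditional_entropy(pairs)
            if best is None or h < best[0]:
                best = (h, subset)
    return best
```

With few time points, plain frequency estimates tend to favor larger subsets, so a practical estimator would also need some penalty or significance criterion.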

[4] D. C. Martins-Jr, R. M. Cesar-Jr, and J. Barrera. W-operator window design for color texture classification. Poster Session, August 2006. Proceedings of XIX Brazilian Symposium on Computer Graphics and Image Processing (SIBGRAPI).
This work generalizes the technique described in [1] to color-based image processing applications. The method chooses a subset W of variables (i.e., the pixels seen through a window) that maximizes the information observed in a set of training data, by minimizing the mean conditional entropy. The task is formalized as a combinatorial optimization problem whose search space is the power set of the candidate variables and whose objective is the mean entropy of the estimated conditional probabilities. Because a full exploration of the search space requires an enormous computational effort, an algorithm from the feature selection literature is applied. The approach is mathematically well founded, and experimental results on color texture recognition show that it is also suitable for color image problems. A comparative performance assessment against an artificial neural network alternative (a Multi-Layer Perceptron) is also presented.
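The window-design idea (choose the subset W of window offsets that minimizes the mean conditional entropy of the class given the observed pattern) can be illustrated with a small sketch. It assumes images already quantized to a few color indices and uses plain sequential forward selection (SFS) as the search heuristic; the abstract only says that an algorithm from the feature selection literature is applied, so SFS is a stand-in, and all names here are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2


def mean_conditional_entropy(pairs):
    """H(Y | X) from (x, y) samples, using relative-frequency estimates."""
    by_x = defaultdict(Counter)
    for x, y in pairs:
        by_x[x][y] += 1
    n = len(pairs)
    total = 0.0
    for counts in by_x.values():
        n_x = sum(counts.values())
        total += (n_x / n) * -sum((c / n_x) * log2(c / n_x) for c in counts.values())
    return total


def observations(images, labels, window):
    """Collect (pattern seen through `window`, class label) pairs.

    `images` are 2-D lists of quantized color indices; `window` is a tuple of
    (dr, dc) offsets. Positions whose window leaves the image are skipped.
    """
    pairs = []
    for img, label in zip(images, labels):
        rows, cols = len(img), len(img[0])
        for r in range(rows):
            for c in range(cols):
                coords = [(r + dr, c + dc) for dr, dc in window]
                if all(0 <= i < rows and 0 <= j < cols for i, j in coords):
                    pairs.append((tuple(img[i][j] for i, j in coords), label))
    return pairs


def sfs_window(images, labels, candidates, max_size):
    """Greedily add the offset that most lowers H(label | window pattern);
    stop when nothing improves or `max_size` offsets have been chosen."""
    window = ()
    best_h = mean_conditional_entropy(observations(images, labels, window))
    while len(window) < max_size:
        scored = [
            (mean_conditional_entropy(observations(images, labels, window + (off,))), off)
            for off in candidates
            if off not in window
        ]
        if not scored:
            break
        h, off = min(scored)
        if h >= best_h:
            break
        window, best_h = window + (off,), h
    return window, best_h
```

On two toy textures (vertical and horizontal stripes), the search picks a neighbor offset first and then the center pixel, after which the pair of values seen through the window separates the two classes.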

[5] J. Barrera, R. M. Cesar-Jr, D. C. Martins-Jr, R. Z. N. Vêncio, C. A. B. Pereira, and H. A. del Portillo. Abstract: Estimation of Probabilistic Genetic Networks of Plasmodium falciparum from Dynamical Expression Signals. Poster session, October 2005. First International Conference of the Brazilian Association for Bioinformatics and Computational Biology (X-Meeting).
The advent of genomics in malaria research is significantly accelerating the discovery of control strategies. Dynamical global gene expression measurements of the parasite's intraerythrocytic developmental cycle (IDC) at 1-hour resolution were recently reported. Moreover, Discrete Fourier Transform based techniques showed that many genes are regulated in a single periodic manner, which made it possible to order genes by the phase of their expression. In this work we present a framework to construct genetic networks from dynamical expression signals. These networks are represented by the Probabilistic Genetic Network (PGN) model, a Markov chain with some additional properties. The model treats each gene as a non-linear stochastic gate, and the system is built by coupling these gates. The PGN is estimated by mean conditional entropy minimization, which discovers the subsets of genes that best predict the target gene at the next time instant. Moreover, a tool was developed that integrates PGN-based mining of dynamical expression signals with different databases and biological knowledge. The applicability of this tool for discovering gene networks of the malaria expression regulation system has been validated on simulated data (http://www.vision.ime.usp.br/CAMDA2004/simulations) and on real microarray data, using the glycolytic pathway as a “gold standard” (http://www.vision.ime.usp.br/CAMDA2004/glycolysis.html) and by creating an apicoplast PGN network (http://www.vision.ime.usp.br/CAMDA2004/apicoplast.html). Because our program creates PGN networks, a negative control was devised to further validate the biological value of our findings: eight genes, four from glycolysis and four from the apicoplast organelle, were chosen randomly and used together as seed genes to create PGN networks based on single-gene and two-gene predictions.
The results clearly demonstrated that the glycolysis and apicoplast PGN networks based on single-gene predictions were not interconnected (http://www.vision.ime.usp.br/CAMDA2004/ga_c.html). With two-gene predictions, the genes were likewise not connected, except for two genes from the glycolytic PGN network that interconnected with the apicoplast PGN network (http://www.vision.ime.usp.br/CAMDA2004/ga2_c.html). Together, these data demonstrate the value of the PGN model in generating biologically meaningful networks that include genes not identified by the Fourier approach.

[6] J. P. Mena-Chalco and R. M. Cesar-Jr. Protein Coding Regions Identification through the Modified Morlet Transform. Poster session, 04-07 Oct. 2005. First International Conference of the Brazilian Association for Bioinformatics and Computational Biology (X-Meeting).
An important topic in biological sequence analysis is the identification of protein coding regions. New methods for DNA sequence processing and gene identification can be created through search-by-content in such sequences. The periodic pattern of DNA in protein coding regions, called three-base periodicity (TBP), has been considered characteristic of coding regions; it has not been observed in non-coding regions. Digital signal processing techniques provide a strong basis for identifying regions with TBP. In this work we introduce a new method for identifying protein coding regions through TBP, based on a newly introduced transform called the Modified Morlet Transform (MMT), which does not need to be trained on sequence databases. Preliminary results show that the MMT outperforms the short-time Fourier transform (STFT), presenting greater sensitivity to TBP and a better capability to discriminate protein coding regions. - This is the author's version of the work. Not for redistribution.

Keywords: Bioinformatics, protein coding region identification, modified Morlet transform
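Since the MMT itself is not specified here, the sketch below shows only the baseline it is compared against: a sliding-window (short-time) DFT measure of three-base periodicity. It assumes the common binary indicator (Voss) mapping of DNA and an illustrative window of 120 bases; the function name and parameters are hypothetical.

```python
import cmath


def tbp_profile(seq, window=120, step=3):
    """Period-3 spectral power along a DNA sequence (sliding-window DFT).

    For each window of length N, S = sum over bases b of |U_b[N/3]|^2, where
    U_b is the DFT of the binary indicator sequence of base b. The kernel at
    bin N/3 is exp(-2*pi*i*n/3), so N should be a multiple of 3.
    High S suggests three-base periodicity, i.e., a candidate coding region.
    """
    seq = seq.upper()
    w = cmath.exp(-2j * cmath.pi / 3)  # DFT kernel at frequency N/3
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        power = 0.0
        for base in "ACGT":
            # DFT coefficient of the indicator sequence of `base` at bin N/3.
            coeff = sum(w ** (n % 3) for n, ch in enumerate(chunk) if ch == base)
            power += abs(coeff) ** 2
        profile.append(power)
    return profile
```

A strongly periodic sequence such as `"ATG" * 40` yields a large value (4800 for a single 120-base window), whereas a homopolymer yields essentially zero.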
[7] J. P. Mena-Chalco and R. M. Cesar-Jr. Identificação de Regiões Codificantes Através da Transformada Modificada de Morlet [Identification of Coding Regions through the Modified Morlet Transform]. Poster session, Oct. 2005. I Simpósio de Iniciação Científica e Pós-Graduação do IME-USP.
An important topic in biological sequence analysis is gene finding, that is, the identification of protein coding regions. Identifying such regions enables the subsequent search for biological meaning, description, or categorization, and the construction of the genome map of the organism under analysis. New methods for DNA sequence processing and gene identification can be created through search-by-content in these sequences. Digital signal processing techniques provide a basis for identifying the three-nucleotide periodicity present in the sequences, that is, the possible coding regions in DNA sequences. This work introduces a new method for identifying these regions, based on a transform called the Modified Morlet Transform, and presents several experimental results obtained from synthetic and real DNA sequences. The main contribution of the work is an automatic method for identifying coding regions that outperforms the traditional STFT-based method. The new method offers some important advantages over existing methods. - This is the author's version of the work. Not for redistribution.

[8] Y. Zana and R. M. Cesar-Jr. Face identification and verification using polar frequency components. Poster session, 17-20 Oct. 2004. Brazilian Symposium on Computer Graphics and Image Processing, 17 (SIBGRAPI).
We present a novel face recognition method based on global polar frequency features. The algorithm uses the Fourier-Bessel transform to extract polar frequency components, converts them to a dissimilarity space, and applies a Linear Discriminant classifier. Although the algorithm's performance on recognition rate tests was below that of state-of-the-art methods, it was equivalent on verification tests. These results indicate that recognition rates can be improved by considering more sophisticated decision rules based on confidence levels. - This is the author's version of the work. Not for redistribution.

Keywords: perception, biometrics
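The classification stage of entry [8] (conversion to a dissimilarity space followed by a linear discriminant) can be sketched with NumPy. The Fourier-Bessel feature extraction is omitted: the sketch assumes feature vectors are already available, uses Euclidean distances to a prototype set (here, the training samples) as the dissimilarity representation, and implements a standard Gaussian LDA with a pooled covariance. These choices and the names are assumptions, not the paper's exact setup.

```python
import numpy as np


def to_dissimilarity_space(X, prototypes):
    """Represent each sample by its Euclidean distances to a prototype set."""
    diff = X[:, None, :] - prototypes[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))


class LinearDiscriminant:
    """Gaussian LDA: per-class means with a shared (pooled) covariance."""

    def __init__(self, reg=1e-3):
        self.reg = reg  # ridge term keeps the pooled covariance invertible

    def fit(self, D, y):
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        self.means_ = np.array([D[y == k].mean(axis=0) for k in self.classes_])
        # Subtract each sample's own class mean, then pool the scatter.
        centered = D - self.means_[np.searchsorted(self.classes_, y)]
        cov = centered.T @ centered / max(len(D) - len(self.classes_), 1)
        cov += self.reg * np.eye(D.shape[1])
        self.inv_cov_ = np.linalg.inv(cov)
        self.priors_ = np.array([np.mean(y == k) for k in self.classes_])
        return self

    def decision_function(self, D):
        # g_k(d) = d^T S^-1 m_k - 0.5 m_k^T S^-1 m_k + log prior_k
        a = D @ self.inv_cov_ @ self.means_.T
        b = 0.5 * np.sum((self.means_ @ self.inv_cov_) * self.means_, axis=1)
        return a - b + np.log(self.priors_)

    def predict(self, D):
        return self.classes_[np.argmax(self.decision_function(D), axis=1)]
```

For verification rather than identification, the per-class scores from `decision_function` could be thresholded instead of taking the argmax.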
[9] Y. Zana and R. M. Cesar-Jr. Proximity relations in a polar Frequency face representation. Poster session, 17-20 Oct. 2004. Brazilian Symposium on Computer Graphics and Image Processing, 17 (SIBGRAPI).
We assessed the representation space created by a Fourier-Bessel transformation (FBT) of face images that varied in view angle, translation, rotation, and scale. Multidimensional scaling analysis and detection and recognition tests were used to assess the conservation of real-world proximity relations. View angle variation was well represented by the raw FBT coefficients, but not by their moduli. Translation, rotation, and scale variations were represented in a way that partially conserved the proximity relations. These properties corroborate our representation approach. - This is the author's version of the work. Not for redistribution.

Keywords: perception, biometrics
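Multidimensional scaling was one of the tools used in entry [9] to check whether the FBT representation preserves proximity relations. As a generic illustration (not the authors' analysis pipeline), classical MDS embeds items given only their pairwise distances; in this setting the distances would be computed between FBT coefficient vectors of face images, which is an assumption of the sketch.

```python
import numpy as np


def classical_mds(D, dims=2):
    """Classical (Torgerson) multidimensional scaling.

    D is an n x n matrix of pairwise distances. Returns an n x dims
    configuration whose Euclidean distances approximate D.
    """
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)           # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:dims]    # keep the largest ones
    vals = np.clip(vals[order], 0.0, None)   # drop negative (non-Euclidean) parts
    return vecs[:, order] * np.sqrt(vals)
```

Plotting the embedding and coloring each point by its known view angle, translation, rotation, or scale is one way to see whether the ordering of those parameters survives the representation.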
[10] J. P. Mena-Chalco, H. S. Alves, H. Carrer, and R. M. Cesar-Jr. Bioinformatics Tools for Assembling and Analysis of Chloroplast Genomes. Poster session, Oct. 2004. International Conference on Bioinformatics and Computational Biology (ICoBiCoBi).
Chloroplasts are organelles found only in plant and algal cells. They are responsible for photosynthesis and for the synthesis of key molecules required for the basic architecture and functioning of plant cells. These organelles have their own genetic machinery and, together with the nuclear and mitochondrial genomes, are responsible for coordinating cellular activity. At the moment, 29 higher plant plastid genomes (plastomes) have been sequenced (http://ncbi.nlm.nih.gov/). Plastome sequences are conserved among species, but gene arrangements differ among divergent plant groups. Knowledge of the nucleotide sequence of chloroplast genomes is important for evolutionary studies and for biotechnology applications. The chloroplast used as a model in this study was isolated from Eucalyptus grandis, an economically important tree for paper and cellulose production; Brazil holds the main Eucalyptus germplasm collection outside Australia. We have obtained 3500 sequences from a Eucalyptus DNA library, which so far cover 50% of the total plastome sequence of Eucalyptus grandis. These sequences are processed by a dedicated pipeline and stored on the bioinformatics servers at http://malariadb.ime.usp.br:8026/pipeline/. Once this phase is complete, the next step is to search for similar sequences in other related organisms; some preliminary results in this direction have already been obtained. In this study, we apply digital signal processing (DSP) techniques to the genomic sequence data in order to identify and compare DNA and protein sequences of Eucalyptus grandis with those of the other available higher plant plastomes. We have chosen different approaches to identify protein coding DNA regions and to compare protein sequences. In particular, traditional Fourier analysis and the wavelet transform will be evaluated. - This is the author's version of the work. Not for redistribution.

[11] D. C. Martins-Jr, R. M. Cesar-Jr, J. Barrera, and G. Goldman. Genetic network architecture identification by conditional entropy analysis. Poster session, May 2003. ICoBiCoBi - International Conference on Bioinformatics and Computational Biology.
A metabolic pathway is a sequence of biochemical reactions mediated by enzymes; enzymes are proteins, produced from RNA generated by gene expression in a genetic network. In turn, gene expression is itself regulated by proteins. Together, these phenomena constitute a feedback system, and the genetic networks are called regulatory because they define the pathways. Expression levels of thousands of genes can currently be measured simultaneously, allowing the observation of different aspects of system dynamics. Eukaryotic cells respond to DNA damage by arresting the cell cycle and modulating gene expression to ensure efficient DNA repair. Saccharomyces cerevisiae MEC1, the homolog of the human ATR kinase, plays a central role in transducing the damage signal. The procedure described below is being applied to identify genes that belong to the MEC1 regulatory hierarchy. We used available microarray data in which the genome-wide expression patterns of wild-type cells were compared with those of mutants defective in Mec1 signaling, under normal growth conditions and in response to the methylating agent methyl methanesulfonate (MMS) and ionizing radiation. A DDS (Discrete Dynamical System) is a finite set of equations that describes the sequential evolution of a vector of discrete variables, called the state, under the action of some discrete external forces, called inputs. The state at time t+1 is computed by a function called the transition function, which depends on the states at previous time instants, that is, t, t-1, t-2, ... The transition function is decomposed into a vector of functions, called component functions, each of which computes the transition of one state component. In this work, we model a genetic network as a DDS with some random parameters. The state vector is composed of gene expression values, each gene being one component of the vector. The system architecture is the graph of dependencies between genes, i.e., the outputs and inputs of the component functions.
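A minimal sketch of the DDS model described above, under simplifying assumptions: binary gene states, component functions that depend only on the previous time step (the text also allows t-1, t-2, ...), and random flips with a fixed probability standing in for the model's random parameters. All names are hypothetical.

```python
import random


def simulate_dds(architecture, functions, initial, steps, noise=0.0, seed=0):
    """Simulate a first-order binary DDS with random perturbations.

    architecture: gene -> tuple of input genes (the dependency graph).
    functions:    gene -> component function of the inputs' values at time t.
    Each computed component value is flipped with probability `noise`.
    Returns the list of states from time 0 to time `steps`.
    """
    rng = random.Random(seed)
    state = dict(initial)
    trajectory = [state]
    for _ in range(steps):
        nxt = {}
        for gene, inputs in architecture.items():
            value = functions[gene](*(state[g] for g in inputs))
            if rng.random() < noise:
                value = 1 - value
            nxt[gene] = value
        state = nxt
        trajectory.append(state)
    return trajectory


# Hypothetical three-gene network: A is repressed by C, B copies A,
# and C needs both A and B.
architecture = {"A": ("C",), "B": ("A",), "C": ("A", "B")}
functions = {"A": lambda c: 1 - c, "B": lambda a: a, "C": lambda a, b: a & b}
```

Recovering the `architecture` dictionary from observed trajectories is the inverse problem addressed in the next paragraph.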
The goal of this work is to present a methodology for finding the architecture of a genetic network by observing time samples of the system dynamics. Information theory quantifies the dependence between random variables through measures such as conditional entropy and mutual information. As two variables become more independent, the conditional entropy increases, since there is less information with which to concentrate the mass of the conditional probability. In this work, we exploit this fact by computing the conditional entropy of the regulated gene given the candidate regulatory genes. The entropy values for several sets of candidate regulatory genes are computed and organized in a U-shaped curve. For a given regulated gene, the minimum point of the U-curve defines its regulatory genes.
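The U-curve idea can be illustrated with a small sketch. Plain frequency-based conditional entropy never increases when genes are added to a candidate set, so some penalty for poorly sampled patterns is needed for the curve to turn upward; here, as a hypothetical choice not taken from the abstract, any pattern observed only once is assigned the maximum entropy. The sketch also assumes binary, time-ordered expression values, searches every subset at each size, and excludes the target gene from its own candidate regulators. Names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations
from math import log2


def penalized_entropy(pairs, n_classes):
    """Mean conditional entropy H(Y | X); patterns x seen only once get the
    maximum entropy log2(n_classes) (hypothetical penalty)."""
    by_x = defaultdict(Counter)
    for x, y in pairs:
        by_x[x][y] += 1
    n = len(pairs)
    total = 0.0
    for counts in by_x.values():
        n_x = sum(counts.values())
        if n_x == 1:
            h = log2(n_classes)
        else:
            h = -sum((c / n_x) * log2(c / n_x) for c in counts.values())
        total += (n_x / n) * h
    return total


def u_curve(series, target, n_classes=2):
    """Best penalized entropy at each candidate-set size 0..len(genes), plus
    the overall minimizing (entropy, subset): the estimated regulators."""
    genes = sorted(g for g in series if g != target)
    length = len(series[target])
    curve, best = [], None
    for k in range(len(genes) + 1):
        level = []
        for subset in combinations(genes, k):
            # Candidates at time t versus the regulated gene at time t+1.
            pairs = [(tuple(series[g][t] for g in subset), series[target][t + 1])
                     for t in range(length - 1)]
            level.append((penalized_entropy(pairs, n_classes), subset))
        h, subset = min(level)
        curve.append(h)
        if best is None or h < best[0]:
            best = (h, subset)
    return curve, best
```

On a toy series where the regulated gene copies gene A with a one-step delay, the curve is `[1.0, 0.0, 0.125, 0.375]`: it drops to zero at `("A",)` and rises again as extra genes create rarely observed patterns.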

[12] J. Barrera, R. M. Cesar-Jr, D. C. Martins-Jr, P. J. S. Silva, H. Brentani, E. Osório, and S. J. Souza. Abstract: Dimensionality Reduction for SAGE-based gene identification. Poster session, May 2003.
This abstract describes ongoing research on dimensionality reduction methods applied to SAGE data. The molecular pathways underlying brain cancer progression are poorly understood, which makes the development of novel diagnostic and therapeutic strategies difficult. Gene expression patterns are crucial for maintaining and altering cell phenotypes. Recent technological advances have produced several widely used methods for large-scale study of gene expression, including comprehensive open systems such as SAGE (Serial Analysis of Gene Expression). SAGE is a method to efficiently count large numbers of mRNA transcripts by sequencing short tags, usually 10 bp in length. SAGE Genie uses a new analytical method to reliably match SAGE tags to known genes. SAGE can evaluate the expression patterns of tens of thousands of genes quantitatively. Using SAGE Genie (http://cgap.nci.nih.gov/SAGE), we selected 22 brain libraries and the best tag for each full-length gene represented in those libraries. SAGE profiles of 16 brain tumor libraries were compared with SAGE profiles of 6 normal brain libraries to identify differentially expressed genes. We constructed a matrix of known genes and their tumor/normal expression ratios across the SAGE libraries and tried to group genes with correlated expression profiles across tumor types. We used 4 different types of brain tumors and 4 different regions of normal brain. The data were normalized by applying the normal transform, in order to allow the analysis of genes with low expression levels. To select differentially expressed genes, we used the concept of strong gene sets recently introduced by Seungchan Kim et al. A strong gene set is a small group of genes that can tolerate large errors in gene expression measurements. Finally, we are using feature selection algorithms, such as sparse support vector machines, to reduce the processing time and make the analysis manageable on regular desktop computers.
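The tumor/normal expression-ratio matrix mentioned above can be sketched as follows. The abstract does not give the exact preprocessing (its "normal transform" is not reproduced here), so this sketch assumes a common simple scheme: scale each library to tags per million, average within the tumor and normal groups, and add a pseudocount before dividing. Function and parameter names are hypothetical.

```python
def expression_ratios(tumor_libs, normal_libs, pseudocount=1.0):
    """Tumor/normal expression ratio per gene from SAGE tag counts.

    Each library is a dict gene -> tag count. Counts are scaled to tags per
    million within each library and averaged within each group; the
    pseudocount avoids division by zero for genes absent from normal tissue.
    """
    def per_million(lib):
        total = sum(lib.values())
        return {g: 1e6 * c / total for g, c in lib.items()}

    def group_mean(libs):
        scaled = [per_million(lib) for lib in libs]
        genes = set().union(*scaled)
        return {g: sum(s.get(g, 0.0) for s in scaled) / len(scaled) for g in genes}

    tumor, normal = group_mean(tumor_libs), group_mean(normal_libs)
    genes = sorted(set(tumor) | set(normal))
    return {
        g: (tumor.get(g, 0.0) + pseudocount) / (normal.get(g, 0.0) + pseudocount)
        for g in genes
    }
```

Genes with ratios far above or below 1 are candidates for differential expression; grouping genes by correlated ratio profiles across tumor types would be a further step.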


This file was generated by bibtex2html 1.95.