Genome Informatics 2005 • ~ 220 participants • 1 keynote speaker: David Haussler • 47 talks • 121 posters
Rodger Voelker: Two classes of splice junctions • Search for 5-7 base motifs in exonic and intronic flanking sequences of known splice junctions • Computational analysis of collocations between different motifs • Many collocations between exonic and intronic sequences • Known exonic splicing enhancers (ESEs) display collocations with intronic sequences (including intronic splicing enhancers, ISEs) • Nearly all introns (89%) can be classified into 2 classes
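The motif search and collocation count described on this slide can be illustrated with a minimal sketch. It only counts how often an exonic motif and an intronic motif occur at the same junction; the statistical test used in the talk is not described here, and the function names and toy sequences are hypothetical.

```python
from collections import Counter
from itertools import product


def kmers(seq, kmin=5, kmax=7):
    """Return the set of distinct substrings of length kmin..kmax."""
    seq = seq.upper()
    return {
        seq[i:i + k]
        for k in range(kmin, kmax + 1)
        for i in range(len(seq) - k + 1)
    }


def collocation_counts(junctions, kmin=5, kmax=7):
    """Count junctions in which an (exonic motif, intronic motif) pair co-occurs.

    `junctions` is an iterable of (exon_flank, intron_flank) strings.
    Each pair is counted at most once per junction.
    """
    counts = Counter()
    for exon_flank, intron_flank in junctions:
        exon_motifs = kmers(exon_flank, kmin, kmax)
        intron_motifs = kmers(intron_flank, kmin, kmax)
        counts.update(product(exon_motifs, intron_motifs))
    return counts


# Toy data (made up for illustration, not real splice junctions).
toy_junctions = [("AAAAAC", "GTAAGT"), ("AAAAAG", "GTAAGA")]
counts = collocation_counts(toy_junctions)
```

A real analysis would compare these raw counts against a background expectation before calling a pair a collocation.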
Chip Lawrence: futility of optima in inferences • The strong focus in bioinformatics on optimal solutions is fundamentally flawed, because the asymptotic underpinnings of these solutions, such as consistency, do not apply • The curse of dimensionality can render optimal solutions very unlikely and misleading • Example: minimum free energy predictions of RNA structures • Reason: the energy function used is incomplete; only secondary structure is considered, not tertiary structure
Minimum free energy (MFE) predictions of RNA structures • Assumptions: • the molecule folds into its lowest-energy state • the folding problem has a unique solution (the optimum) • Many programs (e.g. Zuker's Mfold) use the Boltzmann probability function • Most include calculations of suboptimal structures • but not all structures are computed • Positive predictive value (PPV) of MFE predictions: 48%
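The Boltzmann probability mentioned above assigns each structure a probability proportional to exp(-E/RT). The sketch below applies it to a handful of made-up free energies to show why the single lowest-energy structure can still be improbable. The energies, the default temperature (37 °C) and the function name are assumptions for illustration; real folding programs compute the partition function over the full ensemble, not over a short list.

```python
import math

GAS_CONSTANT = 0.0019872  # kcal / (mol * K)


def boltzmann_probabilities(energies, temperature=310.15):
    """Return Boltzmann probabilities for free energies given in kcal/mol.

    The minimum energy is subtracted before exponentiating for
    numerical stability; this does not change the probabilities.
    """
    rt = GAS_CONSTANT * temperature
    e_min = min(energies)
    weights = [math.exp(-(e - e_min) / rt) for e in energies]
    partition = sum(weights)
    return [w / partition for w in weights]


# Hypothetical free energies of four competing structures (kcal/mol).
toy_energies = [-10.0, -9.8, -9.7, -9.5]
probs = boltzmann_probabilities(toy_energies)
```

With these toy numbers the MFE structure is the most probable one, yet its probability is only roughly 0.36, so most of the probability mass lies elsewhere.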
Alternative prediction of RNA structures • Sample the ensemble of secondary structures in proportion to their Boltzmann weights • Cluster the structures • Use centroid structure in predictions • Improved PPV compared to MFE • Srna module of Sfold (http://sfold.wadsworth.org/)
Alternative prediction of RNA structures • Improved PPV compared to MFE: • Ensemble centroid: +30% • Largest cluster centroid: +18% • Best centroid: +47%
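As a rough sketch of the centroid idea, assume each sampled structure is given as a set of base pairs (i, j) and take the centroid to be the set of pairs present in more than half of the samples. The clustering step is omitted, the sample is made up, and the function names are hypothetical; this is not Sfold's code. PPV is computed as the fraction of predicted base pairs that appear in a reference structure.

```python
from collections import Counter


def centroid_structure(sampled_structures):
    """Return base pairs that occur in more than 50% of the samples.

    Each structure is a set of (i, j) base pairs, so a pair is counted
    at most once per sampled structure.
    """
    n = len(sampled_structures)
    counts = Counter(pair for s in sampled_structures for pair in s)
    return {pair for pair, c in counts.items() if c / n > 0.5}


def ppv(predicted, reference):
    """Positive predictive value: correct predicted pairs / all predicted pairs."""
    if not predicted:
        return 0.0
    return len(predicted & reference) / len(predicted)


# Toy sample of four structures (made up for illustration).
toy_sample = [
    {(1, 10), (2, 9), (3, 8)},
    {(1, 10), (2, 9)},
    {(1, 10), (4, 7)},
    {(2, 9), (3, 8)},
]
toy_reference = {(1, 10), (2, 9), (3, 8)}
centroid = centroid_structure(toy_sample)
centroid_ppv = ppv(centroid, toy_reference)
```

The pair (3, 8) occurs in exactly half of the toy samples and is therefore excluded, which shows how the strict majority threshold trades sensitivity for PPV.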
Data mining • Geneseer – searchable name-translation database (http://geneseer.cshl.org/) • Access to genomic information through gene names • Mapping sequences to gene names • Identification of homologs across several species for a given gene • Used in RNAi Codex (http://codex.cshl.edu)
Data mining • Ulysses – annotate human genes based on gene interactions in model organisms (http://www.cisreg.ca:8080/ulysses/) • Interologs: conserved protein-protein interactions • Regulogs: conserved protein-DNA interactions • Almost no overlap between data in interaction databases • BIND DIP: 984 refs; BIND 5 DBs: 3 refs
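The interolog idea can be sketched as follows: an interaction observed in a model organism is transferred to human when both partners have a human ortholog. The sketch assumes a simple one-to-one ortholog table; the gene names and the function name are hypothetical, and this is not Ulysses' actual implementation.

```python
def predict_interologs(model_interactions, orthologs):
    """Map model-organism interactions onto human genes via orthologs.

    `model_interactions` is an iterable of (gene_a, gene_b) pairs.
    `orthologs` maps a model-organism gene to one human gene
    (a one-to-one mapping is assumed for simplicity).
    Returns a set of unordered human gene pairs.
    """
    predicted = set()
    for gene_a, gene_b in model_interactions:
        if gene_a in orthologs and gene_b in orthologs:
            predicted.add(frozenset((orthologs[gene_a], orthologs[gene_b])))
    return predicted


# Hypothetical worm interactions and ortholog table.
toy_interactions = [("worm1", "worm2"), ("worm2", "worm3")]
toy_orthologs = {"worm1": "HUMAN1", "worm2": "HUMAN2"}
interologs = predict_interologs(toy_interactions, toy_orthologs)
```

The second toy interaction is dropped because "worm3" has no ortholog in the table; regulogs could be handled the same way with (transcription factor, target gene) pairs.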
Data mining • Integrated Genome Browser (IGB) – visualize: • Genomic annotations from multiple data resources • Experimental data from Affymetrix arrays (http://www.affymetrix.com/support/developer/tools/download_igb.affx)
Gene expression and pathways • Skypainter tool in Reactome database: • allows overlay of gene expression data on pathway graphs • allows generation of a "movie" of a time series • (http://www.reactome.org/)
Gene expression • ArrayBlast: • Compares gene expression signatures generated on different platforms • Uses public microarray data sets (GEO) • Used to create conserved cancer-related expression signature • (http://seq.mc.vanderbilt.edu/arrayBlast/)
Gene expression • C. elegans Gene Expression Consortium: • SAGE data from specific stages, tissues and cell types • Database of gene expression data/pictures/movies of transgenic worms with promoter::GFP fusions for 2000 genes with human orthologs (http://elegans.bcgsc.ca/home/ge_consortium.html)
Michael Caudy: Whole genome analysis of combinatorial and architectural transcription codes • Search for transcription factor binding sites (TFBS) in known neural pathway genes • Determine architecture: number, type, order, orientation and spacing of TFBS • Compare architecture of activated and repressed genes • Determine activity of promoters with TFBS mutations • Architecture is critical for differential response to Notch signalling
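The architecture features listed on this slide (number, type, order, orientation and spacing of TFBS) can be derived from a list of annotated sites. The sketch below assumes each site carries a factor name, 0-based start, exclusive end and strand; the `Site` type, the function name and the toy promoter are hypothetical and not taken from the talk.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    factor: str   # transcription factor (site type)
    start: int    # 0-based, inclusive
    end: int      # exclusive
    strand: str   # "+" or "-"


def describe_architecture(sites):
    """Summarize binding-site architecture for one promoter."""
    ordered = sorted(sites, key=lambda s: s.start)
    return {
        # number of sites per factor type
        "number": dict(Counter(s.factor for s in ordered)),
        # 5'-to-3' order of site types
        "order": [s.factor for s in ordered],
        # strand of each site, in the same order
        "orientation": [s.strand for s in ordered],
        # gap in bases between consecutive sites
        "spacing": [b.start - a.end for a, b in zip(ordered, ordered[1:])],
    }


# Hypothetical promoter with three sites for two factors, A and B.
toy_sites = [
    Site("B", 30, 35, "-"),
    Site("A", 10, 17, "+"),
    Site("A", 50, 57, "-"),
]
architecture = describe_architecture(toy_sites)
```

Summaries like this can then be compared between activated and repressed genes, for example by checking which order and spacing patterns are enriched in each group.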
Regulatory sequence identification • EvoPrinter: • highlights multi-species conserved sequences within orthologous DNAs in the context of a single species of interest • (http://evoprinter.ninds.nih.gov/)
Regulatory sequence identification • NestedMICA: • method for discovering many over-represented short motifs in large sets of strings in a single run • candidate transcription factor binding sites • (http://www.sanger.ac.uk/Software/analysis/nmica/)