Sequence Analysis

kynton

  1. Sequence Analysis MUPGRET June workshops

  2. Today • What can you do with the sequence? • What can you do with the ESTs? • The case of SNP and Indel

  3. What can you do with the sequence? • Gene prediction • Motif identification • Promoter identification • Survey gene expression across tissues • Full length gene isolation

  4. NCBI Tools • National Center for Biotechnology Information (NCBI) • Part of the National Library of Medicine, NIH • Created in 1988 to develop information systems for molecular biology. • Provides data retrieval systems and computational resources.

  5. Database Resources • Database retrieval tools • BLAST family of sequence-similarity search programs. • Resources for gene-level sequences • Resources for genome-scale analysis

  6. Database Resources • Resources for analyzing gene expression patterns and phenotypes • Molecular modeling database, conserved domain database, conserved domain architecture retrieval tool.

  7. Database Retrieval Tools • Entrez – for DNA and protein sequences • PubMed Central – for literature • Taxonomy – organisms and associated sequences • LocusLink – provides links from sequence information to map and other information.

  8. BLAST family • Basic Local Alignment Search Tool • Sequence similarity search against various databases in GenBank • Gapped alignments with links to other databases such as UniGene or LocusLink.
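To make "local alignment" concrete, here is a minimal sketch of Smith-Waterman scoring, the exact dynamic-programming method that BLAST approximates with faster heuristics. This is not BLAST's own algorithm, and the function name and the match, mismatch, and gap values are illustrative assumptions rather than BLAST defaults.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between strings a and b (linear gap cost).

    Scoring values are illustrative assumptions, not BLAST defaults.
    """
    prev = [0] * (len(b) + 1)  # previous row of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below zero: that is what makes the
            # alignment local rather than global.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    # The shared core "ACGT" scores 4 matches x 2 = 8; the flanks are ignored.
    print(smith_waterman_score("TTACGTTT", "GGACGTGG"))
```

Because scores are floored at zero, unrelated flanking sequence does not penalize a strong internal match, which is why local alignment suits database searching.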

  9. BLAST • Pairwise alignment, but can display multiple alignments with the “query-anchored” feature. • Each alignment has a statistical significance (E-value). • Accounts for amino acid sequence • Outputs a list of matches including start, stop, score, and E-value.
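The E-value reported for each alignment can be sketched with the Karlin-Altschul relation E = K·m·n·e^(−λS), where S is the raw score, m the query length, and n the database length. The function names below are hypothetical, and the default λ and K are illustrative assumptions; real BLAST takes them from the chosen scoring system and also uses effective lengths, which this sketch ignores.

```python
import math


def expect_value(raw_score, query_len, db_len, lam=0.267, k=0.041):
    """Expected number of chance alignments scoring at least raw_score.

    lam and k are illustrative assumptions; BLAST derives them from the
    scoring matrix and gap penalties in use.
    """
    return k * query_len * db_len * math.exp(-lam * raw_score)


def bit_score(raw_score, lam=0.267, k=0.041):
    """Normalized score, so that E = query_len * db_len * 2 ** -bit_score."""
    return (lam * raw_score - math.log(k)) / math.log(2)


if __name__ == "__main__":
    for score in (30, 60, 90):
        e = expect_value(score, query_len=300, db_len=1_000_000)
        print(score, round(bit_score(score), 1), f"{e:.3g}")
```

A lower E-value means a match is less likely to arise by chance, and for a fixed score the E-value grows in proportion to the size of the database searched.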

  10. 5 BLAST Programs (query vs. database) • BLASTN – Nucleotide vs. nucleotide • BLASTP – Protein vs. protein • BLASTX – Nucleotide translation vs. protein • TBLASTN – Protein vs. nucleotide translation • TBLASTX – Nucleotide translation vs. nucleotide translation.
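Choosing among the five programs comes down to the type of the query and the type of the database, read as query versus database. The lookup below is a small sketch of that decision; the helper name `choose_blast_program` and its `translate_both` flag are hypothetical and not part of any NCBI tool.

```python
# Maps (query type, database type) to the BLAST program that compares them.
BLAST_PROGRAMS = {
    ("nucleotide", "nucleotide"): "blastn",
    ("protein", "protein"): "blastp",
    ("nucleotide", "protein"): "blastx",   # query is translated in six frames
    ("protein", "nucleotide"): "tblastn",  # database is translated in six frames
}


def choose_blast_program(query_type, db_type, translate_both=False):
    """Return the program name for a query/database pairing.

    translate_both selects tblastx, which translates a nucleotide query
    and a nucleotide database and compares them at the protein level.
    """
    if translate_both:
        if (query_type, db_type) != ("nucleotide", "nucleotide"):
            raise ValueError("tblastx needs a nucleotide query and database")
        return "tblastx"
    return BLAST_PROGRAMS[(query_type, db_type)]


if __name__ == "__main__":
    # An EST (nucleotide) searched against a protein database.
    print(choose_blast_program("nucleotide", "protein"))
```

For EST work, the translated searches are often the most useful, because protein-level similarity can persist after the nucleotide sequences have diverged.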

  11. BLAST family • BLAST2Sequences – dot plot of alignment • MegaBLAST – nearly exact matches • PSI-BLAST – protein matching that reduces false positive hits • BLink – Allows display of alignments by taxonomic criteria, database origin, relation to a complete genome, relation to a 3D protein structure or conserved domain.

  12. Gene-Level Sequences • UniGene – Identifies a non-redundant set of ESTs based on GenBank sequences. • ProtEST – Displays pre-computed BLAST alignments between protein sequences from model organisms and the 6-frame translation of the UniGene nucleotide sequences.

  13. Gene-Level Sequences • HomoloGene – Curated and calculated gene orthologs and homologs for 14 organisms. • RefSeq – Curated reference sequences for mRNAs, genomic sequences, etc. • ORF Finder – 6-frame translation with graph of ORF position. • e-PCR – Locates sequence-tagged sites. • dbSNP – Contains SNPs and indels.
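The 6-frame translation that ORF Finder performs can be sketched as follows: translate the three reading frames of the sequence and the three of its reverse complement, then report stretches that run from a methionine to a stop codon. This is a minimal illustration assuming the standard genetic code, not NCBI's implementation; the function names and the `min_codons` threshold are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), _AMINO)}


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons; unknown codons (e.g. with N) become X."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frame_translation(seq):
    """Return translations keyed by frame: +1, +2, +3, -1, -2, -3."""
    seq = seq.upper()
    frames = {}
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(s[offset:])
    return frames


def find_orfs(seq, min_codons=3):
    """List (frame, protein) for stretches from the first M to a stop codon."""
    orfs = []
    for frame, protein in six_frame_translation(seq).items():
        # The last piece after splitting has no closing stop, so skip it.
        for segment in protein.split("*")[:-1]:
            start = segment.find("M")
            if start != -1 and len(segment) - start >= min_codons:
                orfs.append((frame, segment[start:]))
    return orfs


if __name__ == "__main__":
    for frame, protein in six_frame_translation("ATGAAATTTTAG").items():
        print(frame, protein)
    print(find_orfs("ATGAAATTTTAG"))
```

Scanning all six frames matters for ESTs in particular, since neither the strand nor the reading frame of a newly sequenced clone is known in advance.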

  14. Genome-Scale Analysis • Entrez Genomes – Taxonomic, genome, or chromosome view of the current sequence data for an organism. • COGs – List of orthologous protein groups from completely sequenced organisms. • Retroviral genotyping tools – Important in viral genetic diversity, tracking outbreaks, and vaccine development.

  15. Genome-Scale Analysis • Eukaryotic Genomic Resources – location of Plant Genomes Central with information from various plant genome projects. • Map Viewer – Displays genome assemblies using chromosome map views. • Model Maker (MM) – Generates transcript models using exon data from prediction or from GenBank alignments.

  16. Genome-Scale Analysis • Evidence Viewer – Graphical summary of alignments relative to contigs including insertion/deletion or mismatches. • Human-Mouse Homology Maps – List of genes in homologous segments. • Cancer Chromosome Aberration Project – List of recurrent chromosome aberrations associated with cancer.

  17. Gene Expression/Phenotype • SAGEmap – A way to look at SAGE data, including two-way mapping between SAGE tags and UniGene. • Gene Expression Omnibus (GEO) – Data repository and retrieval system for expression data from all sources. • OMIM – Catalog of human genes and genetic disorders, including phenotypes and polymorphism information.

  18. MMDB, CDDB, CDART • Molecular Modeling Database (MMDB) – Based on the Protein Data Bank. • Conserved Domain Database – PSI-BLAST-derived scores indicating domains in the Protein Data Bank. • Conserved Domain Architecture Retrieval Tool (CDART) – Identifies conserved domains and displays their structure.

  19. Sequence Analysis References • Korf, Yandell, and Bedell. 2003. An Essential Guide to the Basic Local Alignment Search Tool: BLAST. O’Reilly & Associates, Sebastopol, CA. • Markel and Leon. 2003. Sequence Analysis in a Nutshell: A Guide to Common Tools and Databases. O’Reilly & Associates, Sebastopol, CA.

  20. Sequence Analysis References • Baxevanis and Ouellette. 2001. Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins. Wiley Interscience, New York. • Mount. 2000. Bioinformatics: Sequence and Genome Analysis. Cold Spring Harbor Laboratory, New York.
