
Biological Databases


pooky

Presentation Transcript


  1. Biological Databases: Biology outside the lab

  2. Why do we need Bioinformatics? Over the past few decades, major advances in molecular biology, coupled with advances in genomic technologies, have led to explosive growth in the biological information generated by the scientific community. This deluge of genomic information has, in turn, made computerized databases essential for storing, organizing, and indexing the data, along with specialized tools for viewing and analyzing it.

  3. Information flux from data to decision: biology, chemistry and pharmaceutical research generate a huge amount of data, and the rate at which information is analyzed is lower than the rate at which data are produced. Human Genome Project: 22.1 billion bases sequenced, but … what do we really know about it?

  4. Bioinformatics
    • Building and managing biological databases (nucleotides, proteins, structures, small molecules, pathways, literature, …)
    • Data mining and data analysis (Computational Biology)
    • Protein modeling: ab initio modeling, homology modeling, simulations (Molecular Modeling)

  5. Literature databases http://www.ncbi.nlm.nih.gov/

  6. Nucleotide databases

  7. Protein databases
    • UniProt databases:
      - Swiss-Prot: provides a high level of annotation, a minimal level of redundancy and a high level of integration with other databases
      - TrEMBL: a computer-annotated supplement to Swiss-Prot that contains all the translations of EMBL nucleotide sequence entries not yet integrated into Swiss-Prot
    • NCBI Protein database: a meta-database containing sequences from UniProt entries, PDB-derived sequences, and translations of predicted ORFs in GenBank
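In practice, the Swiss-Prot/TrEMBL distinction is visible directly in UniProtKB FASTA files: each header starts with `sp|` for reviewed Swiss-Prot entries or `tr|` for unreviewed TrEMBL entries, followed by the accession and the entry name. The sketch below splits such a header; the function name `parse_uniprot_header` and the returned dictionary keys are illustrative choices, not part of any UniProt tool.

```python
def parse_uniprot_header(header: str) -> dict[str, str]:
    """Split a UniProtKB FASTA header into section, accession, entry name and description."""
    if not header.startswith(">"):
        raise ValueError("a FASTA header must start with '>'")
    # The first whitespace-delimited token has the form db|accession|entry_name
    first_token, _, description = header[1:].partition(" ")
    db, accession, entry_name = first_token.split("|")
    # 'sp' marks reviewed Swiss-Prot entries, 'tr' marks unreviewed TrEMBL entries
    sections = {"sp": "Swiss-Prot", "tr": "TrEMBL"}
    return {
        "section": sections.get(db, "unknown"),
        "accession": accession,
        "entry_name": entry_name,
        "description": description,
    }
```

For example, the human hemoglobin alpha header `>sp|P69905|HBA_HUMAN Hemoglobin subunit alpha OS=Homo sapiens OX=9606 GN=HBA1` is reported as a Swiss-Prot entry with accession `P69905`.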

  8. Structural databases: protein structures determined by X-ray crystallography or NMR are stored in the PDB (Protein Data Bank).
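Entries in the legacy PDB text format use fixed columns, so the coordinates can be read without a library. The minimal sketch below extracts the C-alpha atom of every residue from `ATOM` records, using the column positions of the PDB format (atom name in columns 13-16, residue name 18-20, chain 22, residue number 23-26, x/y/z in 31-38, 39-46, 47-54). The function name `parse_ca_atoms` is hypothetical, and the sketch ignores alternate locations and `HETATM` records.

```python
def parse_ca_atoms(pdb_text: str) -> list[tuple[str, int, str, float, float, float]]:
    """Return (chain, residue number, residue name, x, y, z) for every C-alpha atom."""
    atoms = []
    for line in pdb_text.splitlines():
        # Only standard-residue ATOM records (columns 1-6) are considered
        if not line.startswith("ATOM  "):
            continue
        atom_name = line[12:16].strip()  # columns 13-16
        if atom_name != "CA":
            continue
        res_name = line[17:20].strip()   # columns 18-20
        chain = line[21]                 # column 22
        res_seq = int(line[22:26])       # columns 23-26
        x = float(line[30:38])           # columns 31-38
        y = float(line[38:46])           # columns 39-46
        z = float(line[46:54])           # columns 47-54
        atoms.append((chain, res_seq, res_name, x, y, z))
    return atoms
```

Column slicing keeps the sketch dependency-free, but note that the wwPDB now distributes entries primarily in PDBx/mmCIF, and very large structures are not available in the legacy PDB format at all.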

  9. Microarray databases
    • GEO (Gene Expression Omnibus)
    • SMD (Stanford Microarray Database)
    Gene expression databases provide raw microarray expression data. Data from different experiments can be combined to obtain results that were not identified in any single experiment.
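Merging data from different experiments usually requires putting them on a common scale first. As a minimal sketch, assuming each experiment is a mapping from gene identifier to a log2 expression ratio, the code below standardizes each experiment to zero mean and unit standard deviation and then averages the genes measured in every experiment. The function names and the z-score approach are illustrative assumptions; real cross-experiment analyses typically need more careful normalization (for example, for platform and batch effects).

```python
from statistics import mean, pstdev

def zscore(values: dict[str, float]) -> dict[str, float]:
    """Rescale one experiment so that its values have mean 0 and standard deviation 1."""
    data = list(values.values())
    mu, sd = mean(data), pstdev(data)
    if sd == 0:
        raise ValueError("cannot standardize an experiment with constant values")
    return {gene: (v - mu) / sd for gene, v in values.items()}

def merge_experiments(*experiments: dict[str, float]) -> dict[str, float]:
    """Average the standardized values of the genes measured in every experiment."""
    scaled = [zscore(e) for e in experiments]
    # Keep only genes present in all experiments
    shared = set.intersection(*(set(s) for s in scaled))
    return {gene: mean(s[gene] for s in scaled) for gene in sorted(shared)}
```

Genes measured in only some experiments are dropped here; keeping them with partial averages is an equally reasonable design choice.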

  10. EST databases
    • EST: Expressed Sequence Tag
    • 5’ ESTs: these usually derive from the coding part of a transcript, so they tend to be conserved across species and do not change much within a gene family.
    • 3’ ESTs: because these ESTs are generated from the 3’ end of a transcript, they are likely to fall within non-coding, or untranslated, regions (UTRs), and therefore tend to exhibit less cross-species conservation than coding sequences do.
    • Sequence Tagged Site (STS): a short sequence that helps locate a gene in the genome. 3’ ESTs are a good source of STSs.
    • Available databases: GenBank, dbEST, UniGene

  11. Tools
    • ORF finder
    • BLAST
    • Multiple alignment
    • Conserved domain identification
    • Secondary structure and folding prediction
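Alignment tools such as BLAST rest on scoring local alignments between sequences. BLAST itself is a fast heuristic (it seeds on short word matches and, for proteins, uses substitution matrices such as BLOSUM62); the exact dynamic-programming method it approximates is Smith-Waterman local alignment. The sketch below computes the best Smith-Waterman score with a linear gap penalty; the match, mismatch and gap values are arbitrary assumptions chosen for illustration.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b, using a linear gap penalty."""
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] holds the best score of a local alignment ending at a[i-1] and b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap     # gap in b
            left = H[i][j - 1] + gap   # gap in a
            # Clamping at 0 lets a new local alignment start anywhere
            H[i][j] = max(0, diag, up, left)
            best = max(best, H[i][j])
    return best
```

Here `smith_waterman("TTACGTTT", "GGACGTGG")` scores the shared `ACGT` core (4 matches × 2 = 8) and ignores the unrelated flanks, which is exactly the behavior a local search needs.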

  12. Example 1: ORF identification. A clone containing a recombinant plasmid shows an interesting phenotype. Workflow: sequencing -> rough sequence -> ORF identification -> in-frame sequence -> BLAST -> • phylogenetically similar sequences • conserved domains
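The ORF identification step can be sketched as a scan of all six reading frames (three per strand) for stretches that start with ATG and end at a stop codon. This is not NCBI ORF finder's implementation: it assumes the standard genetic code (stops TAA, TAG, TGA), ATG as the only start codon, and a hypothetical minimum length of 30 codons; ORFs that run off the end of the sequence are ignored.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def find_orfs(seq: str, min_codons: int = 30) -> list[tuple[str, int, str]]:
    """Return (strand, frame, ORF) for each ATG...stop ORF of at least min_codons codons."""
    seq = seq.upper()
    orfs = []
    # Minus-strand frames are counted on the reverse complement
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # index of the currently open ATG, if any
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    # Length in codons, including the stop codon
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, frame, s[start:i + 3]))
                    start = None
    return orfs
```

The in-frame sequences returned here are what the workflow above would then translate and submit to BLAST.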

  13. CDS

  14. Example 2

  15. Example 2

  16. Example 2

  17. Example 2

  18. Example 2: tuning the method
    a) Increase the window size used to evaluate the score, to integrate more local (“environmental”) information:
      - 2-residue window -> 2 frames
      - 3-residue window -> 3 frames
      - …
    b) Use degenerate matching methods (based on size, polarity, H-bond behavior, …)
