
Bioinformatics for Genomic and Proteomic data analysis

Bioinformatics for Genomic and Proteomic data analysis. -- Gene Prediction. Sequence Analysis. -- Alignment techniques (BLAST, PSI-BLAST). -- Major databases and retrieval techniques. -- Predicting Function, domains etc.


Presentation Transcript


  1. Bioinformatics for Genomic and Proteomic data analysis -- Gene Prediction • Sequence Analysis -- Alignment techniques (BLAST, PSI-BLAST) -- Major databases and retrieval techniques -- Predicting function, domains etc. -- Finding homology between sequences, identifying repeats etc. (DOTPLOT) -- Predicting physico-chemical properties of proteins (ProtParam) -- Predicting signal peptides and transmembrane proteins (SignalP) -- Phylogenetic analysis • Structure analysis -- Analysis of protein structure and conformation (Rasmol, SwissPDBViewer, VMD etc.) -- Protein structure prediction: homology modeling (SwissModel, Modeller) • Some practical applications -- Sequence analysis -- Structure analysis

  2. Major Bioinformatics databases, search engines and data formats. By: Sachin Pundhir, Bioinformatics sub-centre, DAVV, Indore

  3. Database • Collection of records and files • Organized for a particular purpose • Tables • Tuples (records) • Attributes • Values

  4. BIO520 Student Database (1998) • Table: columns Name, ID, Grade • Tuples: Amy, 123, A; Joe, 456, B; Sue, 789, C • Diagram labels: Table, Tuple, Attribute, Value

  5. Database Operations • Tables: create, delete • Tuples (records): read, write, delete • Search, sort, modify, print… • Example table (1998): Name, ID, Grade; Amy, 123, A; Joe, 456, B; Sue, 789, C
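
The operations on this slide can be tried on the same student table. The following is a minimal sketch using Python's built-in `sqlite3` module; the table name `student` and the column types are assumptions made for illustration, not something the slides specify.

```python
import sqlite3

# In-memory database holding the student table from the slide.
# Table name and column types are assumptions for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (name TEXT, id INTEGER PRIMARY KEY, grade TEXT)")

# Write tuples (records): one row per student.
conn.executemany(
    "INSERT INTO student (name, id, grade) VALUES (?, ?, ?)",
    [("Amy", 123, "A"), ("Joe", 456, "B"), ("Sue", 789, "C")],
)

# Search: read a single value (one attribute of one tuple).
joe_grade = conn.execute(
    "SELECT grade FROM student WHERE name = ?", ("Joe",)
).fetchone()[0]

# Modify one tuple, then delete another.
conn.execute("UPDATE student SET grade = ? WHERE id = ?", ("A", 456))
conn.execute("DELETE FROM student WHERE name = ?", ("Sue",))

# Sort and print what remains.
remaining = conn.execute(
    "SELECT name, id, grade FROM student ORDER BY name"
).fetchall()
print(remaining)

# Delete the table itself.
conn.execute("DROP TABLE student")
```

Each row returned by `fetchall()` is one tuple (record), and each position in the row is the value of one attribute.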

  6. International Nucleotide Sequence Database Collaboration (INSDC) • Consists of DDBJ (Japan), GenBank (USA) and the EMBL Nucleotide Sequence Database (Europe). • The three databases exchange new and updated data on a daily basis to achieve optimal synchronisation.

  7. Bioinformatics databases (primary databases and secondary databases) • Protein sequence databases: • GenPept: protein sequence database. • UniProtKB/Swiss-Prot: curated protein sequence database with a minimal level of redundancy and a high level of integration with other databases. • UniProtKB/TrEMBL: computer-annotated supplement of Swiss-Prot that contains all the translations of EMBL nucleotide sequence entries not yet integrated in Swiss-Prot. • RefSeq: well curated, non-redundant database. • Structure databases: • PDB: Protein Data Bank • MMDB: Molecular Modeling Database • Nucleotide sequence databases: • GenBank: nucleotide sequence database, highly redundant. • DDBJ: DNA Data Bank of Japan. • EMBL: nucleotide sequence database. • RefSeq: integrated, non-redundant set of sequences, including genomic DNA, transcript (RNA), and protein products, for major research organisms.

  8. GenBank Record • Header • information that apply to the whole record • Features • annotations on the record • Sequence

  9. GenBank Record: Header • Locus Name • Sequence Length • Molecule Type • GenBank Division • Modification Date • Accession Number • Version Number

  10. GenBank Record: Features • Link to sequence

  11. GenBank Record: Sequence
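
The three parts of a record (header, features, sequence) can be separated with a few lines of code. The sketch below is illustrative only: the record in `SAMPLE` is made up for this example (it is not a real GenBank entry), the helper name `split_genbank` is hypothetical, and the positions used for the LOCUS fields assume the line layout shown in the sample. For real work a maintained parser is the safer choice.

```python
# Made-up record for illustration; NOT a real GenBank entry.
SAMPLE = """\
LOCUS       DEMO0001                  24 bp    mRNA    linear   PRI 01-JAN-2000
DEFINITION  Made-up record for illustration only.
ACCESSION   DEMO0001
VERSION     DEMO0001.1
FEATURES             Location/Qualifiers
     source          1..24
     CDS             1..24
ORIGIN
        1 atggctagct agctagctag ctaa
//
"""


def split_genbank(text):
    """Split a flat-file record into header lines, feature lines and sequence."""
    header, features, seq_parts = [], [], []
    section = "header"
    for line in text.splitlines():
        if line.startswith("FEATURES"):
            section = "features"
            continue
        if line.startswith("ORIGIN"):
            section = "sequence"
            continue
        if line.startswith("//"):  # end-of-record marker
            break
        if section == "header":
            header.append(line)
        elif section == "features":
            features.append(line)
        else:
            # Keep letters only: drops the position numbers and the spaces.
            seq_parts.append("".join(ch for ch in line if ch.isalpha()))
    return header, features, "".join(seq_parts)


header, features, sequence = split_genbank(SAMPLE)

# LOCUS line fields, assuming the token order used in SAMPLE.
tokens = header[0].split()
locus = {
    "name": tokens[1],
    "length": int(tokens[2]),
    "molecule_type": tokens[4],
    "division": tokens[6],
    "modification_date": tokens[7],
}

# Feature keys start at column 6; qualifier lines are indented further.
feature_keys = [ln.split()[0] for ln in features if len(ln) > 5 and ln[5] != " "]

print(locus)
print(feature_keys)
print(len(sequence))
```

The LOCUS fields recovered here correspond to the header items listed on the slide: locus name, sequence length, molecule type, GenBank division and modification date.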

  12. Using Entrez An integrated database search and retrieval system

  13. WWW Access: Entrez & BLAST

  14. Entrez: Database Integration (diagram: PubMed abstracts, 3-D Structure, Taxonomy, Genomes, Protein sequences and Nucleotide sequences, connected by links labeled Word weight, VAST, Phylogeny and BLAST)

  15. Database Searching with Entrez Using limits and field restriction to find human MutL homolog Linking and neighboring with MutL

  16. Global Entrez Search

  17. Document Summaries: MutL[All Fields]

  18. Entrez Nucleotides: Limits & Preview/Index Tabs

  19. Entrez Nucleotides: Limits (query: MutL) • Exclude bulk sequences • Field Restriction: Accession, All Fields, Author Name, EC/RN Number, Feature key, Filter, Gene Name, Issue, Journal Name, Keyword, Modification Date, Organism, Page Number, Primary Accession, Properties, Protein Name, Publication Date, SeqID String, Sequence Length, Substance Name, Text Word, Title, Uid, Volume

  20. Entrez Nucleotides: Limits (query: MutL) • Title == Definition • Exclude Bulk Sequences

  21. Document Summaries: Limits

  22. Adding Terms: Preview/Index • Fields: Accession, All Fields, Author Name, EC/RN Number, Feature key, Filter, Gene Name, Issue, Journal Name, Keyword, Modification Date, Organism, Page Number, Primary Accession, Properties, Protein Name, Publication Date, SeqID String, Sequence Length, Substance Name, Text Word, Title, Uid, Volume

  23. Human MutL Search Results
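
A field-restricted search like the one walked through above can also be written as a query string, with each term followed by its field name in square brackets. The sketch below only builds the query and a request URL for NCBI's E-utilities `esearch` endpoint; it does not contact the server. The exact terms used in the slides are not shown in the transcript, so `MutL[Title]` and `Homo sapiens[Organism]` are assumptions, as are the helper names `field_term` and `build_query`.

```python
from urllib.parse import urlencode

# Base URL of the E-utilities search endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def field_term(text, field):
    """Restrict a search term to one Entrez field, e.g. MutL[Title]."""
    return f"{text}[{field}]"


def build_query(*terms):
    """Combine restricted terms so that all of them must match."""
    return " AND ".join(terms)


# Assumed terms: the slides restrict MutL to the Title field and to human.
query = build_query(
    field_term("MutL", "Title"),
    field_term("Homo sapiens", "Organism"),
)

# urlencode percent-encodes the brackets and spaces for use in a URL.
url = ESEARCH + "?" + urlencode({"db": "nucleotide", "term": query})

print(query)
print(url)
```

The same `query` string can be pasted into the Entrez search box, which is a quick way to check that the field names match those offered under Limits and Preview/Index.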

  24. GenBank Records Human MutL RefSeq

  25. NM_000249: Links

  26. Literature Links PubMed OMIM

  27. NM_000249: PubMed Books

  28. Books Link

  29. Conserved Domain OMIM: Human Disease Genes

  30. Sequence Links Nucleotide Protein

  31. NM_000249: Related Sequences (by similarity) • Genome Project BAC • Original GenBank mRNAs • Original GenBank genomic

  32. Taxonomy Link The Tax Browser NCBI’s Taxonomy

  33. Taxonomy Link

  34. NCBI Protein Databases • GenPept: GenBank, EMBL, DDBJ CDS translations • RefSeq: mRNA based (NP_) and genome based (XP_) • Swiss-Prot: curated, high-quality protein reviews • PIR: Protein Information Resource, Georgetown University • PRF: Protein Research Foundation • PDB: Protein Data Bank, sequences from structures
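
The RefSeq accession prefixes mentioned here can be checked programmatically. This is a small sketch, assuming only the prefixes that appear in this presentation (NP_ and XP_ from the slide above, plus NM_ from the NM_000249 mRNA record used in the search example); RefSeq defines more prefixes than these, and the function name `describe_accession` is hypothetical.

```python
# Prefix meanings as described in this presentation; not a complete list.
REFSEQ_PREFIXES = {
    "NM_": "mRNA",
    "NP_": "protein, mRNA based",
    "XP_": "protein, genome based",
}


def describe_accession(accession):
    """Return the record type suggested by a RefSeq accession prefix."""
    prefix = accession[:3].upper()
    return REFSEQ_PREFIXES.get(prefix, "prefix not covered in this sketch")


print(describe_accession("NM_000249"))
```

A version suffix such as `.1` does not affect the result, because only the first three characters are inspected.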

  35. Protein Link BLAST Link Conserved Domains

  36. Related Proteins: Redundancy Redundant Sequences

  37. Sequence from MutL structure Related Proteins: Links

  38. BLink: non-redundant relatives Arabidopsis homolog Conserved Domain

  39. Mismatch Repair Domain ATPase Domain MLH1 Domain Structure: CDD

  40. MLH1: ATPase Domain

  41. ATPase structural alignment ATP Binding site helix

  42. Genome Resources

  43. NM_000249: Genome Links

  44. Higher Genome Resources

  45. MLH1: UniGene Cluster

  46. ESTs in UniGene

  47. The New Homologene • No longer UniGene based • Protein similarities first • Guided by taxonomic tree • Includes orthologs and paralogs (diagram: gene duplication of an early globin gene gives the A-chain gene and the B-chain gene; frog A, chick A and mouse A are orthologs, as are frog B, chick B and mouse B; A chains and B chains are paralogs)
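
The ortholog/paralog distinction in the globin diagram can be expressed as a simple rule. The sketch below assumes, as the diagram does, that the gene duplication happened before the frog, chick and mouse lineages split; under that assumption, copies of the same chain in different species are orthologs, and any A chain compared with any B chain is a pair of paralogs. The function name `relationship` is hypothetical.

```python
from itertools import combinations

# (species, chain) pairs taken from the diagram labels.
genes = [
    ("frog", "A"), ("chick", "A"), ("mouse", "A"),
    ("frog", "B"), ("chick", "B"), ("mouse", "B"),
]


def relationship(gene1, gene2):
    """Classify a gene pair, assuming the duplication predates speciation."""
    (species1, chain1), (species2, chain2) = gene1, gene2
    if chain1 != chain2:
        return "paralogs"    # separated by the gene duplication
    if species1 != species2:
        return "orthologs"   # same chain, separated by speciation
    return "same gene"


# Count how many of the possible pairs fall into each class.
counts = {}
for g1, g2 in combinations(genes, 2):
    kind = relationship(g1, g2)
    counts[kind] = counts.get(kind, 0) + 1

print(counts)
```

If the duplication had instead occurred within a single lineage after speciation, this rule would no longer hold, which is one reason the slide mentions the taxonomic tree as a guide.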

  48. The New Homologene

  49. Entrez Genes: integrated gene-based access • LocusLink • Complete Genomes • eukaryotic • microbial • organelle

  50. Genes MLH1: Central Resource
