
DNA sequence analysis



  1. DNA sequence analysis
     • Gene prediction methods
     • Gene indices
     • Mapping cDNA on genomic DNA
     • Genome-genome comparison
     • Applications
     Computational Molecular Biology, MPI for Molecular Genetics

  2. DNA sequences: gene structure (eukaryotes)
     [Figure labels: promoter, 5'UTR, exon 1, exon 2, exon n-1, exon n, 3'UTR, protein coding sequence]

  3. DNA sequences: repeats, repetitive elements
     • LINE (Long INterspersed Elements)
     • SINE (e.g. Alu)
     • Transposons
     • Simple repeats (e.g. ATATA...)

  4. DNA sequences: repeats, repetitive elements
     • High copy number
     • Sequence variability
     • Mostly located in untranslated regions

  5. Gene prediction: Strategies for detecting ORFs / exons
     • Distribution of stop codons
     • Codon usage
     • Hexamer frequencies
     • Prediction of the coding frame
     • Splice site recognition (eukaryotes only)
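
The first strategy, the distribution of stop codons, can be illustrated with a short sketch. This is a minimal illustration, not the method of any program named in these slides: the function name `open_stretches` and the `min_codons` threshold are assumptions made for the example, and only the three forward frames are scanned.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def open_stretches(seq, min_codons=3):
    """Return (frame, start, end) for stop-free stretches in the three
    forward reading frames. Coordinates are 0-based, end exclusive."""
    seq = seq.upper()
    found = []
    for frame in range(3):
        run_start = frame
        for pos in range(frame, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                # A stop codon closes the current open stretch.
                if (pos - run_start) // 3 >= min_codons:
                    found.append((frame, run_start, pos))
                run_start = pos + 3
        # The stretch that runs to the last complete codon of the frame.
        end = frame + ((len(seq) - frame) // 3) * 3
        if (end - run_start) // 3 >= min_codons:
            found.append((frame, run_start, end))
    return found

stretches = open_stretches("ATGAAACCCGGGTAAATG")
```

Long stop-free stretches are only candidates; the remaining criteria on the slide (codon usage, hexamer frequencies, splice sites) are needed to decide whether a stretch is actually coding.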

  6. Gene prediction: by sequence comparison
     • Comparison of genomic DNA and cDNA/ESTs
     • Comparison of related genomic DNA of different organisms

  7. Gene prediction: Codon usage (single exon)
     [Figure: codon usage plotted for Frame 1, Frame 2 and Frame 3; legend: coding, non-coding]

  8. Gene prediction: Codon usage (single exon)
     [Figure: codon usage plotted for Frame 1, Frame 2 and Frame 3; legend: coding, non-coding; annotations: coding sequence, correct start]
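
The idea behind the per-frame codon usage plots can be sketched as a log-odds score: codons that are frequent in known coding sequence score positively, so the true coding frame tends to get the highest total. This is a toy sketch under stated assumptions: the helper names `codon_log_odds` and `frame_scores`, the uniform 1/64 background, and the pseudocount are all illustrative choices, not taken from the slides.

```python
import math
from collections import Counter

def codon_log_odds(training_cds, pseudocount=1.0):
    """Build a scoring function log(P(codon | coding) / (1/64)) from
    in-frame coding sequences (assumed to start at codon position 1)."""
    counts = Counter()
    for cds in training_cds:
        for i in range(0, len(cds) - 2, 3):
            counts[cds[i:i + 3]] += 1
    total = sum(counts.values()) + 64 * pseudocount

    def score(codon):
        p = (counts[codon] + pseudocount) / total
        return math.log(p * 64)

    return score

def frame_scores(seq, score):
    """Sum the codon scores of a sequence in each of the three forward frames."""
    return [
        sum(score(seq[i:i + 3]) for i in range(frame, len(seq) - 2, 3))
        for frame in range(3)
    ]

score = codon_log_odds(["GCTGAAGCTGAAGCTGAA"])
scores = frame_scores("GCTGAAGCTGAAGCT", score)
best_frame = scores.index(max(scores))
```

In practice the scores would be computed in a sliding window along the sequence, which is what produces curves like the ones in the figure.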

  9. Gene prediction: Codon usage (multiple exons)
     Exons: 208..295, 1029..1349, 1500..1688, 2686..2934, 3326..3444, 3573..3680, 4135..4309, 4708..4846, 4993..5096, 7301..7389, 7860..8013, 8124..8405, 8553..8713, 9089..9225, 13841..14244
     [Figure: codon usage plotted for Frame 1, Frame 2 and Frame 3; legend: coding, non-coding; annotation: splice sites]
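
As a small worked example, exon and intron lengths follow directly from the coordinates listed on this slide. The sketch assumes the coordinates are 1-based and inclusive, which the slide does not state.

```python
# Exon coordinates from the slide, read as 1-based inclusive (start, end) pairs.
EXONS = [
    (208, 295), (1029, 1349), (1500, 1688), (2686, 2934), (3326, 3444),
    (3573, 3680), (4135, 4309), (4708, 4846), (4993, 5096), (7301, 7389),
    (7860, 8013), (8124, 8405), (8553, 8713), (9089, 9225), (13841, 14244),
]

exon_lengths = [end - start + 1 for start, end in EXONS]

# An intron spans the gap between the end of one exon and the start of the next.
intron_lengths = [
    next_start - prev_end - 1
    for (_, prev_end), (next_start, _) in zip(EXONS, EXONS[1:])
]

total_exonic = sum(exon_lengths)
```

Under that assumption the 15 exons cover 2719 bases in total, while the longest intron alone spans 4615 bases, which illustrates why the coding frame has to be tracked across splice sites rather than within one contiguous stretch.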

  10. Gene prediction: Codon usage (multiple exons)
     [Same exon list and figure labels as slide 9]

  11. Gene prediction: Additional criteria
     • Detection of start codons
     • Detection of potential promoter elements
     • Detection of repetitive sequences (mostly untranslated)
     • Homology to known genes of related organisms

  12. Gene prediction: Software
     • GENSCAN (C. Burge & S. Karlin)
     • Grail (neural network; Uberbacher et al.)
     • MZEF (M. Zhang, 1997)
     • FGeneH, Hexon (V. Solovyev et al., 1994)
     • Genie, etc.
     All of these programs use dynamic programming to find the optimal solution.

  13. DNA sequences in public databases
     • Human: ~2.8 million ESTs + 130 000 RNAs
     • Mouse: ~1.8 million ESTs + 30 000 RNAs

  14. Expressed sequence tags (EST)
     • cDNA synthesis is usually primed with oligo(dT) or with random primers
     • Reverse transcriptase stops 'randomly'
     • Several cDNAs may be generated for the same mRNA
     [Figure: mRNA with poly(A) tail (AAAAAA...) and cDNA primed with oligo(dT) (TTTTTT...)]

  15. Expressed sequence tags (EST)
     [Figure labels: vector (known sequence); clone = mRNA fragment; average: 1500 bp; 3'-primer; deciphered sequence (EST); <700 bp]

  16. Expressed sequence tags (EST)
     • Isolation of mRNAs from tissue(s)
     • Generation of cDNAs reflecting parts of the RNAs
     • Cloning of cDNAs into a vector (often in random orientation)
     • End sequencing of the clones

  17. Generation of ESTs: Basecalling problems
     [Figure panels: close to 5' end of EST; close to 3' end of EST; missing bases]

  18. Coverage of an mRNA by ESTs
     [Figure: putative mRNA (5'UTR, exon 1, exon 2, 3'UTR, AAAAAA...) with the expressed sequence tags (ESTs) that cover it]

  19. Characteristics of ESTs
     • Highly redundant
     • Low sequence quality
     • (Cheap)
     • Reflect expressed genes
     • May be tissue/stage specific

  20. Gene indices
     Clustering of the EST and mRNA sequences of an organism to reduce redundancy in the sequence data. Goal: each cluster represents one gene or mRNA.
     • UniGene (NCBI)
     • TIGR Gene Indices
     • STACK (SANBI)
     • GeneNest (DKFZ, MPI)

  21. Gene indices: GeneNest workflow
     [Flowchart: EMBL database and UniGene database → quality clipping → BLAST/QUASAR search, clustering → assembly, consensus sequences → visualization]

  22. Gene indices: Quality clipping
     To cluster on the basis of gene-specific sequence data, the following steps have to be performed:
     • Removal of vector sequence
     • Masking of repetitive sequences (e.g. Alu)
     • Removal of terminal sequences of low quality
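
The last step, removal of low-quality terminal sequence, can be sketched with a sliding window over per-base quality values. This is a hedged illustration only: the function name `clip_low_quality`, the threshold `min_q`, and the window size are assumptions for the example and are not the parameters used by GeneNest. Vector removal and repeat masking are not covered.

```python
def clip_low_quality(seq, quals, min_q=20, window=5):
    """Return (start, end) of the region to keep (0-based, end exclusive).

    The kept region runs from the first to the last window whose mean
    quality reaches min_q; (0, 0) means nothing passes."""
    n = len(seq)
    if len(quals) != n:
        raise ValueError("sequence and quality list differ in length")

    def good(i):
        w = quals[i:i + window]
        return len(w) == window and sum(w) / window >= min_q

    # First good window, scanning from the 5' end.
    start = next((i for i in range(n - window + 1) if good(i)), None)
    if start is None:
        return (0, 0)
    # Last good window, scanning from the 3' end.
    end = next(i + window for i in range(n - window, -1, -1) if good(i))
    return (start, end)

read = "NNACGTACNN"
quality = [5, 5, 30, 30, 30, 30, 30, 30, 5, 5]
start, end = clip_low_quality(read, quality, min_q=20, window=2)
clipped = read[start:end]
```

A window-based rule tolerates isolated poor base calls inside an otherwise good read, which a strict per-base cutoff would not.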

  23. Gene indices: Clustering
     Sequences are usually clustered if the match between two sequences fulfills several (empirical) criteria:
     • Minimal % identity (e.g. > 95%)
     • Minimal length of match (e.g. > 40 bp)
     • No internal matches (TIGR Gene Indices)
     • Same tissue of origin (STACK only)
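
The first two criteria can be sketched as single-linkage clustering over pairwise matches, using a union-find structure. This is a minimal sketch under assumptions: the match tuples `(a, b, identity, length)` stand in for hits from a search such as BLAST, the function name `cluster_sequences` is hypothetical, and transitive (single-linkage) joining is one plausible reading of the slide rather than a confirmed detail of any of the named gene indices.

```python
def cluster_sequences(n_sequences, matches, min_identity=95.0, min_length=40):
    """Group sequence indices 0..n-1 into clusters.

    matches: iterable of (a, b, percent_identity, match_length).
    Two sequences are joined if their match exceeds both thresholds."""
    parent = list(range(n_sequences))

    def find(x):
        # Find the cluster representative, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, identity, length in matches:
        if identity > min_identity and length > min_length:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    clusters = {}
    for i in range(n_sequences):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())

hits = [
    (0, 1, 98.0, 120),  # passes both thresholds
    (1, 2, 96.5, 60),   # passes both thresholds
    (2, 3, 99.0, 30),   # match too short
    (3, 4, 90.0, 200),  # identity too low
]
clusters = cluster_sequences(5, hits)
```

Because joining is transitive, sequences 0 and 2 end up in one cluster without matching each other directly. This is also why unmasked repeats are dangerous: a single shared Alu element could chain unrelated genes together.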

  24. Gene indices: Assembly
     Sequences in a cluster are assembled to group those sequences which are globally similar, resulting in:
     • Contigs, reflecting partially different sequences
     • One consensus sequence per contig
     • A relative order of the sequences (alignment)

  25. Gene indices: Consensus sequences
     • Reduced error rate
     • Consensus often longer than any single contributing sequence
     • Efficient database search
     • Detection of exon/intron boundaries and alternative splice variants
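
Both the reduced error rate and the extra length of a consensus can be seen in a small majority-vote sketch. It assumes the reads are already aligned and padded with "-" wherever a read does not cover a column; real assemblers such as those listed in these slides also weigh base qualities and model true gaps, which this example does not.

```python
from collections import Counter

def consensus(aligned_reads, gap="-"):
    """Majority vote per alignment column; columns without any base are skipped."""
    width = max(len(read) for read in aligned_reads)
    out = []
    for col in range(width):
        bases = [
            read[col]
            for read in aligned_reads
            if col < len(read) and read[col] != gap
        ]
        if bases:
            out.append(Counter(bases).most_common(1)[0][0])
    return "".join(out)

reads = [
    "ACGTAC----",
    "--GTTCGG--",  # column 4 disagrees with the other two reads
    "----ACGGTA",
]
cons = consensus(reads)
```

Here the consensus is 10 bases long although each read contributes only 6, and the single discordant base in the second read is outvoted.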

  26. Gene indices
     [Figure labels: alignment, consensus]

  27. Gene indices alignment: Software
     • Phrap (Phil Green)
     • CAP3 (X. Huang)
     • TIGR assembler
     • GAP4 (R. Staden)

  28. GeneNest visualization (http://genenest.molgen.mpg.de)

  29. GeneNest visualization (http://genenest.molgen.mpg.de)

  30. TIGR Gene Indices (http://www.tigr.org/)
     [Figure: alignment scheme]

  31. UniGene (http://www.ncbi.nih.nlm.gov/UniGene)

  32. UniGene (http://www.ncbi.nih.nlm.gov/UniGene)

  33. Mapping of EST consensus sequences on genomic DNA
     [Figure: consensus sequence (mRNA) aligned to the genomic sequence; labels: exons, missing intron]

  34. Mapping cDNA on genomic DNA

  35. Mapping cDNA on genomic DNA (http://splicenest.molgen.mpg.de)

  36. Genome-genome comparison
     [Figure: an ancestral gene and the corresponding mouse and human genes; x = region with low mutation rate]

  37. Genome-genome comparison

  38. Genome-genome comparison
     • Conserved coding regions (protein similarity, similar function)
     • Conserved coding exons (protein domain similarity, functional feature)
     • Conserved non-coding regions (regulatory sites, transcription factor binding sites)

  39. Gene indices: Applications
     • Detection of exon/intron boundaries
     • Detection of alternative splicing
     • Detection of Single Nucleotide Polymorphisms
     • Genome annotation
     • Analysis of gene expression
     • Design of DNA chips/arrays

  40. Alternative Splicing
     • hnRNA: 5'UTR, exon 1, exon 2, exon 3
     • mRNA 1: 5'UTR, exon 1, exon 3
     • mRNA 2: 5'UTR, exon 1, exon 2
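
The slide's example can be turned into a small check for skipped exons: an exon counts as skipped when it is absent from a transcript although exons on both sides of it are present. The function name `skipped_exons` and the list-of-labels representation are assumptions made for this sketch; in practice the exon chains would come from mapping cDNA or EST consensus sequences onto genomic DNA.

```python
def skipped_exons(gene_exons, transcript_exons):
    """Return exons of the gene that lie between the first and last exon
    used by the transcript but are missing from it."""
    used = set(transcript_exons)
    positions = [i for i, exon in enumerate(gene_exons) if exon in used]
    if not positions:
        return []
    first, last = positions[0], positions[-1]
    return [exon for exon in gene_exons[first:last + 1] if exon not in used]

hnrna = ["exon 1", "exon 2", "exon 3"]
mrna_1 = ["exon 1", "exon 3"]
mrna_2 = ["exon 1", "exon 2"]

skipped_in_1 = skipped_exons(hnrna, mrna_1)
skipped_in_2 = skipped_exons(hnrna, mrna_2)
```

mRNA 1 skips exon 2. mRNA 2 lacks exon 3 at its end, which this check deliberately does not call a skip, since a missing terminal exon could equally reflect an alternative 3' end or an incomplete transcript.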

  41. Alternative Splicing
     [Figure: consensus sequence (mRNA) aligned to the genomic sequence; labels: exons, splice variant]

  42. Alternative Splicing (additional exon)
     Splice variants of adenylosuccinate lyase
     [Figure annotations: unspliced?; skipped exon; gene prediction errors?]

  43. Alternative Splicing
     Splice variants of APECED gene
     [Figure labels: alternative variants; number of sequences; genomic sequence]

  44. Alternative Splicing

  45. Alternative Splicing (alternative donor site)

  46. Alternative Splicing

  47. Alternative Splicing (alternative exons)

  48. Alternative Splicing (unknown gene Hs16936)

  49. Single Nucleotide Polymorphisms (SNP)
     • SNPs are single base differences within one species
     • Several million SNPs detected in human
     • SNPs may be related to diseases

  50. Single Nucleotide Polymorphisms (SNP)
     SNP or basecalling error?
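
One simple way to approach the question on this slide is to require that the minor base in an alignment column is supported by more than one read, since an isolated mismatch is more likely a basecalling error. This is a heuristic sketch only: the function name `candidate_snps` and the `min_support` threshold are assumptions, and a realistic caller would also use base quality values and read orientation.

```python
from collections import Counter

def candidate_snps(aligned_reads, min_support=2, gap="-"):
    """Return (column, major_base, minor_base) for columns in which the
    second most frequent base occurs in at least min_support reads."""
    width = min(len(read) for read in aligned_reads)
    candidates = []
    for col in range(width):
        counts = Counter(read[col] for read in aligned_reads if read[col] != gap)
        top_two = counts.most_common(2)
        if len(top_two) == 2 and top_two[1][1] >= min_support:
            candidates.append((col, top_two[0][0], top_two[1][0]))
    return candidates

reads = [
    "ACGTA",
    "ACGTA",
    "ACCTA",
    "ACCTA",
    "ATGTA",  # the T in column 1 is seen only once
]
snps = candidate_snps(reads)
```

Column 2 (G in three reads, C in two) is reported as a candidate SNP, while the single T in column 1 is treated as a probable basecalling error.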
