[III] Genes, Genomics, and Chromosomes - PowerPoint PPT Presentation

sivan
Presentation Transcript

  1. [III] Genes, Genomics, and Chromosomes • Eukaryotic gene structure, Rot analyses, chromosomal organization of genes and noncoding DNA • Genomics: genome-wide analysis of gene structure and expression • Structural organization of eukaryotic chromosomes • Morphology and functional elements of eukaryotic chromosomes

  2. Molecular Definition of a Gene • Definition of a “gene”: the entire nucleic acid sequence that is necessary for the synthesis of a functional gene product (polypeptide or RNA) • A gene includes more than the nucleic acid sequence encoding the amino acid sequence of the protein (coding region): • All sequences required for the synthesis of an RNA transcript • Transcription-control regions (e.g., enhancers or silencers) • Sequences that specify 3’ cleavage and polyadenylation [poly(A)] sites, and splice sites • Most genes are transcribed into mRNAs, but some are transcribed into other RNA molecules such as tRNA, rRNA and snRNA

  3. Gene Expression in Prokaryotes and Eukaryotes • Gene expression in prokaryotes takes place in a single compartment, whereas gene expression in eukaryotes takes place in multiple compartments and in multiple stages

  4. Eukaryotic Genes Produce Monocistronic mRNAs and Contain Lengthy Introns • While prokaryotes produce polycistronic mRNAs, eukaryotes produce monocistronic mRNAs • In a polycistronic mRNA, a ribosome binding site is present near the start site of each cistron, and translation can be initiated from each of these sites • In eukaryotic mRNA, the 5’ cap directs the binding of the ribosome to the mRNA, and protein synthesis begins from the closest AUG codon. Most eukaryotic mRNAs also possess poly(A) tails • In eukaryotes, introns, which are often larger than exons, must be removed from the precursor mRNA (pre-mRNA) before it can direct protein synthesis. Some introns in human genes are as big as 17 kb. The median intron length is about 3 kb.

  5. Comparison of Structures of the cDNA and Its Genomic Gene The main differences between a cDNA and a genomic gene are: • cDNA does not have introns • cDNA does not have a regulatory/promoter sequence

  6. Distribution of Uninterrupted and Interrupted Genes in Various Eukaryotes While the majority of genes in yeast are uninterrupted, most genes in flies are interrupted by one or two introns, and most genes in mammals are interrupted by many introns

  7. Sizes of Genes in Various Organisms Yeast genes are short, but genes in flies and mammals have a dispersed bimodal distribution extending to very long sizes

  8. Sizes of Exons and Introns • Exons coding for proteins are usually short, whereas introns range from very short to very long

  9. Simple Eukaryotic Transcription Unit • In eukaryotes, some genes encode a single protein while others encode more than one protein • That is, some genes have simple transcription units while others have complex transcription units. This slide shows a simple transcription unit

  10. Complex Eukaryotic Transcription Unit • There are three different ways to process the primary transcript of a gene to give rise to different mRNAs: • Using different splice sites to produce different mRNA species • Using alternative poly(A) sites to produce mRNAs with different 3’ exons • Using alternative promoters to produce mRNAs with different 5’ exons and the same 3’ exons • Differential splicing of an mRNA leads to the production of isoforms of the gene product

  11. Kinetics of DNA Hybridization Suggested Reading: • Integration of Cot analysis, DNA cloning and high-throughput sequencing facilitates genome characterization and gene discovery. Peterson et al. (2002) Genome Res 12: 795-807. • Repeated sequences in DNA. Britten and Kohne (1968) Science 161: 529-540 • The extent of DNA annealing depends on the concentration of nucleic acid and the time of hybridization • $dC/dt = -kC^2$. Integrating between the initial concentration $C_0$ and the concentration $C$ at time $t$ gives $C/C_0 = 1/(1 + k\,C_0 t)$. When $C/C_0 = 1/2$, $C_0 t_{1/2} = 1/k$

  12. Kinetics of DNA Reassociation (Cot Analysis) • Britten and Kohne (1968) studied genomic DNA sequences by measuring the kinetics of DNA reassociation • Assigned Reading: Repeated sequences in DNA • The rate of DNA reassociation depends on random collision of the complementary strands (i.e., on the concentration of DNA) and on the time available for collisions to occur: $dC/dt = -kC^2$, where $k$ is the reassociation constant • By integration, $C/C_0 = 1/(1 + k\,C_0 t)$, indicating that the parameter controlling the reassociation reaction is the product of initial DNA concentration and time ($C_0 t$, or Cot) • Setting $C/C_0 = 1/2 = 1/(1 + k\,C_0 t_{1/2})$ gives $C_0 t_{1/2} = 1/k$ • Cot1/2 is the product of concentration and time required for 50% reassociation
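
The second-order relation above, C/C0 = 1/(1 + k·C0t), can be tabulated with a short sketch. This is only an illustration: the rate constant `k` below is a made-up value, not a measurement from the slides, and the function names are hypothetical.

```python
def fraction_single_stranded(cot, k):
    """Fraction of DNA still single-stranded, C/C0 = 1 / (1 + k * C0t)."""
    if cot < 0 or k <= 0:
        raise ValueError("cot must be >= 0 and k must be > 0")
    return 1.0 / (1.0 + k * cot)


def cot_half(k):
    """Cot value at which half of the DNA has reassociated (Cot1/2 = 1/k)."""
    return 1.0 / k


# Hypothetical reassociation constant, chosen only for illustration.
k = 0.25
for cot in (0.0, 0.4, 4.0, 40.0, 400.0):
    print(f"Cot = {cot:7.1f}  C/C0 = {fraction_single_stranded(cot, k):.3f}")
print("Cot1/2 =", cot_half(k))
```

Plotting C/C0 against log(Cot) for such values gives the sigmoidal curve usually shown in Cot analyses; a larger Cot1/2 (smaller k) shifts the curve to the right.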

  13. Reassociation Kinetics of Eukaryotic DNA

  14. Calculating the Complexity of a Genome • $\text{Complexity of any genome} = \dfrac{C_0 t_{1/2}\ (\text{DNA of any genome})}{C_0 t_{1/2}\ (\text{E. coli DNA})} \times 4.2 \times 10^{6}\ \text{bp}$
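
As a rough worked example of this ratio, the sketch below scales the E. coli reference complexity (4.2 × 10⁶ bp, from the slide) by the ratio of Cot1/2 values. The two Cot1/2 numbers in the usage lines are hypothetical inputs, not data from the presentation.

```python
E_COLI_COMPLEXITY_BP = 4.2e6  # reference complexity quoted on the slide


def genome_complexity_bp(cot_half_sample, cot_half_ecoli):
    """Complexity = (Cot1/2 of sample / Cot1/2 of E. coli) * 4.2e6 bp."""
    if cot_half_sample <= 0 or cot_half_ecoli <= 0:
        raise ValueError("Cot1/2 values must be positive")
    return E_COLI_COMPLEXITY_BP * cot_half_sample / cot_half_ecoli


# Hypothetical measurements: the sample component reassociates 100x more
# slowly than E. coli DNA under the same conditions.
sample = genome_complexity_bp(cot_half_sample=400.0, cot_half_ecoli=4.0)
print(f"Estimated complexity: {sample:.2e} bp")
```

The comparison is only meaningful when both Cot1/2 values are measured under the same hybridization conditions.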

  15. Repetitive and Unique DNA Sequence in Eukaryotes • Non-repetitive DNA: • Present only once per genome • Found in prokaryotic and eukaryotic genomes • Intermediate (moderately) repetitive DNA: • Repeated 10-1000 times per genome • Dispersed throughout the genome in eukaryotes • Highly repetitive DNA: • Short repetitive DNA (<100 bp) present up to 1 million times in the eukaryotic genome • Larger genomes are not generated by increasing the number of copies of the same sequences present in smaller genomes; their extra size is due to the presence of more repetitive DNA • Suggested Reading II: • Initial sequencing and analysis of the human genome. Nature 409: 860-921, 2001. • Finishing the euchromatic sequence of the human genome. Nature 431: 931-945, 2004.

  16. The Proportions of Different Sequence Components in Eukaryotic Genomes • The absolute content of non-repetitive DNA increases with genome size but reaches a plateau at ~2-3 × 10⁹ bp • mRNA is typically derived from non-repetitive DNA sequence • A significant part of the moderately repetitive DNA consists of transposons (sequences able to move around the genome)

  17. Genomes of Many Organisms Contain Much Noncoding DNA • Much of the DNA in many eukaryotic cells does not encode RNA or have any apparent regulatory function • Genome sizes of yeast, fruit fly, chicken, and human: 12, 180, 1300, and 3300 Mb of DNA, respectively • Many organisms considered simpler than humans have a higher DNA content than humans • Data from DNA sequence analysis revealed that the genomes of higher eukaryotes contain large amounts of noncoding DNA • Gene-rich regions vs. gene-desert regions

  18. Genome Size and Gene Numbers in Various Organisms The number of genes in bacterial and archaeal genomes is proportional to the genome size

  19. Relationship of Gene Number and Genome Size While the number of genes in prokaryotes correlates well with the size of their genomes, the number of genes in eukaryotes does not correlate well with their genome sizes

  20. Protein-Coding Genes • Solitary genes: About 25-50 percent of the protein-coding genes are represented only once in the haploid genome • The chicken lysozyme gene spans 15 kb of DNA, which constitutes a simple transcription unit with three exons and 2 introns • Duplicated genes: Genes with close but nonidentical sequences that are often located within 5-50 kb of one another; such a set is called a “gene family” • Each gene family could contain from a few to 30 or so members • Gene family: A set of duplicated genes that encode proteins with similar but not identical amino acid sequences. Examples are: cytoskeletal proteins, the myosin heavy chain, the α- and β-globins • Protein family: The closely related, homologous proteins encoded by a gene family. Examples: protein kinases, vertebrate immunoglobulins and olfactory receptors. Protein families include from just a few to 30 or more members • The genes encoding β-globins are a good example of a gene family; it contains five functional genes: β, δ, Aγ, Gγ, and ε

  21. Total Number of Genes and Duplicated Genes • In bacteria, most of the genes are unique, so the number of distinct families is close to the total gene number • In eukaryotes, many genes are duplicated, and as a result the number of different gene families is much smaller than the total number of genes

  22. Proportions of Unique and Duplicated Genes The proportion of unique genes drops sharply with genome size: bacteria have the highest proportion of unique genes, and the proportion is much lower in yeast, flies, worms, and Arabidopsis

  23. Heavily Used Gene Products (rRNA and snRNA Genes) are Arranged in Tandem Repeats • In vertebrates and invertebrates, the genes encoding rRNAs and some other noncoding RNAs such as snRNA are arranged in tandemly repeated arrays • These tandemly repeated genes, which appear one after the other, encode identical or almost identical proteins or functional RNAs • The tandemly repeated rRNA and snRNA genes are needed to meet the great cellular demand for their transcripts. Example: cells have 100 or more copies of 5S rRNA genes • Multiple copies of tRNA and histone genes are also present in clusters, but generally not in tandem repeats

  24. A Tandem rDNA Gene Cluster A tandem cluster of rRNA genes

  25. Electron Micrograph of DNA being Transcribed into RNA • The green arrow indicates DNA and the red arrow indicates RNA • This micrograph was taken by O.L. Miller, Jr, and Barbara R. Beatty at Oak Ridge National Lab and shows the transcription of tandem repeats of rRNA genes in Xenopus oocytes

  26. Non-Protein-Coding Genes Encode Functional RNAs • The genome contains non-protein-coding genes that encode functional RNAs. These RNAs are important in regulating the expression of genes • Assigned Reading: The functional genomics of noncoding RNA. Mattick et al. (2005), Science 309: 1527-1528.

  27. How Many Genes Are There in All Organisms? • This slide shows the comparison of fly genes to those of the worm and yeast • Orthologous genes (orthologs): Genes that encode corresponding polypeptides in different organisms. Two gene products from different organisms whose sequences match over >80% of their lengths are considered orthologs • In flies, ~20% of the genes have orthologs in both worm and yeast. These are required genes • When fly genes are compared with those of the worm, an additional 10% of genes are found to have orthologs. This means that about 30% of the genes are required for both flies and worms • The total number of distinct proteins can be a good estimate of the total proteome size

  28. Proportion of Protein-Encoding Genes in the Human Genome • The human haploid genome contains 22 autosomes plus the X and Y chromosomes, and the chromosomes range from 45 to 279 Mb of DNA • The total haploid genome size is 3286 Mb (~3.3 × 10⁹ bp) • Euchromatin comprises the majority of the genome (~2.9 × 10⁹ bp) • Although about 25% of the human genome is occupied by protein-coding genes, the actual exons account for only 1% • Figure: the structure of an average human gene

  29. Different Classes of Repetitive DNA Sequences in the Human Genome • Five classes of repetitive DNA sequences in the human genome: • Transposons, 45% of the genome, multiple copies • Pseudogenes, ~3,000 in all • Simple-sequence repetitive DNA, ~3% of total DNA • Segmental duplications, blocks of 10 to 300 kb that have been duplicated, ~5% • Tandem repeats forming blocks of one type of sequence

  30. Genomic DNA of Eukaryotic Organisms (class of DNA: copies per genome; % of human genome) • Protein-coding genes: ~25,000; 55% • Tandemly repeated genes: • U2 snRNA: ~20; <0.001% • rRNA: ~300; 0.4% • Repetitious DNA: • Simple-sequence DNA: variable; ~6% • Interspersed repeats: ~3.26 × 10⁶; 45% • Processed pseudogenes: 1 to ~100; ~0.4% • Unclassified spacer DNA: n.a.; 25% • Interspersed repeats include DNA transposons, LTR retrotransposons, and non-LTR retrotransposons (LINEs and SINEs)

  31. Satellite DNAs • When eukaryotic DNA is centrifuged on a CsCl gradient, two components are observed: • Main band: most of the genomic DNA • Satellite band: one or multiple minor bands; they can be heavier or lighter than the main band • In the example shown, the main-band DNA has a buoyant density of 1.701 g/cm³ with a G-C content of 42%, and the minor-band DNA has a buoyant density of 1.690 g/cm³ with a G-C content of 30%

  32. Satellite DNAs Lie in Heterochromatin • Highly repetitive DNA (simple-sequence DNA): Satellite DNA is characterized by a rapid rate of hybridization and consists of very short sequences repeated many times in tandem in large clusters. It is typically <10% of the genome • In addition, multicellular eukaryotes have complex satellites with longer repeat units, mainly in heterochromatic regions • In humans, α-satellite DNA consists of 171 bp repeats. The β-satellite DNA family has ~68 bp repeat units interspersed with longer 3.3 kb repeats • Tandemly repeated DNA often has a distinct physical property that can be used to isolate it: its buoyant density differs from that of non-repetitive DNA • Therefore, by equilibrium centrifugation on a CsCl gradient, the satellite DNA can be separated from the non-repetitive DNA • The buoyant density of a duplex DNA depends on its G-C content according to the following formula: Buoyant density = 1.660 + 0.00098 × (% G-C) g/cm³
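
The buoyant-density formula can be checked against the values quoted for the main band and the satellite band (42% and 30% G-C). The sketch below is a direct transcription of the formula; the function names are illustrative only.

```python
def buoyant_density(gc_percent):
    """Buoyant density in g/cm^3 from % G-C: 1.660 + 0.00098 * (% G-C)."""
    if not 0 <= gc_percent <= 100:
        raise ValueError("gc_percent must be between 0 and 100")
    return 1.660 + 0.00098 * gc_percent


def gc_percent_from_density(density):
    """Invert the formula to estimate % G-C from a measured density."""
    return (density - 1.660) / 0.00098


for label, gc in (("main band", 42), ("satellite band", 30)):
    print(f"{label}: {gc}% G-C, density = {buoyant_density(gc):.3f} g/cm^3")
```

With these inputs the formula gives about 1.701 and 1.689 g/cm³, which agrees with the quoted 1.701 and 1.690 g/cm³ to within rounding.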

  33. Most Simple-Sequence DNAs are Concentrated in Specific Chromosomal Locations • Repetitious DNA is present in the genome of eukaryotic cells • Simple-sequence DNA, also called satellite DNA (6% of the human genome), with repeat units of 14 to 500 bp • Microsatellites, with repeat units of 1-13 bp • Interspersed repetitive DNA, dispersed throughout the genome (also called transposable elements) • By fluorescence in situ hybridization (FISH), the simple-sequence DNAs are localized near the centromeres and telomeres of mouse chromosomes • Centromeric heterochromatin is necessary for the separation of chromosomes to daughter cells

  34. Diseases Associated with Microsatellites • Microsatellites occasionally occur within transcription units • At least 14 different types of neuromuscular disease are associated with microsatellite repeats in the transcription unit of a gene • Myotonic dystrophy and spinocerebellar ataxia are examples. In myotonic dystrophy, the transcript of the DMPK (dystrophia myotonica protein kinase) gene contains 1000 to 4000 repeats of the sequence CUG in the 3’ untranslated region, which interfere with normal RNA processing and with export of the mature RNA from the nucleus to the cytosol

  35. Probing Minisatellite DNA by Southern Blot Hybridization • DNA samples from three different individuals were digested with the restriction enzyme HinfI, separated on agarose gels, transferred to nylon membranes and probed with three different radiolabeled minisatellites • A different, unique banding pattern was observed for each individual • DNA fingerprinting depends on differences in the length of simple-sequence DNA

  36. DNA Fingerprinting • Minisatellite DNA: repeat units of 14 to 100 bp, present as 20-50 tandem copies in a region of 1 to 5 kb • A slight difference in the total length of the repeats can be detected by PCR analysis. This forms the basis of DNA fingerprinting • This technique can be used in population studies, paternity or maternity testing, and criminal identification

  37. Hybridization Kinetics of cDNAs to mRNAs • The complexity of the mRNA population isolated from a cell can be estimated by studying the kinetics of hybridization of mRNAs to their cDNAs • The example below compares the mRNA population of estrogen-treated trout liver with that of its untreated control: • Isolate total RNA samples from the livers of estrogen-treated (induced) fish and control (uninduced) fish (RNAind and RNAunind) • Prepare 32P-labeled cDNAind by reverse transcription • Set up hybridization between 32P-cDNAind and excess RNAunind at different Rot values (initial RNA concentration × time) • Determine the amount of hybridization by treating the hybridization mixture with S1 nuclease, which digests unhybridized single-stranded nucleic acid

  38. Hybridization between mRNA and cDNA • This slide shows the hybridization profile of excess chick oviduct mRNA with chick oviduct cDNA • 32P-labeled cDNA was synthesized from chick oviduct mRNA and hybridized to excess chick oviduct mRNA • The result showed that three components of cDNA, present at different frequencies, hybridize to chick oviduct mRNA: • About 50% of the cDNA hybridizes at a Rot1/2 of 0.0015 • About 15% of the cDNA hybridizes at a Rot1/2 of 0.04 • About 35% of the cDNA hybridizes at a Rot1/2 of 30
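
The three components listed above can be combined into one predicted hybridization curve. The sketch below assumes pseudo-first-order kinetics for each component (the usual approximation when RNA is in large excess over cDNA), so each component contributes f × (1 − exp(−ln 2 × Rot / Rot1/2)). That kinetic form is an assumption of this example, not something stated on the slide; only the fractions and Rot1/2 values come from the slide.

```python
import math

# (fraction of cDNA, Rot1/2) for the three components quoted on the slide.
COMPONENTS = [(0.50, 0.0015), (0.15, 0.04), (0.35, 30.0)]


def fraction_hybridized(rot, components=COMPONENTS):
    """Total fraction of cDNA hybridized at a given Rot value.

    Assumes pseudo-first-order kinetics per component:
    f * (1 - exp(-ln2 * Rot / Rot1/2)).
    """
    if rot < 0:
        raise ValueError("rot must be >= 0")
    return sum(
        f * (1.0 - math.exp(-math.log(2) * rot / rot_half))
        for f, rot_half in components
    )


for rot in (0.0001, 0.0015, 0.04, 1.0, 30.0, 1000.0):
    print(f"Rot = {rot:9.4f}  hybridized = {fraction_hybridized(rot):.3f}")
```

On a log(Rot) axis this produces three successive transitions, one per abundance class, with the most abundant sequences hybridizing first.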

  39. Rot Analysis of Excess mRNA and cDNA of Chick Oviduct Cells • Total mRNA was isolated from chick oviduct cells • 32P-cDNA was prepared from the total mRNA by reverse transcription • Rot analysis was conducted between the radiolabeled cDNA and an excess of total mRNA • The Rot analysis data showed that three components of sequences hybridize to the cDNA: • The first component has the characteristics of ovalbumin mRNA • The second component has a total complexity of 15 kb (7-8 different mRNAs of 2000 bases) • The last component has a complexity of 26 Mb (~13,000 mRNAs) • Figure: cDNA of estrogen-treated oviduct RNA hybridized to untreated oviduct RNA
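
The species counts quoted for the second and third components follow from dividing each component's complexity by an average mRNA length. The sketch below assumes the 2000-base average given for the second component also applies to the third, which is consistent with the quoted ~13,000 figure but is an assumption of this example.

```python
def estimated_mrna_species(complexity_nt, mean_length_nt=2000):
    """Approximate number of distinct mRNAs = complexity / mean mRNA length."""
    if complexity_nt < 0 or mean_length_nt <= 0:
        raise ValueError("complexity must be >= 0 and mean length must be > 0")
    return complexity_nt / mean_length_nt


second = estimated_mrna_species(15_000)      # 15 kb component
third = estimated_mrna_species(26_000_000)   # 26 Mb component
print(f"Second component: about {second:.1f} mRNA species")
print(f"Third component: about {third:,.0f} mRNA species")
```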

  40. Number of Expressed Genes Measured by DNA Microarray Analysis • Although Rot analysis can reveal the complexity of the mRNA population in any cell type, the number of genes expressed in a cell type can be determined by DNA microarray analysis • In this assay, mRNA isolated from the cell type of interest is reverse-transcribed into tagged (labeled) cDNA • The labeled cDNA is hybridized to a DNA array that contains the entire set of genes of the organism of interest • The genes that hybridize to the tagged cDNA can be visualized by scanning the array • This slide shows the results of a DNA microarray analysis of the expression of 12 genes in 59 individual breast tumor tissues of breastfed and breast-unfed women • Highly expressed genes are shown in “red”, genes with lower expression in “blue”, and genes with equal expression in “grey”

  41. Genomics: Genome-wide analysis of gene structure and expression

  42. Databases of Genomes • Using automated DNA sequencing techniques, methods for cloning DNA fragments on the order of 100 kb in length, and computer algorithms to piece together the stored sequence data, scientists have determined vast amounts of DNA sequence, including the entire genomes of humans and of many key experimental organisms, e.g., the roundworm (C. elegans), fruit flies, mice, medaka and zebrafish • Since the cost of sequencing a megabase of DNA is becoming very low, the genomes of many organisms are rapidly being determined • There are two major databases holding human genome sequences: • GenBank at the National Institutes of Health in Bethesda, MD • The EMBL Sequence Data Base at the European Molecular Biology Laboratory in Heidelberg, Germany

  43. Comparison of the Regions of Human NF1 Protein with Ira Protein of S. cerevisiae • Ira, a GTPase-activating protein (GAP), modulates the GTPase activity of the monomeric G protein called Ras. Both GAP and Ras function to control cell replication and differentiation in response to signals from outside the cell

  44. Structural Motifs When a protein shows no significant similarity to other proteins with the BLAST (basic local alignment search tool) algorithm, it may nevertheless share a short sequence that is functionally important. Such short sequences, recurring in many different proteins, are referred to as structural motifs

  45. Comparison of Related Sequences from Different Species Can Give Clues to Evolutionary Relationships Among Proteins • Paralogous: sequences that diverged as the result of gene duplication • Orthologous: sequences that arose because of speciation

  46. Genes Can be Identified within Genomic DNA Sequences • By scanning for an “Open Reading Frame” (ORF) • An ORF is defined as a stretch of DNA of at least 100 bp that begins with a start codon and ends with a stop codon of translation • ORF analysis has identified more than 90% of the genes in bacteria and yeast • Both very short genes and long genes are missed by this method • For eukaryotic genes, because of the presence of multiple exons and introns, scanning for ORFs is not a good method to identify genes. One needs to use computer programs to compare the genomic DNA sequences to cDNA sequences, splice-site sequences and the sequences of expressed sequence tags (ESTs) • Another powerful method for identifying human genes is to compare the human genomic sequence with that of the mouse, since human and mouse are sufficiently related to have most genes in common
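
A minimal sketch of ORF scanning as defined above: walk each reading frame codon by codon, open an ORF at the first ATG, and close it at the next in-frame stop codon, keeping only ORFs at least `min_length_bp` long. For brevity this example scans only the three forward-strand frames; a fuller scan would repeat the search on the reverse complement. The function name and the test sequence are made up for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_length_bp=100):
    """Return (start, end, frame) tuples for forward-strand ORFs.

    start is the 0-based index of the ATG; end is the index just past
    the stop codon, so seq[start:end] is the whole ORF.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if end - start >= min_length_bp:
                    orfs.append((start, end, frame))
                start = None  # look for the next ORF in this frame
    return orfs


# Hypothetical sequence: 2 nt of flank, ATG, 40 alanine codons, a stop, 2 nt of flank.
example = "CC" + "ATG" + "GCT" * 40 + "TAA" + "GG"
print(find_orfs(example))
```

Because the scan needs an uninterrupted frame from start to stop, it works for intron-poor bacterial and yeast genes but, as the slide notes, not for most eukaryotic genes, where introns break up the reading frame.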

  47. Structural organization of eukaryotic chromosomes

  48. Questions? • How are DNA molecules organized within eukaryotic cells? • The total length of cellular DNA is up to a hundred thousand times the cell’s length, so the packing of DNA is crucial to cell architecture • During interphase, DNA exists as a nucleoprotein complex called chromatin, dispersed throughout the nucleus • During mitosis, chromatin compacts further into metaphase chromosomes, which can be visualized under a microscope

  49. Packaging of DNA in Microorganisms • In viruses, the genomic DNA molecule is associated with protein molecules and packaged inside the viral capsid. In bacteria, the genomic DNA is associated with proteins and is packaged as a compact mass in the center of the cell, called the “nucleoid”

  50. Electron Micrographs of Extended and Condensed Chromatin (panels: condensed form, extended form) • Nucleosomes: When chromatin is isolated from the nucleus under low-salt conditions and without divalent cations (Mg²⁺), it resembles “beads on a string”. The beads are termed nucleosomes and the string is termed the linker • The nucleosome is about 10 nm in diameter and is the primary structural unit of chromatin