Genome organization & its genetic implications

Genome organization & its genetic implications. Lander, ES (2011) Initial impact of the sequencing of the human genome. Nature 470:187. Feuillet, C, JE Leach, J Rogers, PS Schnable, K Eversole (2011) Crop genome sequencing: lessons and rationales. Trends Plant Sci 16:77.

Presentation Transcript
slide1

Genome organization

&

its genetic implications

Lander, ES (2011) Initial impact of the sequencing of the human genome. Nature 470:187

Feuillet, C, JE Leach, J Rogers, PS Schnable, K Eversole (2011) Crop genome sequencing: lessons and rationales. Trends Plant Sci 16:77

slide2

DNA sequencing technologies

Metzker, M (2010) Sequencing technologies – the next generation. Nature Rev Genet 11:31

slide3

What are the challenges for the correct assembly of genome sequence information?

  • Genome size
    • Eukaryotic genomes ~ 10^9 – 10^10 bp
  • Genome composition
    • Eukaryotic genomes ~ 50 % repetitive DNA
slide4

Genome size – the C-value paradox

genome size in basepairs

slide5

Genome Size – the C value paradox:

The amount of DNA in the haploid cell of an organism is not related to its evolutionary complexity or number of genes

slide6

Genome composition

  • Complexity = length in nucleotides of the longest non-repeating sequence that can be formed by splicing together all unique sequences in a sample
  • Eukaryotic genomes contain different classes of DNA based on sequence complexity:
        • highly repetitive
        • middle repetitive
        • unique
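
As a worked illustration of the complexity definition above, the sketch below describes a genome as a set of sequence classes, each with a length per copy and a copy number. The class sizes and copy numbers are invented for illustration and are not taken from the slides. Genome size counts every copy; complexity counts each distinct sequence once.

```python
# Hypothetical genome: class name -> (length of one copy in bp, number of copies).
# All numbers are illustrative assumptions, not measurements.
sequence_classes = {
    "unique": (1_000_000, 1),
    "middle repetitive": (5_000, 100),
    "highly repetitive": (10, 100_000),
}


def genome_size(classes):
    """Total DNA content in bp: every copy of every sequence is counted."""
    return sum(length * copies for length, copies in classes.values())


def complexity(classes):
    """Complexity in bp: each distinct sequence is counted once, whatever its copy number."""
    return sum(length for length, _copies in classes.values())


size = genome_size(sequence_classes)
cx = complexity(sequence_classes)
print(f"genome size: {size} bp, complexity: {cx} bp")
```

In this made-up case the highly repetitive class contributes 40% of the DNA but only 10 bp of complexity.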
slide7

Genome composition – DNA re-association kinetics

complexity in

[moles of nucleotide / liter] x sec
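
The shape of a re-association curve can be sketched with the ideal second-order model, in which the fraction of DNA still single-stranded is 1 / (1 + k × C0t). This model and the rate constants below are assumptions added for illustration; the slides only show the axis, in [moles of nucleotide / liter] x sec. The function names are hypothetical.

```python
def fraction_single_stranded(cot, k):
    """Fraction of DNA still single-stranded under ideal second-order kinetics.

    cot: C0 * t in (moles of nucleotide / liter) * sec
    k:   re-association rate constant in liter / (mole * sec)
    """
    return 1.0 / (1.0 + k * cot)


def cot_half(k):
    """C0t value at which half of the DNA has re-associated (1 / k)."""
    return 1.0 / k


# Assumed rate constants: a slowly re-associating (high complexity) component
# has a small k and therefore a large C0t(1/2).
for label, k in [("fast component", 100.0), ("slow component", 0.001)]:
    print(label, "C0t(1/2) =", cot_half(k))
```

Under this model, sequences that re-associate at low C0t values behave as repetitive DNA and those that need high C0t values behave as single-copy DNA.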

slide8

Genome composition - DNA re-association kinetics for a complex eukaryotic genome

[Figure: re-association curve with three components labeled highly repetitive sequences, middle repetitive sequences and single copy sequences; x-axis in [moles of nucleotide / liter] x sec]

slide9

From genome composition to genome organization

  • How are unique, middle repetitive and highly repetitive sequences organized in the genome?
slide10

Genome organization

[Figure: arrangement of genes and repeats in E. coli, S. cerevisiae, H. sapiens and Z. mays, with a gene island and a gene desert labeled. Legend: gene, repeat]

slide11

Genetic complexity

  • Eukaryotic genomes contain ~ 20,000 – 30,000 genes
  • 30% of protein coding genes are members of gene families
      • duplication & divergence of sequence & gene function
slide12

Gene complexity

  • What does a gene look like from a sequence or transcript perspective?
        • no “typical gene”
  • Introns and exons
        • introns can be numerous and long, i.e. some genes are more intron than exon!
        • alternative splicing variants are common
  • Not all genes encode proteins
        • non-coding structural RNAs (e.g. rRNA, tRNA, snRNA, snoRNA)
        • non-coding regulatory RNAs (e.g. miRNA, lncRNA)
slide13

Implications of gene and genetic complexity

  • Forward genetics: Have mutant – want gene
  • Via map-based cloning:
      • Map your mutation
      • Look at the genome sequence in the map interval to identify candidate genes
  • Candidate gene identification may not be trivial, even with good genome annotation!
    • Especially an issue for plant genome sequences – only Arabidopsis and rice are considered “finished” quality
  • Note: further genetic tests are required, even if the perfect candidate is identified.
slide14

Gene identification - open reading frames

5'-atgcccaagctgaatagcgtagaggggttttcatcatga-3'

frame 1

atgcccaagctgaatagcgta gag gggttttcatcatga

M   P   K   L   N   S   V   E   G   F   S   S   *

frame 2

tgcccaagctgaatagcg tag aggggtttt cat catga

C   P   S   *   I   A   *   R   G   F   H   H

  • How to tell real ORFs from random chance ORFs?
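
The two reading frames above can be reproduced with a short script. This is a minimal sketch using the standard genetic code; the helper names are hypothetical. The last function gives one rough handle on the chance question: assuming equal base frequencies (an assumption, not a claim from the slides), 3 of the 64 codons are stops, so the probability that a random stretch of n codons contains no stop is (61/64)^n.

```python
# Standard genetic code, with bases ordered t, c, a, g at each codon position.
BASES = "tcag"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(seq, frame):
    """Translate complete codons of seq in forward frame 1, 2 or 3 ('*' = stop).

    Only a, c, g, t are handled; any other character raises KeyError.
    """
    seq = seq.lower()
    start = frame - 1
    codons = (seq[i:i + 3] for i in range(start, len(seq) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


def prob_no_stop_by_chance(n_codons):
    """Chance that n random codons contain no stop, assuming equal base frequencies."""
    return (61 / 64) ** n_codons


seq = "atgcccaagctgaatagcgtagaggggttttcatcatga"
print(translate(seq, 1))  # frame 1
print(translate(seq, 2))  # frame 2
print(prob_no_stop_by_chance(100))
```

Under that simple model, long stop-free stretches are unlikely by chance, which is one reason ORF length is used as evidence; it does not help with genuinely short ORFs.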
slide15

Gene identification - short ORFs can be translated!

  • e.g. the Drosophila tarsal-less gene

Galindo et al. PLoS Biol 5(5): e106 doi:10.1371/journal.pbio.0050106

slide16

Gene identification – database searching

e.g. http://blast.ncbi.nlm.nih.gov/Blast.cgi

slide17

Gene identification – shared synteny

Preserved localization of genes on chromosomes of different species

e.g. mouse chromosome 11 and parts of 5 different human chromosomes

Perfect correspondence in order, orientation and spacing of 23 putative genes, and 245 conserved sequence blocks in noncoding regions

Caution! Even regions of high synteny may not show perfect gene-for-gene correspondence

from Gibson & Muse (2002) A Primer of Genome Science, Sinauer Inc.

slide18

Gene identification – shared synteny

Preserved localization of genes on chromosomes of different species

e.g. maize – sorghum (G) – rice (H)

Schnable et al. Science 326:1112

slide19

Gene identification – promoter elements

  • TATA – box elements
      • 5'-TATAAA-3' or variant
      • plant and animal promoters
  • CpG islands
      • Regions of higher than expected CpG dinucleotide content, unmethylated in active promoters
      • ~ 40% of mammalian promoters
      • ~ 70% of human promoters
      • but NOT in plant promoter regions
  • Y patch (pyrimidine-rich patch)
      • plant not mammalian promoters
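
Two of the promoter signals listed above can be sketched in a few lines. The TATA search looks only for the exact consensus 5'-TATAAA-3' on the given strand and ignores variants. For CpG content, the code uses the observed/expected ratio, (number of CpG × sequence length) / (number of C × number of G), which is one common way to quantify "higher than expected"; the slides do not specify a measure or a threshold, so treat this choice and the function names as assumptions.

```python
import re


def find_tata_boxes(seq):
    """Return 0-based start positions of the exact consensus TATAAA on the given strand."""
    # A lookahead is used so that overlapping matches would also be reported.
    return [m.start() for m in re.finditer("(?=tataaa)", seq.lower())]


def cpg_observed_expected(seq):
    """Observed/expected CpG ratio: (#CpG * length) / (#C * #G); 0.0 if C or G is absent."""
    seq = seq.lower()
    c_count = seq.count("c")
    g_count = seq.count("g")
    if c_count == 0 or g_count == 0:
        return 0.0
    return seq.count("cg") * len(seq) / (c_count * g_count)


print(find_tata_boxes("ggtataaagg"))
print(cpg_observed_expected("ccgg"))
```

A ratio near 1 means CpG occurs about as often as the base composition predicts; bulk vertebrate DNA is typically well below 1, which is what makes CpG islands stand out.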
slide20

Gene identification – introns & exons

  • Long gene space: more intron than exon
  • Extreme example - human clotting factor VIII gene
slide23

Gene identification – non-coding RNAs

  • non-coding structural RNAs
      • rRNA & tRNA – translation
      • snoRNA – small nucleolar RNAs
        • guide chemical modification of rRNAs & tRNAs
      • snRNA – small nuclear RNAs
        • guide splicing reactions
  • non-coding regulatory RNAs
      • miRNA (microRNAs) & siRNA (small interfering RNAs)
        • RNAi pathway
      • lncRNA - long noncoding RNAs
slide24

Origins of long non-coding RNAs

  • Overlapping transcriptional architecture
    • e.g. the human phosphatidylserine decarboxylase (PISD) gene

Kapranov, Nature Rev Genet 8:413

slide25

Functions of lncRNAs

Wilusz et al. Genes Dev. 23: 1494–1504

slide26

Genome – Transcriptome – Proteome

  • Genome
      • Full complement of an organism’s hereditary information
  • Transcriptome
      • Full set of RNA molecules, coding and non-coding, transcribed from the genome
  • Proteome
      • Full set of proteins expressed from a genome
  • Not a 1:1:1 correspondence
slide27

Implications of gene and genetic complexity

  • What is the take-home message for forward genetics?
slide28

Implications of gene and genetic complexity

  • Reverse genetics: Have gene – want phenotype
    • Predict phenotypes based on gene function in other organisms
    • Knock out or knock down your gene of interest & look for corresponding changes in phenotype
slide29

Gene families

  • Gene duplication followed by:
    • Duplication of gene function
    • Divergence of gene function
    • Loss of gene function leading to a pseudogene
  • e.g. human globin gene family
slide30

Gene families

  • Gene duplication followed by:
    • Duplication of gene function
    • Divergence of gene function
    • Loss of gene function leading to a pseudogene
  • e.g. human beta-globin gene cluster
    • chromosome 11
    • Five functional genes and two pseudogenes
slide31

Gene families – paralogs & orthologs

  • Homologs
    • Protein or DNA sequences having shared ancestry
  • Orthologs
    • Homologs created by a speciation event
    • May or may not retain the same function!
  • Paralogs
    • Homologs created by a gene duplication event
    • May or may not retain the same function!
  • It is not always easy or possible to distinguish orthologs from paralogs when comparing genes or proteins between species
slide33

Gene families – paralogs & orthologs

orthologs

paralogs

orthologs

orthologs

Storz et al. IUBMB Life 63:313

slide34

Implications of gene and genetic complexity

  • What are the implications of gene families for forward genetics (i.e. looking for candidate genes that condition a mutant phenotype)?
  • What are the implications of gene families for reverse genetics (i.e. altering gene function and looking for a phenotype)?
slide35

Genome organization – repeated sequences ~ 50% of the genome

  • Segmental duplications and copy number variation
  • Tandemly repeated genes
    • rRNA, tRNA and histone gene products needed in large amounts
  • Duplicated gene families
  • Transposons
  • Tandem simple sequence repeats
    • centromeric & telomeric repeats
    • minisatellites
    • microsatellites
slide36

Repeated sequences – segmental duplications & copy number variants

  • Segmental duplications
    • > 1 kb block of duplicated sequence with > 90% sequence identity
    • recombine to mediate further copy number variants

Koszul & Fischer, C.R. Biologies 332:254

slide38

Repeated sequences – segmental duplications & copy number variants

  • Copy number variant (CNV)
    • Deviation from diploid copy number at a locus
  • Copy number polymorphism (CNP)
    • CNV present in >1% of a population
  • Recent association with human developmental syndromes

Girirajan et al. Annu Rev Genet 45:203

slide39

Transposon-derived repeated sequences

  • ~ 45% of human & 85% of maize genome
slide40

Transposon-derived repeated sequences

  • Many are truncated & inactive
  • Considered to be important in the evolution of genome organization & function

Gogvadze & Buzdin, Cell Mol Life Sci 66:3727

slide41

Repeated sequences – short tandem repeats

  • Centromeric
    • Long array (~100,000 bp) of short tandem repeats
      • ~ 5 bp in Drosophila, ~150 bp in maize, ~170 bp in human
    • not conserved across species
    • in some cases not even conserved in all chromosomes of the same species
    • Association with a centromere-specific histone H3
  • Telomeric
    • Length varies between species
      • ~ 300 base pairs – 150 kilobase pairs
    • Conserved, G-rich repeat sequence
      • vertebrates TTAGGG ; most plants TTTAGGG
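
Because the telomeric repeat unit is conserved, a telomeric array can be recognized in sequence data by counting consecutive copies of the unit. The sketch below is a minimal illustration; the function name and the test sequence are made up, and real reads would also need the reverse-complement strand and tolerance for sequencing errors.

```python
import re


def longest_tandem_run(seq, unit):
    """Return the largest number of consecutive, perfect copies of unit found in seq."""
    pattern = "(?:" + re.escape(unit.upper()) + ")+"
    runs = re.findall(pattern, seq.upper())
    return max((len(run) // len(unit) for run in runs), default=0)


# Hypothetical read ending in five copies of the vertebrate repeat.
read = "ACGT" + "TTAGGG" * 5 + "CC"
print(longest_tandem_run(read, "TTAGGG"))
print(longest_tandem_run(read, "TTTAGGG"))
```

Running the same function with the plant unit TTTAGGG on a vertebrate-type array finds no extended run, which is a quick way to tell the two repeat types apart.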
slide42

Repeated sequences – short tandem repeats

  • Minisatellites (Variable number tandem repeats, VNTRs)
    • 10-100 bp repeat units
    • 500-30,000 bp arrays
    • The original DNA fingerprinting marker via Southern blotting
        • Now supplanted by microsatellites
slide43

Repeated sequences – short tandem repeats

[Figure: microsatellite alleles in two varieties. Variety A: [CACACACA] / [GTGTGTGT]; variety B: [CACA] / [GTGT]]

  • Microsatellites (Simple sequence repeats, SSRs)
    • Di, tri or tetra-nucleotide repeats; 1-10 repeat units per locus
    • Repeat numbers expand or contract over a short evolutionary, or even generational, time frame
    • Amplified by PCR
      • Primers based on unique flanking sequence
      • Products fractionated by capillary or acrylamide gel electrophoresis
    • Co-dominant mapping & fingerprinting markers
      • Both alleles can be detected in a heterozygous individual
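
The co-dominant behavior of a microsatellite marker can be sketched as follows, using the CA repeat from the slide (four units in variety A, two in variety B). The flanking sequences and function names are hypothetical placeholders standing in for the unique primer-binding regions; the point is that PCR product length tracks repeat number, so a heterozygote yields two distinguishable product sizes.

```python
import re

# Hypothetical unique flanking sequences (where the primers would anneal).
LEFT_FLANK = "ggatcc"
RIGHT_FLANK = "ttgacg"


def make_allele(unit, n_repeats):
    """Build the amplified region for an allele with n_repeats copies of unit."""
    return LEFT_FLANK + unit * n_repeats + RIGHT_FLANK


def count_repeats(amplicon, unit):
    """Count repeat units between the two flanks; None if the flanks are not found."""
    pattern = (re.escape(LEFT_FLANK) + "((?:" + re.escape(unit) + ")+)"
               + re.escape(RIGHT_FLANK))
    match = re.search(pattern, amplicon)
    if match is None:
        return None
    return len(match.group(1)) // len(unit)


def product_sizes(alleles):
    """Distinct PCR product lengths (bp) expected for an individual's alleles."""
    return sorted({len(allele) for allele in alleles})


variety_a = make_allele("ca", 4)  # [CACACACA]
variety_b = make_allele("ca", 2)  # [CACA]
print(product_sizes([variety_a, variety_a]))  # homozygous A
print(product_sizes([variety_a, variety_b]))  # heterozygote: two products
```

A homozygote gives a single product size and the heterozygote gives both, which is what "both alleles can be detected" means in practice.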