CS173

Lecture 17: Genome-phenotype relationships

MW  11:00-12:15 in Beckman B302

Prof: Gill Bejerano

TAs: Jim Notwell & Harendra Guturu

http://cs173.stanford.edu [BejeranoWinter12/13]

Announcements
  • Projects:
    • Each group must give a PowerPoint presentation (10-12 minutes, so we can accommodate all of the groups).
    • We also ask that each group submit its commented source code. No write-up is required.
    • Include a brief (~ half page) summary of what was accomplished and how problems from the milestone were resolved, along with what each member of the group contributed to the project.

What makes us molecularly human?

… Searching

Far

Metazoans (multi-cellular organisms)

 you are here

[Human Molecular Genetics, 3rd Edition]

Ancient Origins of Important Gene Families

Signaling centers in the vertebrate brain
  • Comparison of key gene expression patterns between vertebrates and very distantly related species reveals striking homologies:

Ancient Regulatory Circuits


The first human enhancers conserved to protostomes

What makes us molecularly human?

Searching

Near …

Why compare to Chimp?


Genetic basis of human phenotypes?

Phenotype

Genotype

Number of rearrangements

Most mutations are near/neutral.

Candidate genes for human specific evolution

...

Different Unbiased Search: Loss vs Gain

Human Accelerated Regions

[Figure labels: Human: rapid change; Chimp: conserved]

  • 4-18 unique human substitutions
  • Pollard, K. et al., Nature, 2006
  • Prabhakar, S. et al., Science, 2008

Human Conserved Sequence Deletions (hCONDELs)

[Figure labels: Human: deleted!; Chimp: conserved]

  • Complete human loss of sequence
  • Likely to confer human-specific phenotypes

[McLean, Reno, Pollen et al., Nature, 2011]

What makes us human now?


Reconstructing multiple related histories

From pairwise to multiple alignments

Multidimensional DP
  • Example: in 3D (three sequences):
  • 7 neighbors/cell

F(i,j,k) = max{ F(i-1,j-1,k-1) + S(xi, xj, xk),
                F(i-1,j-1,k  ) + S(xi, xj, - ),
                F(i-1,j  ,k-1) + S(xi, - , xk),
                F(i-1,j  ,k  ) + S(xi, - , - ),
                F(i  ,j-1,k-1) + S(- , xj, xk),
                F(i  ,j-1,k  ) + S(- , xj, - ),
                F(i  ,j  ,k-1) + S(- , - , xk) }
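
The 3D recurrence can be sketched in a few lines of Python. This is only an illustration: the sum-of-pairs column score and the match/mismatch/gap values are assumptions, since the slide leaves S unspecified, and the function names are hypothetical.

```python
from itertools import product

GAP = "-"

def pair_score(a, b, match=1, mismatch=-1, gap=-2):
    """Score one pair of aligned symbols; a gap against a gap scores 0 (assumed values)."""
    if a == GAP and b == GAP:
        return 0
    if a == GAP or b == GAP:
        return gap
    return match if a == b else mismatch

def column_score(a, b, c):
    """Assumed S(a, b, c): sum-of-pairs score of one alignment column."""
    return pair_score(a, b) + pair_score(a, c) + pair_score(b, c)

# The 7 neighbours of a cell: every non-zero step (di, dj, dk) in {0,1}^3.
MOVES = [d for d in product((0, 1), repeat=3) if any(d)]

def align3_score(x, y, z):
    """Optimal global three-way alignment score via the 3D recurrence."""
    n, m, l = len(x), len(y), len(z)
    neg_inf = float("-inf")
    F = [[[neg_inf] * (l + 1) for _ in range(m + 1)] for _ in range(n + 1)]
    F[0][0][0] = 0
    for i in range(n + 1):
        for j in range(m + 1):
            for k in range(l + 1):
                for di, dj, dk in MOVES:
                    pi, pj, pk = i - di, j - dj, k - dk
                    if pi < 0 or pj < 0 or pk < 0:
                        continue  # neighbour lies outside the cube
                    # A sequence contributes its symbol if it advances, else a gap.
                    a = x[pi] if di else GAP
                    b = y[pj] if dj else GAP
                    c = z[pk] if dk else GAP
                    cand = F[pi][pj][pk] + column_score(a, b, c)
                    if cand > F[i][j][k]:
                        F[i][j][k] = cand
    return F[n][m][l]
```

The table has (n+1)(m+1)(l+1) cells with 7 neighbours each, which is why exact multidimensional DP is practical only for a handful of short sequences.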

Progressive Alignment

  • When evolutionary tree is known:
    • Align closest first, in the order of the tree
    • In each step, align two sequences x, y, or profiles p_x, p_y, to generate a new alignment with associated profile p_result

[Figure: guide tree in which x and y are aligned into profile p_xy, z and w into profile p_zw, and p_xy and p_zw into profile p_xyzw]

E.g.: Blastz-Multiz, shown in the UCSC browser
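
A minimal sketch of progressive alignment along a known tree, assuming a guide tree given as nested 2-tuples of sequence names and an average-of-pairs column score with made-up match/mismatch/gap values. It is not how Blastz or Multiz work internally; all function names here are hypothetical.

```python
GAP = "-"

def col_score(ca, cb, match=1, mismatch=-1, gap=-2):
    """Average pairwise score between two profile columns (tuples of symbols)."""
    total = 0
    for a in ca:
        for b in cb:
            if a == GAP and b == GAP:
                continue  # gap against gap costs nothing
            total += gap if GAP in (a, b) else (match if a == b else mismatch)
    return total / (len(ca) * len(cb))

def align_profiles(pa, pb):
    """Needleman-Wunsch on two profiles, each a list of equal-length aligned rows."""
    cols_a, cols_b = list(zip(*pa)), list(zip(*pb))
    gap_a, gap_b = (GAP,) * len(pa), (GAP,) * len(pb)
    n, m = len(cols_a), len(cols_b)
    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = F[i - 1][0] + col_score(cols_a[i - 1], gap_b)
    for j in range(1, m + 1):
        F[0][j] = F[0][j - 1] + col_score(gap_a, cols_b[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + col_score(cols_a[i - 1], cols_b[j - 1]),
                F[i - 1][j] + col_score(cols_a[i - 1], gap_b),
                F[i][j - 1] + col_score(gap_a, cols_b[j - 1]),
            )
    # Trace back from the corner, emitting merged columns.
    merged, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + col_score(cols_a[i - 1], cols_b[j - 1]):
            merged.append(cols_a[i - 1] + cols_b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + col_score(cols_a[i - 1], gap_b):
            merged.append(cols_a[i - 1] + gap_b)
            i -= 1
        else:
            merged.append(gap_a + cols_b[j - 1])
            j -= 1
    merged.reverse()
    return ["".join(row) for row in zip(*merged)]

def progressive(tree, seqs):
    """Align in tree order: leaves are names, internal nodes are (left, right)."""
    if isinstance(tree, str):
        return [tree], [seqs[tree]]
    left_names, left_rows = progressive(tree[0], seqs)
    right_names, right_rows = progressive(tree[1], seqs)
    return left_names + right_names, align_profiles(left_rows, right_rows)
```

For example, with `seqs = {"x": "ACGT", "y": "ACGT", "z": "AGT", "w": "AGT"}` and the tree `(("x", "y"), ("z", "w"))`, the two closest pairs are aligned first and their profiles are merged last.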

Anchor based alignment

Example:

Anchor based alignment

E.g.: Enredo-Pecan, shown in the Ensembl browser


Ancestral Genome Reconstruction

  • Given:
    • Genomic sequences of several mammals
    • Phylogenetic tree
  • Find: The genomic sequence of all their ancestors

ARMADILLO TGCTACTAATATTTAGTACATAGAGCCCAGGGGTGCTGCTGAAAGTCTTAAAATGCACAGTGTAGCCCCTCCTCC

COW GCCTCTCTTTCTGCCCTGCAGGCTAGAATGTATCACTTAGATGTTCCAAATCAGAAAGTGTTCAGCCATTTCCATACC

HORSE GTCACAATTTAGGAAGTGCCACTGGCCTCTAGAGGGTAGAAGACAGGGATGCTAATAATCATCCCACGTCATCCTACAGTGCTCAGAACAGCACCCCTACCCTCACCCC

CAT GTCACAGTTTAGGGGGTACTACTGGCATCTATCGGGTGGAGGATAGGGATACTGATAATCATTCTACAGTGCACAGGACAGTACCCCTACTTTCACCCC

DOG GTCACAATTTGGGGGATACTACTGGCATCTAATGGGTAGAGGACAGGGATACTGATAATTGCTTTACAGTGCACAGGACAGCACCCTTATCTTCACCCC

HEDGEHOG GTCATAGTTTGATTATATGGGCTTCTTAGTAGACAAAGAAAAAGATGTTCTGGTAGTCATTCTGCTTTCCATATGATAGCACTCCCATCTTCACTTC

MOUSE GTCACAGTTTGGAGGATGTTACTGACATCTAGAGAGTAGACTTTAAAGATACTGATAGTCACCCCATTGTGCACCTCC

RAT GTCACAATTTGGAGGATGTTACTGGCATCTAGAGAGTAGACTTTAAGGACACTGATAATCATACTATGCTGCACTTCC

RABBIT ATCACAATTTGGGGAACACCACTGGCATCTCGGGTAGCAGGCCAGGCATGCTGGTAATTATACTACAGTGCACAGTACAGTTCCCCACATCCCGCACC

LEMUR ATCACAATTGGGGGTGCCACGGTCCTCCAGTGGGTAGAGAACAGGGAGGCTGATAACCACCCTGCAGTGCACAGGGCAGTGCCCCACTCCCACCAC

MOUSE-LEMUR ATCACAGTTGGGGGATGCCACTGGCCTCAAGTGGGTAGAGAACAGGGAGGCTGAAAACCACCCTGCAGAGCACGGGGCAGTGCCTTCACCACCACTCC

VERVET GTCAGAATTTGGGGGATGCTTCTGGCTCTACTTGGGTAGAGAAACAGGGATGCTTATAATCATCCTACAGTGCACAGGACAGTACCCCCACCCACACTCC

MACAQUE GTCAGAATTTGGGGGATGCTTCTGGCTCTACTTGGGTAGAGAAACAGGAATGCTTATAATCATCCTACAGTGCACAGGTCAGTACCCCCACCCACACTCC

BABOON GTCAGAATTTGGGGGATGCTTCTGGCTCTACTTGGGTAGAAAAACAGGGATGCTTATAATCATCCTACAGTGCACAGGACAGTACCCCCACCCACACTCC

ORANGUTAN GTCACGATTTGGGAGATGCTTCTGGCTCGACTTGGGTAGAGAAGCGGGGATGCTTATAATCATCCAACAGTGCACAGGACAGTACCCCCACCCACACTCC

GORILLA GTCACGATTTGGGGGATGCTTCTGGCTCAACTTGGGTAGAGAAGTGGGGATGCTTATACTCATCCTACAGTGCACAGGACAGTACCCCCACCCACACTCC

CHIMP GTCACGATTTGGGGGATGCTTCTGGCTCAACTTGGGTAGAGAAGCGGGGATGCTTATAATCATCCTACAGTGCACAGGACAGTACCCCCACCCACACTCC

HUMAN GTCACGATTTGGGGGATGCTTCTGGCTCAACTTGGGTAGAGAAGCGGGGATGCTTATAATCATCCTACAGTGCACAGGACAGTACCCCCACCCACACTCC

All of it: Functional, non-functional, introns, intergenic, repeats, everything*!

  • Mutational operations
    • Small scale: Substitutions, deletions, insertions (incl. transposons)
    • Large scale: Genome rearrangement, segmental/tandem duplications
  • (*): Heterochromatin not included

Reconstruction algorithm
  • 1) Identify orthologous regions in each species

Reconstruction algorithm

  • 2) Compute multiple genome alignment

ARMADILLO ----------------TGCTACTAATAT-----T-TAGTA-CATAGAG-CC-CAGGGGTGCTGCTGAAA----------GTCTTAAAATGCACAGTGTAGCCCCTCCTCC------------ACAAAGAATTAACTAGCCCAGAATGTCAGGA--------GT--A-CCAAG

COW GCCTCTCTTT-----------CTGCCCTGCAGGC-TAGAA-TGTATCA-CT-TAGATGTTCCAA---------------ATCAGAAAGTGTTCAG----------CCATTTCCATACCACC----AGGAGCTA-CAATGTTGGGCTGCAGCTA--------TTTGGATCAAA

HORSE GTCACAATTTAGGAAGTGCCACTGGCCT-----C-TAGAG-GGTAGAA-GA-CAGGGATGCTAATAATCATCCCACGTCATCCTACAGTGCTCAGAACAGCACCCCTACCCTCACCCCATCAACAAAGAATTATCCAGCCCAAAATGCCAATA--------GT--GCCCAGA

CAT GTCACAGTTTAGGGGGTACTACTGGCAT-----C-TATCG-GGTGGAG-GA-TAGGGATACTGATAATC----------ATTCTACAGTGCACAGGACAGTACCCCTACTTTCACCCCACAA-CAAAGAATTATCCAGCCCAAAATGCCAACA--------GT--GCTCAGA

DOG GTCACAATTTGGGGGATACTACTGGCAT-----C-TAATG-GGTAGAG-GA-CAGGGATACTGATAATT----------GCTTTACAGTGCACAGGACAGCACCCTTATCTTCACCCCAAAAGCAAAGTATTATCCAGCCCCAAATGCCAATG--------GT--GCTCAGA

HEDGEHOG GTCATAGTTT----GATTATATGGGCTT-----CTTAGTA-GACAAAGAAA-AAGATGTTCTGGTAGTC----------ATTCTGCTTTCCATATGATAGCACTCCCATCTTCACTTCCAAAATTAAGAGTCATCATACTCAGTGTGCCAATA--------TG--GCCCAGA

MOUSE GTCACAGTTTGGAGGATGTTACTGACAT-----C-TAGAG-AGTAGAC-TT-TAAAGATACTGATAGTC----------ACCCCATTGTGCAC---------------------CTCCAACAATAATGGCTCATCGAAACCTAAATGCCAATCTGCCAATTAT--GTCCATG

RAT GTCACAATTTGGAGGATGTTACTGGCAT-----C-TAGAG-AGTAGAC-TT-TAAGGACACTGATAATC----------ATACTATGCTGCAC---------------------TTCCAACAATAATGGCTCATCTAGACCTAAATACCAATCTGCCAATTAT--ATCCATG

RABBIT ATCACAATTTGGGGAACACCACTGGCAT-----C-TCGGGTAGCAGGC----CAGGCATGCTGGTAATT----------ATACTACAGTGCACAGTACAGTTCCCCACATCCCGCACCAACAACA--GGTTTATGCTGCCCAAAGTGCCAGTGTGC-----------CCACG

LEMUR ATCACAA-TTGGGGG-TGCCACGGTCCT-----C-CAGTG-GGTAGAG-AA-CAGGGAGGCTGATAACC----------ACCCTGCAGTGCACAGGGCAGTGCC-CCACTCCCACCACAACAATGGAGAATTATTGGGCCCCAAATGCCAATA--------GT--GCCCAAG

MOUSELEMUR ATCACAG-TTGGGGGATGCCACTGGCCT-----C-AAGTG-GGTAGAG-AA-CAGGGAGGCTGAAAACC----------ACCCTGCAGAGCACGGGGCAGTGCCTTCACCACCACTCCAACAACGGAGAATTATTGGGTCCCAAATGCCAATA--------GT--GCCCAGG

VERVET GTCAGAATTTGGGGGATGCTTCTGGCTC-----T-ACTTG-GGTAGAG-AAACAGGGATGCTTATAATC----------ATCCTACAGTGCACAGGACAGTACCCCCACCCACACTCCAGTATCGAAGAATCATTGAACCCAAAATGTTAATA--------GT--GTCCAGG

MACAQUE GTCAGAATTTGGGGGATGCTTCTGGCTC-----T-ACTTG-GGTAGAG-AAACAGGAATGCTTATAATC----------ATCCTACAGTGCACAGGTCAGTACCCCCACCCACACTCCAGTATCGAAGAATCATTGGACCCAAAATGCTAATG--------GT--GTCCAGG

BABOON GTCAGAATTTGGGGGATGCTTCTGGCTC-----T-ACTTG-GGTAGAA-AAACAGGGATGCTTATAATC----------ATCCTACAGTGCACAGGACAGTACCCCCACCCACACTCCAGTATCGAAGAATCATTGGACCCAAAATGTTAATG--------GT--GTCCAGG

ORANGUTAN GTCACGATTTGGGAGATGCTTCTGGCTC-----G-ACTTG-GGTAGAG-AAGCGGGGATGCTTATAATC----------ATCCAACAGTGCACAGGACAGTACCCCCACCCACACTCCAGTAATGAAGAATCACTGGACCCAAAATGTTAATG--------GT--GTCCAGG

GORILLA GTCACGATTTGGGGGATGCTTCTGGCTC-----A-ACTTG-GGTAGAG-AAGTGGGGATGCTTATACTC----------ATCCTACAGTGCACAGGACAGTACCCCCACCCACACTCCAGTAATGAAGAATCATTAGACCGAAAATGTTAATG--------GT--GTCCAGG

CHIMP GTCACGATTTGGGGGATGCTTCTGGCTC-----A-ACTTG-GGTAGAG-AAGCGGGGATGCTTATAATC----------ATCCTACAGTGCACAGGACAGTACCCCCACCCACACTCCAGTAATGAAGAATCATTAGACCGAAAATGTTAATG--------GT--GTCCAGA

HUMAN GTCACGATTTGGGGGATGCTTCTGGCTC-----A-ACTTG-GGTAGAG-AAGCGGGGATGCTTATAATC----------ATCCTACAGTGCACAGGACAGTACCCCCACCCACACTCCAGTAATGAAGAATCATTAGACCTAAAATGTTAATG--------GT--GTCCAGG

  • Goal: Phylogenetic correctness
    • Two nucleotides are aligned if and only if they have a common ancestor.

Reconstruction algorithm

  • 3) Reconstruct insertion/deletion history
    • Find most likely explanation for gaps observed
    • This defines the presence/absence of a base at each position of each ancestor

ARMADILLO ----------------TGCTACTAATAT-----T-TAGTA-CATAGAG-CC-CAGGGGTGCTGCTGAAA----------GTCTTAAAATGCACAGTGTAGCCCCTCCTCC------------ACAAAGAATTAACTAGCCCAGAATGTCAGGA--------GT--A-CCAAG

COW GCCTCTCTTT-----------CTGCCCTGCAGGC-TAGAA-TGTATCA-CT-TAGATGTTCCAA---------------ATCAGAAAGTGTTCAG----------CCATTTCCATACCACC----AGGAGCTA-CAATGTTGGGCTGCAGCTA--------TTTGGATCAAA

HORSE GTCACAATTTAGGAAGTGCCACTGGCCT-----C-TAGAG-GGTAGAA-GA-CAGGGATGCTAATAATCATCCCACGTCATCCTACAGTGCTCAGAACAGCACCCCTACCCTCACCCCATCAACAAAGAATTATCCAGCCCAAAATGCCAATA--------GT--GCCCAGA

CAT GTCACAGTTTAGGGGGTACTACTGGCAT-----C-TATCG-GGTGGAG-GA-TAGGGATACTGATAATC----------ATTCTACAGTGCACAGGACAGTACCCCTACTTTCACCCCACAA-CAAAGAATTATCCAGCCCAAAATGCCAACA--------GT--GCTCAGA

DOG GTCACAATTTGGGGGATACTACTGGCAT-----C-TAATG-GGTAGAG-GA-CAGGGATACTGATAATT----------GCTTTACAGTGCACAGGACAGCACCCTTATCTTCACCCCAAAAGCAAAGTATTATCCAGCCCCAAATGCCAATG--------GT--GCTCAGA

HEDGEHOG GTCATAGTTT----GATTATATGGGCTT-----CTTAGTA-GACAAAGAAA-AAGATGTTCTGGTAGTC----------ATTCTGCTTTCCATATGATAGCACTCCCATCTTCACTTCCAAAATTAAGAGTCATCATACTCAGTGTGCCAATA--------TG--GCCCAGA

MOUSE GTCACAGTTTGGAGGATGTTACTGACAT-----C-TAGAG-AGTAGAC-TT-TAAAGATACTGATAGTC----------ACCCCATTGTGCAC---------------------CTCCAACAATAATGGCTCATCGAAACCTAAATGCCAATCTGCCAATTAT--GTCCATG

RAT GTCACAATTTGGAGGATGTTACTGGCAT-----C-TAGAG-AGTAGAC-TT-TAAGGACACTGATAATC----------ATACTATGCTGCAC---------------------TTCCAACAATAATGGCTCATCTAGACCTAAATACCAATCTGCCAATTAT--ATCCATG

RABBIT ATCACAATTTGGGGAACACCACTGGCAT-----C-TCGGGTAGCAGGC----CAGGCATGCTGGTAATT----------ATACTACAGTGCACAGTACAGTTCCCCACATCCCGCACCAACAACA--GGTTTATGCTGCCCAAAGTGCCAGTGTGC-----------CCACG

LEMUR ATCACAA-TTGGGGG-TGCCACGGTCCT-----C-CAGTG-GGTAGAG-AA-CAGGGAGGCTGATAACC----------ACCCTGCAGTGCACAGGGCAGTGCC-CCACTCCCACCACAACAATGGAGAATTATTGGGCCCCAAATGCCAATA--------GT--GCCCAAG

MOUSELEMUR ATCACAG-TTGGGGGATGCCACTGGCCT-----C-AAGTG-GGTAGAG-AA-CAGGGAGGCTGAAAACC----------ACCCTGCAGAGCACGGGGCAGTGCCTTCACCACCACTCCAACAACGGAGAATTATTGGGTCCCAAATGCCAATA--------GT--GCCCAGG

VERVET GTCAGAATTTGGGGGATGCTTCTGGCTC-----T-ACTTG-GGTAGAG-AAACAGGGATGCTTATAATC----------ATCCTACAGTGCACAGGACAGTACCCCCACCCACACTCCAGTATCGAAGAATCATTGAACCCAAAATGTTAATA--------GT--GTCCAGG

MACAQUE GTCAGAATTTGGGGGATGCTTCTGGCTC-----T-ACTTG-GGTAGAG-AAACAGGAATGCTTATAATC----------ATCCTACAGTGCACAGGTCAGTACCCCCACCCACACTCCAGTATCGAAGAATCATTGGACCCAAAATGCTAATG--------GT--GTCCAGG

BABOON GTCAGAATTTGGGGGATGCTTCTGGCTC-----T-ACTTG-GGTAGAA-AAACAGGGATGCTTATAATC----------ATCCTACAGTGCACAGGACAGTACCCCCACCCACACTCCAGTATCGAAGAATCATTGGACCCAAAATGTTAATG--------GT--GTCCAGG

ORANGUTAN GTCACGATTTGGGAGATGCTTCTGGCTC-----G-ACTTG-GGTAGAG-AAGCGGGGATGCTTATAATC----------ATCCAACAGTGCACAGGACAGTACCCCCACCCACACTCCAGTAATGAAGAATCACTGGACCCAAAATGTTAATG--------GT--GTCCAGG

GORILLA GTCACGATTTGGGGGATGCTTCTGGCTC-----A-ACTTG-GGTAGAG-AAGTGGGGATGCTTATACTC----------ATCCTACAGTGCACAGGACAGTACCCCCACCCACACTCCAGTAATGAAGAATCATTAGACCGAAAATGTTAATG--------GT--GTCCAGG

CHIMP GTCACGATTTGGGGGATGCTTCTGGCTC-----A-ACTTG-GGTAGAG-AAGCGGGGATGCTTATAATC----------ATCCTACAGTGCACAGGACAGTACCCCCACCCACACTCCAGTAATGAAGAATCATTAGACCGAAAATGTTAATG--------GT--GTCCAGA

HUMAN GTCACGATTTGGGGGATGCTTCTGGCTC-----A-ACTTG-GGTAGAG-AAGCGGGGATGCTTATAATC----------ATCCTACAGTGCACAGGACAGTACCCCCACCCACACTCCAGTAATGAAGAATCATTAGACCTAAAATGTTAATG--------GT--GTCCAGG

NNNNNNNNNNNNNNNNNNNNNNNNNNNN-----N-NNNNN-NNNNNNN-NN-NNNNNNNNNNNNNNNNN----------NNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
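
One simple way to decide presence or absence of a base in each ancestor is Fitch parsimony on a single alignment column. The slide asks for the most likely explanation of the gaps, so parsimony is a swapped-in simplification here, and the tree encoding, the tie-break and the function name are assumptions.

```python
def fitch_presence(tree, present):
    """Infer base presence at every ancestor for one alignment column.

    tree:    nested 2-tuples of leaf names, e.g. (("human", "chimp"), "mouse")
    present: leaf name -> True (base) or False (gap) in this column
    Returns {internal node (its subtree tuple): True/False}.
    """
    sets = {}

    def up(node):
        # Bottom-up pass: intersection of child sets if non-empty, else union.
        if isinstance(node, str):
            s = {present[node]}
        else:
            a, b = up(node[0]), up(node[1])
            s = (a & b) or (a | b)
        sets[node] = s
        return s

    states = {}

    def down(node, parent_state):
        # Top-down pass: keep the parent's state when allowed.
        # Assumed tie-break: prefer "present" when the choice is free.
        s = sets[node]
        state = parent_state if parent_state in s else max(s)
        if not isinstance(node, str):
            states[node] = state
            down(node[0], state)
            down(node[1], state)

    up(tree)
    down(tree, None)
    return states
```

With a column where only mouse has a gap in the tree `((("human", "chimp"), "mouse"), "cow")`, every ancestor is inferred to carry a base, which corresponds to a single deletion on the mouse branch.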


Reconstruction algorithm

  • 4) Infer the maximum-likelihood nucleotide at each position
  • Ancestral sequences are inferred!

ARMADILLO ----------------TGCTACTAATAT-----T-TAGTA-CATAGAG-CC-CAGGGGTGCTGCTGAAA----------GTCTTAAAATGCACAGTGTAGCCCCTCCTCC------------ACAAAGAATTAACTAGCCCAGAATGTCAGGA--------GT--A-CCAAG

COW GCCTCTCTTT-----------CTGCCCTGCAGGC-TAGAA-TGTATCA-CT-TAGATGTTCCAA---------------ATCAGAAAGTGTTCAG----------CCATTTCCATACCACC----AGGAGCTA-CAATGTTGGGCTGCAGCTA--------TTTGGATCAAA

HORSE GTCACAATTTAGGAAGTGCCACTGGCCT-----C-TAGAG-GGTAGAA-GA-CAGGGATGCTAATAATCATCCCACGTCATCCTACAGTGCTCAGAACAGCACCCCTACCCTCACCCCATCAACAAAGAATTATCCAGCCCAAAATGCCAATA--------GT--GCCCAGA

CAT GTCACAGTTTAGGGGGTACTACTGGCAT-----C-TATCG-GGTGGAG-GA-TAGGGATACTGATAATC----------ATTCTACAGTGCACAGGACAGTACCCCTACTTTCACCCCACAA-CAAAGAATTATCCAGCCCAAAATGCCAACA--------GT--GCTCAGA

DOG GTCACAATTTGGGGGATACTACTGGCAT-----C-TAATG-GGTAGAG-GA-CAGGGATACTGATAATT----------GCTTTACAGTGCACAGGACAGCACCCTTATCTTCACCCCAAAAGCAAAGTATTATCCAGCCCCAAATGCCAATG--------GT--GCTCAGA

HEDGEHOG GTCATAGTTT----GATTATATGGGCTT-----CTTAGTA-GACAAAGAAA-AAGATGTTCTGGTAGTC----------ATTCTGCTTTCCATATGATAGCACTCCCATCTTCACTTCCAAAATTAAGAGTCATCATACTCAGTGTGCCAATA--------TG--GCCCAGA

MOUSE GTCACAGTTTGGAGGATGTTACTGACAT-----C-TAGAG-AGTAGAC-TT-TAAAGATACTGATAGTC----------ACCCCATTGTGCAC---------------------CTCCAACAATAATGGCTCATCGAAACCTAAATGCCAATCTGCCAATTAT--GTCCATG

RAT GTCACAATTTGGAGGATGTTACTGGCAT-----C-TAGAG-AGTAGAC-TT-TAAGGACACTGATAATC----------ATACTATGCTGCAC---------------------TTCCAACAATAATGGCTCATCTAGACCTAAATACCAATCTGCCAATTAT--ATCCATG

RABBIT ATCACAATTTGGGGAACACCACTGGCAT-----C-TCGGGTAGCAGGC----CAGGCATGCTGGTAATT----------ATACTACAGTGCACAGTACAGTTCCCCACATCCCGCACCAACAACA--GGTTTATGCTGCCCAAAGTGCCAGTGTGC-----------CCACG

LEMUR ATCACAA-TTGGGGG-TGCCACGGTCCT-----C-CAGTG-GGTAGAG-AA-CAGGGAGGCTGATAACC----------ACCCTGCAGTGCACAGGGCAGTGCC-CCACTCCCACCACAACAATGGAGAATTATTGGGCCCCAAATGCCAATA--------GT--GCCCAAG

MOUSELEMUR ATCACAG-TTGGGGGATGCCACTGGCCT-----C-AAGTG-GGTAGAG-AA-CAGGGAGGCTGAAAACC----------ACCCTGCAGAGCACGGGGCAGTGCCTTCACCACCACTCCAACAACGGAGAATTATTGGGTCCCAAATGCCAATA--------GT--GCCCAGG

VERVET GTCAGAATTTGGGGGATGCTTCTGGCTC-----T-ACTTG-GGTAGAG-AAACAGGGATGCTTATAATC----------ATCCTACAGTGCACAGGACAGTACCCCCACCCACACTCCAGTATCGAAGAATCATTGAACCCAAAATGTTAATA--------GT--GTCCAGG

MACAQUE GTCAGAATTTGGGGGATGCTTCTGGCTC-----T-ACTTG-GGTAGAG-AAACAGGAATGCTTATAATC----------ATCCTACAGTGCACAGGTCAGTACCCCCACCCACACTCCAGTATCGAAGAATCATTGGACCCAAAATGCTAATG--------GT--GTCCAGG

BABOON GTCAGAATTTGGGGGATGCTTCTGGCTC-----T-ACTTG-GGTAGAA-AAACAGGGATGCTTATAATC----------ATCCTACAGTGCACAGGACAGTACCCCCACCCACACTCCAGTATCGAAGAATCATTGGACCCAAAATGTTAATG--------GT--GTCCAGG

ORANGUTAN GTCACGATTTGGGAGATGCTTCTGGCTC-----G-ACTTG-GGTAGAG-AAGCGGGGATGCTTATAATC----------ATCCAACAGTGCACAGGACAGTACCCCCACCCACACTCCAGTAATGAAGAATCACTGGACCCAAAATGTTAATG--------GT--GTCCAGG

GORILLA GTCACGATTTGGGGGATGCTTCTGGCTC-----A-ACTTG-GGTAGAG-AAGTGGGGATGCTTATACTC----------ATCCTACAGTGCACAGGACAGTACCCCCACCCACACTCCAGTAATGAAGAATCATTAGACCGAAAATGTTAATG--------GT--GTCCAGG

CHIMP GTCACGATTTGGGGGATGCTTCTGGCTC-----A-ACTTG-GGTAGAG-AAGCGGGGATGCTTATAATC----------ATCCTACAGTGCACAGGACAGTACCCCCACCCACACTCCAGTAATGAAGAATCATTAGACCGAAAATGTTAATG--------GT--GTCCAGA

HUMAN GTCACGATTTGGGGGATGCTTCTGGCTC-----A-ACTTG-GGTAGAG-AAGCGGGGATGCTTATAATC----------ATCCTACAGTGCACAGGACAGTACCCCCACCCACACTCCAGTAATGAAGAATCATTAGACCTAAAATGTTAATG--------GT--GTCCAGG

GTCACAATTTGGGGGATGCTACTGGCAT-----C-TAGTG-GGTAGAG-AA-CAGGGATGCTGATAATC----------ATCCTACAGTGCACAGGACAGTGCCCCCACCCCCACTCCAACAACAAAGAATTATCCGGCCCAAAATGCCAATA--------GT--GCCCAGG
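
Step 4 can be illustrated for a single alignment column with Felsenstein's pruning algorithm. This sketch assumes the Jukes-Cantor substitution model, a uniform prior at the root, and made-up branch lengths; the slides do not say which model was used, and the tree encoding and function names are hypothetical.

```python
import math

BASES = "ACGT"

def jc69(t):
    """Jukes-Cantor transition probabilities for branch length t (substitutions/site)."""
    e = math.exp(-4.0 * t / 3.0)
    same, diff = 0.25 + 0.75 * e, 0.25 - 0.25 * e
    return {a: {b: (same if a == b else diff) for b in BASES} for a in BASES}

def partial(node, column):
    """Pruning: returns {b: P(observed leaves below node | node has base b)}.

    A leaf is a species name; an internal node is a tuple of (child, branch_length) pairs.
    """
    if isinstance(node, str):
        obs = column[node]
        # Gaps and unknown symbols are treated as missing data.
        return {b: 1.0 if (obs not in BASES or obs == b) else 0.0 for b in BASES}
    like = {b: 1.0 for b in BASES}
    for child, t in node:
        P, child_like = jc69(t), partial(child, column)
        for b in BASES:
            like[b] *= sum(P[b][c] * child_like[c] for c in BASES)
    return like

def ml_root_base(tree, column):
    """Maximum-likelihood base at the root (uniform prior, so the prior cancels)."""
    like = partial(tree, column)
    return max(BASES, key=lambda b: like[b])
```

For instance, with `tree = (((("human", 0.1), ("chimp", 0.1)), 0.1), ("mouse", 0.5))` and the column `{"human": "G", "chimp": "G", "mouse": "A"}`, the root is inferred as `G` because the two short primate branches outweigh the long mouse branch.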

How to understand sequence changes?
  • Linking Genotype and Phenotype evolution
    • G->P
    • P->G

The Genotype - Phenotype divide

Can we find evolutionary patterns that are distinct enough to be phenotypically revealing?

Problem #1:

Too many nucleotide changes between any pair of related species (or individuals).

The vast majority of these are near/neutral.

Species A

Species B

Genotype -> Phenotype screens

Define a “dramatic” (non-neutral) genomic scenario:

deleted!

Human

Chimp

conserved

Problem #2:

What is the phenotype?

hCONDEL

[McLean et al, 2011]

Testing is a humbling experience

“Wild rides”: often not what we expected, often not what we can understand.

What about a tree of related species?

What if we could find evolutionary patterns that were distinct enough to be phenotypically revealing?

Genomes: Inherited with Modifications.

Traits: Come and Go.

[Figure labels: ancestor, Species A, Species H]

What happens when an ancestral trait “goes”?

ancestral trait information

ancestor

Trait information is no longer under selection

Phenotype

Genome

Erodes away over evolutionary time

A lot of DNA and many traits vary between any two species.

What about independent trait loss?

Vitamin C synthesis, tail, body hair, dentition features, etc.

Different disabling mutation.

Different disabling times.

The P->G screen

matches trait presence/absence pattern

[Hiller et al., 2012a]


Branding ;-)

Forward Genetics: search for mutations that segregate with the trait

Forward Genomics: search for regions that are only lost in species lacking the trait

[Figure labels: phenotype, genotype]

Vitamin C synthesis has been measured in many species

[Figure: species marked as "synthesizes vitamin C" or "cannot synthesize vitamin C"; human and mouse are labeled]

Example: The Vitamin C synthesis “phenotree”

Loss of vitamin C synthesis happened 4 times independently in mammalian evolution.

We compute percent identity values for all conserved regions for all species

544,549 conserved regions, each with one percent identity value per species (e.g. 93%, 70%, 85%, ...)

Matrix: 33 species x 544,549 regions

  • Reconstruct ancestral sequence
  • Measure extant species divergence
  • Beware of
    • Low quality sequence
    • Assembly gaps
  • Seek perfect phenotree match

http://cs173.stanford.edu [BejeranoWinter12/13]

slide45
We quantify the match to the vitamin C pattern by counting the number of species that violate the pattern

[Figure: two example regions with species placed on a percent identity axis (0 to 100), one with 1 violation and one with 2 violations]
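
Counting violations for one region can be sketched as follows. The definition used here is an assumption made for illustration: the pattern holds if some percent identity cut-off puts every trait-loss species below it and every other species at or above it, and the violation count is the smallest number of species on the wrong side over all cut-offs.

```python
def count_violations(identity, lost_trait):
    """Minimum number of species that break the trait-loss pattern for one region.

    identity:   species -> percent identity, or None if unknown
    lost_trait: set of species that lack the trait
    """
    known = {s: v for s, v in identity.items() if v is not None}
    best = len(known)
    # Every distinct value (plus infinity) yields a distinct split of the species.
    for cut in sorted(set(known.values())) + [float("inf")]:
        # Expected at this cut: trait-loss species below it, all others at or above it.
        bad = sum((s in lost_trait) != (v < cut) for s, v in known.items())
        best = min(best, bad)
    return best
```

Species with unknown identity are simply left out, so a region is never penalised for missing data under this sketch.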

Regions matching the vitamin C trait are clustered

[Figure: the 544,549 conserved regions arranged by no. of violating species, from 0 (perfect match) to 10 (no match)]

These conserved regions are all exons of a single gene

This gene is more diverged in all non-vitamin C synthesizing species

What is the function of this gene?

33 genomes x 544,549 regions, vitamin C pattern

Gulo - gulonolactone (L-) oxidase

encodes the enzyme responsible for vitamin C biosynthesis

Note: no likely shared disabling mutation.

Forward genomics works.

Can it work for continuous traits?

With only two losses?

And many unknown values?

Find “Cure” Models

Continuous measure of key circulating molecule:

Find “Cure” Models

Continuous measure of key circulating molecule. Single out 2 lowest values.

Find perfect match in a transporter gene for said molecule.

Find “Cure” Models

Human ABCB4 mutations lower the molecule to guinea pig levels but are detrimental.

Our discovery: Guinea pig and horse gene inactivated in natural state. How?

create KO gene

Natural KO

try to fix/treat

find nature’s fix/treat


Forward Genomics Extensions

  • We used simulation
    • Our discoveries are not serendipitous
    • More losses, more branch length => more likely
  • We extended our screen to non-coding DNA
    • We find hundreds of independently lost enhancers
    • We show they are likely less pleiotropic
  • We surveyed phenotypes
    • 1/3 of scored traits in 3 large screens are independently lost

[Hiller et al., 2012a]

[Hiller et al., 2012b]


We’re done!

What did we do together? I
  • Genome content:
    • Protein coding genes
    • RNA genes
    • Gene regulation: TFs, genomic elements, chromatin, signaling
    • Repeats
  • Technology:
    • Genome sequencing, technology dependence
  • Genome Evolution:
    • Evolution = Mutation + Selection
    • Locus evolution: Neutral, Purifying, Positive
    • Comparative Genomics
    • Chains & Nets

What did we do together? II
  • Genome-phenotype relationships:
    • Neutral: human/species variation
    • Purifying: Human disease, personal genomics
    • Positive: recent human evolution
    • Shared origins: antiquity
    • Co-evolution of genome and phenotype
    • Evolutionary Developmental Biology
  • Primers:
    • Biology: from genome to organism
    • UCSC browser
    • Text processing
  • Computational challenges:
    • Dozens if not hundreds...


What next?

Computational Genomics

Population Genetics

Other Genomics Classes in Spring Qtr


The END
