Orthology predictions for whole mammalian genomes. Leo Goodstadt MRC Functional Genomics Unit Oxford University. Mammalian Genomes. How does our genome, and how do our genes, differ from those of other mammals and other vertebrates?. Great Expectations.


Orthology predictions for whole mammalian genomes

Leo Goodstadt

MRC Functional Genomics Unit

Oxford University

Mammalian Genomes

How does our genome, and how do our genes, differ from those of other mammals and other vertebrates?

So why is it taking so long to understand a simple genome?

We did not appreciate how much functional sequence there would be.

We did not appreciate how hard it would be to ‘read off’ functions from the human genome.

We had no idea that individual human genomes can differ so much!

  • How much?
  • Species-specific genes?
  • Human genomes
How do we find function in the genome?
  • Nothing in Biology Makes Sense Except in the Light of Evolution. Theodosius Dobzhansky (1900-1975).
Mouse-Human Orthologues % Identity
  • sites not in domains: 64.4%
  • cSNP sites: 67.1%
  • all sites: 70.1%
  • sites in domains: 88.9%
  • disease sites: 90.3%
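A minimal sketch of how per-class identity figures like those above could be computed from a pairwise alignment. The function name, the toy sequences and the domain mask are hypothetical and only illustrate the calculation; they are not the data behind the slide.

```python
def percent_identity(seq_a, seq_b, site_mask=None):
    """Percent of compared alignment columns with identical residues.

    Columns with a gap ('-') in either sequence are skipped, as are
    columns where site_mask is False.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    if site_mask is None:
        site_mask = [True] * len(seq_a)
    compared = matches = 0
    for a, b, keep in zip(seq_a, seq_b, site_mask):
        if not keep or a == "-" or b == "-":
            continue
        compared += 1
        matches += (a == b)
    return 100.0 * matches / compared if compared else float("nan")


# Hypothetical toy alignment and domain annotation (columns 4-8 in a domain).
human = "MKTLLVAGDE"
mouse = "MKSLLVAGNE"
in_domain = [False] * 3 + [True] * 5 + [False] * 2
not_in_domain = [not flag for flag in in_domain]

all_sites = percent_identity(human, mouse)
domain_sites = percent_identity(human, mouse, in_domain)
other_sites = percent_identity(human, mouse, not_in_domain)
```

The same mask argument could mark cSNP or disease sites instead of domain sites.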
Large number of lineage-specific duplications

10–20% of genes are lineage-specific, depending on the comparison

20% of human genes have been duplicated or do not have a rodent orthologue

Family trees for genes:

  • Human-specific genes missing from mouse (8%). In many cases, more distantly related mouse genes (homologues) can be found.
  • 1 to 1 (80%)
  • Gene families shared with mouse but which have expanded in human (9%)

Shared orthologues: present as a single gene in the common ancestor of human and mouse.
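The categories above can be sketched as a simple rule on the number of human and mouse genes descending from one gene in the common ancestor. The category labels, function name and example families below are hypothetical, and real assignments would come from gene family trees rather than bare counts.

```python
from collections import Counter


def classify_family(n_human, n_mouse):
    """Label a gene family by its human and mouse gene counts."""
    if n_human == 0 and n_mouse == 0:
        raise ValueError("family has no genes")
    if n_mouse == 0:
        return "human-specific"
    if n_human == 0:
        return "mouse-specific"
    if n_human == 1 and n_mouse == 1:
        return "1:1"
    if n_human > 1 and n_mouse == 1:
        return "expanded in human"
    if n_human == 1 and n_mouse > 1:
        return "expanded in mouse"
    return "expanded in both"


# Hypothetical families: name -> (human gene count, mouse gene count).
families = {
    "famA": (1, 1),
    "famB": (3, 1),
    "famC": (2, 0),
    "famD": (1, 4),
    "famE": (1, 1),
}
tally = Counter(classify_family(h, m) for h, m in families.values())
```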

Where do new genes come from?
  • De novo (from non-coding)
  • Rapid sequence change
  • Gene duplication (M. Lynch and A. Force, The probability of duplicate gene preservation by subfunctionalisation. Genetics 154 (2000), pp. 459–473)
  • Pseudogenisation
  • Missing: Horizontal gene transfer
Inparalogues

Chemosensation (OR, V1R and V2R)

Reproduction (vomeronasal receptors, lipocalins, β-microseminoprotein (12:1))

Immunity (IG chains, butyrophilins, leukocyte IG-like receptors, T-cell receptor chains and carcinoembryonic antigen-related cell adhesion molecules)

Pancreatic RNAses

Detoxification (hypoxanthine phosphoribosyltransferase homologues; nitrogen-poor diets)

KRAB Zn fingers

Reproduction Clusters

[Table not recovered; surviving column header: No. in cluster]

Rapid evolvers in protein coding genes

Reproduction

Chemosensation

KRAB Zn Fingers

Immunity

Toxin degradation

Hypothesis: Darwinian evolution

Competition:

  • Inter-specific (pathogens, predators)
  • Intra-specific
    • mating
    • sub-speciation / kin-selection
    • gender conflict
    • clonal expansions in sperm
Rapidly-changing developmental or transcriptional regulatory genes?
  • KRAB-zinc finger genes
  • Cancer-testis antigen genes (e.g. PRAMEs)

Regulate chromatin structure and therefore the timing of transcription.
Detecting biological signals among inparalogues

Correlations with known annotations

  • Biological Annotations (gene descriptions / Gene Ontology)
  • Tissue specificity
  • Comparative changes across lineages (dating)
  • Chromosomal Distribution
  • Positive selection
  • Genomic environment
Different genes duplicate at different times

Leo Goodstadt et al. Genome Res. 2007; 17: 969-981

Trends - Functions

Human - Chicken

CGSC (2005)

Trends - Tissues

Chicken - Human

CGSC (2005)

Positive selection: PRAME genes

Amino acid sites under positive selection in human (red), mouse (blue) and rat (purple) [or multiple species (yellow)] PRAME genes.

Gene Duplication Remodels Genome

Androgen-binding proteins: produced by Sertoli cells in the seminiferous tubules of the testes. Emes et al. (2004) Genome Res. 14(8):1516-29

VR2 olfactory receptor N-terminal domain: sites in dark blue, ligand (glutamate) in pink (other monomer)

MHC class Ib, M10s: sites in blue, peptide ligand in MHC structure in green

Few Mendelian disease genes lack mouse orthologues
  • Kallmann syndrome gene: C. elegans orthologue
  • CETP (cholesteryl ester transfer protein): rabbit and hamster
  • Glycophorin E: primate-specific; MN and Ss blood types

Mouse equivalents of human disease variants

Hs normal: MAETLFWTPLLVVLLAGLGDTEAQQTTLHPLVGRVFVHTLDHETFLSLPEHVAVPPAVHI

Hs variant: MAETLFWTPLLVVLLAGLGDTEAQQTTLHLLVGRVFVHTLDHETFLSLPEHVAVPPAVHI

Mm normal: MAAAVTWIPLLAGLLAGLRDTKAQQTTLHLLVGRVFVHPLEHATFLRLPEHVAVPPTVRL

Nick Dickens & Jörg Schultz
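The comparison in the alignment above can be reproduced with a short sketch: find the position where the human variant differs from the human normal sequence, then check which of the two residues the mouse carries. The three sequences are copied from the slide; the function name and the status labels are illustrative, and the code assumes the sequences are aligned without gaps.

```python
hs_normal = "MAETLFWTPLLVVLLAGLGDTEAQQTTLHPLVGRVFVHTLDHETFLSLPEHVAVPPAVHI"
hs_variant = "MAETLFWTPLLVVLLAGLGDTEAQQTTLHLLVGRVFVHTLDHETFLSLPEHVAVPPAVHI"
mm_normal = "MAAAVTWIPLLAGLLAGLRDTKAQQTTLHLLVGRVFVHPLEHATFLRLPEHVAVPPTVRL"


def compare_variant(wild_type, variant, other_species):
    """List variant positions (1-based) and how the other species compares."""
    if not (len(wild_type) == len(variant) == len(other_species)):
        raise ValueError("sequences must be aligned to the same length")
    results = []
    columns = zip(wild_type, variant, other_species)
    for pos, (wt, var, other) in enumerate(columns, start=1):
        if wt == var:
            continue  # not a variant position
        if other == wt:
            status = "mouse residue = human wild-type residue"
        elif other == var:
            status = "mouse residue = human disease residue"
        else:
            status = "mouse residue differs from both"
        results.append((pos, wt, var, other, status))
    return results


hits = compare_variant(hs_normal, hs_variant, mm_normal)
# hits == [(30, 'P', 'L', 'L', 'mouse residue = human disease residue')]
```

Here the human variant is a proline-to-leucine change at position 30, and the normal mouse sequence already carries leucine at that position.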

Disease mutations do not always lead to pathological phenotypes in mouse!

7293 SwissProt disease-associated variants

  • 90.3% mouse residue = human wild-type residue
  • 7.5% mouse residue ≠ human wild-type residue
  • 2.2% mouse residue = human disease residue
Comparisons with a third genome
  • Australian marsupial: silver-gray brushtail possum (Trichosurus vulpecula)
  • 8,237 orthologues from 111,634 ESTs
  • More closely related to Monodelphis. Median dS:
Homo Monodelphis 1:1 orthologues

  • dN/dS: 0.086
  • dS: 1.02
  • Amino acid sequence identity: 81.0%
  • Pairwise alignment coverage: 94.2%

Homo sapiens / Monodelphis domestica:

  • Number of exons: 9 / 9
  • Sequence length (codons): 471 / 445
  • Unspliced transcript length (bp): 27,241 / 25,365
  • G+C content at 4D sites: 56.9% / 48.7%
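As a rough illustration of the "G+C content at 4D sites" statistic, the sketch below counts G or C at the third position of fourfold degenerate codons. It assumes the standard genetic code and an in-frame coding sequence; the function name and the example sequence are hypothetical.

```python
# First two bases of the eight fourfold degenerate codon families in the
# standard genetic code (Leu CTN, Val GTN, Ser TCN, Pro CCN, Thr ACN,
# Ala GCN, Arg CGN, Gly GGN).
FOURFOLD_PREFIXES = {"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG"}


def gc_at_fourfold_sites(cds):
    """Fraction of fourfold degenerate third positions that are G or C."""
    cds = cds.upper().replace("U", "T")
    gc = total = 0
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    for i in range(0, usable, 3):
        codon = cds[i:i + 3]
        if codon[:2] in FOURFOLD_PREFIXES and codon[2] in "ACGT":
            total += 1
            gc += codon[2] in "GC"
    return gc / total if total else float("nan")


# Hypothetical in-frame sequence: ATG GCC CTG GTA ACT TAA
example_gc4 = gc_at_fourfold_sites("ATGGCCCTGGTAACTTAA")
```

Fourfold degenerate sites are used because any base there encodes the same amino acid, so their composition is less constrained by selection on the protein.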

Lower G+C in Homo X

Decreased G+C

Variations in Female recombination rate
  • Telomeric ends are highly recombining
    • Short chromosomes have proportionally more subtelomeric sequence
    • Long chromosomes have proportionally more interstitial sequence
  • Obligatory chiasma per chromosome
Biased Gene Conversion during Recombination

Galtier, N. et al. Genetics 2001;159:907-911

Consequences of Recombination

Biased Gene Conversion

  • High G + C
  • High dS

In subtelomeres and X chromosomes

Consequences of Recombination
  • Increased selection efficiency (recombination disrupts linkage between neighbouring mutations: the “Hill-Robertson” effect)
  • Most genes are under purifying selection (dN/dS ~ 0.086)
  • Highly recombining regions predicted to have lower dN/dS
Summary

X chromosome / subtelomeric regions:

  • Are highly recombining in Monodelphis females
  • Have high G+C
  • Have high dS
  • Show increased purifying selection
  • Have short intron lengths
Consequences
  • Some lineages have highly rearranged karyotypes
  • Chromosome breakage highly correlated
  • Rearrangements correlated with G+C content
  • Gene function is not independent of chromosomal location
  • Many “high evolvers” may be under relaxed selection
  • Mutation rate variation has consequences for finding disease genes
Are functionally-linked genes genomic neighbours?
  • Interacting Proteins: Very small effect mostly arising from local gene duplication
  • Co-expressed genes: Small effect mostly arising from local gene duplication
  • ‘Housekeeping genes’: Small effect with unclear biological significance
  • So, function cannot be clearly ‘read off’ the genome
Evolutionary rate analyses in clades
  • Comparative genomics: two or few genomes
    • highlights differences
  • Clade genomics: sets of genomes
    • highlights innovations
  • Examples
    • 12 flies
    • 5 mammals + chicken
    • 4 worms
Analysis pipeline
  • Gene prediction
  • Assignment of orthology and paralogy
  • Rate analysis
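The orthology-assignment step can be illustrated with reciprocal best hits, a common simple heuristic. This is a stand-in for illustration only and not the tree-based method the talk relies on; the gene names, scores and function names are hypothetical.

```python
def best_hits(scores):
    """Map each query to its highest-scoring target.

    scores: dict mapping (query, target) -> similarity score.
    """
    best = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where a's best hit is b and b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical similarity scores between human (h*) and mouse (m*) genes.
human_vs_mouse = {
    ("hA", "mA"): 900, ("hA", "mB"): 300,
    ("hB", "mB"): 700, ("hC", "mB"): 650,
}
mouse_vs_human = {
    ("mA", "hA"): 900, ("mB", "hB"): 700, ("mB", "hC"): 650,
}
pairs = reciprocal_best_hits(human_vs_mouse, mouse_vs_human)
```

In this toy case hC is left unpaired even though it resembles mB. Lineage-specific duplicates (inparalogues) are lost in this way, which is one reason to build family trees instead.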
Beware of naive comparisons of gene duplications

Mouse has fewer recent duplications than rat?

[Figure: number of nodes plotted against KS distance from current time]

Extension to other clades
  • 12 Flies
    • D. melanogaster, D. simulans, D. sechellia, D. yakuba, D. erecta, D. ananassae, D. pseudoobscura, D. persimilis, D. willistoni, D. grimshawi, D. mojavensis, D. virilis
  • 6 Amniotes
    • Human, mouse, dog, opossum, platypus, chicken
  • 4 Nematodes
    • C. elegans, C. briggsae, C. remanei, C. 2801
Extension to other clades

[Figure panels: Nematodes, Drosophila, Amniotes]

Lineage-specific dN/dS reflects population size?

[Figure: lineage-specific dN/dS (axis range 0.00 to 0.14) by species, grouped into Drosophila, nematodes and amniotes. Species on the axis: dana, dere, dgri, dmel, dmoj, dpse, dsim, dvir, dyak; c2801, cbri, cele, crem; cfam, ggal, hsap, mdom, mmus, oana.]

Most Human Duplications are Recent: Some before the Chimp-Human Split, Most After

Polymorphisms?

Hominin-specific genes?

Unfixed genomic structural variants explain population differences?

Differences in the number of copies of a gene

Copy Number / Structural Variation

Tuzun et al. Nature Genetics 2005

(Luckily!) humans have a relatively low polymorphism rate

Kaessmann, H. & Pääbo, S. The genetical history of humans and the great apes. Journal of Internal Medicine 251 (1), 1-18.

Structural variation and disease
  • > 12% of the human genome is structurally variable: structural variants represent more DNA than SNPs!
  • More likely to be disease-associated than SNPs
  • Structural variants are often complex, including changes in regulation
How can we find causative differences?

Look at annotations / evolutionary history (between species and in the population) of corresponding genes (Orthology!)

How can we find causative differences?

Over- / under-representation of:

  • Disease (Rare and common alleles)
  • GO annotations
  • Pathways, protein-protein interactions
  • Domain structure
  • Sequence conservation / divergence: indels, base changes (SNPs), rearrangements
  • GC
  • Tissue specificity
  • Duplication history
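Over-representation of an annotation (for example a GO term) in a gene set is often assessed with a one-sided hypergeometric test. The sketch below is one possible way to do this, not necessarily the test used for the analyses in this talk; the function name, argument names and counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, annotated_in_sample):
    """P(X >= annotated_in_sample) when `sample` genes are drawn without
    replacement from `population` genes, of which `annotated` carry the term.
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(annotated_in_sample, upper + 1)
    )
    return tail / total


# Hypothetical counts: 10 genes overall, 5 annotated, a set of 5 genes
# in which all 5 are annotated.
p_enriched = hypergeom_enrichment_p(10, 5, 5, 5)
```

Under-representation can be tested the same way with the lower tail, and testing many annotations at once calls for a multiple-testing correction.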
2 new genes associated with coronary artery disease

The Wellcome Trust Case Control Consortium:

500,000 SNPs

7 diseases x 2000 samples per disease

3000 controls

[Figure: x axis, position on chromosome 9 (Mb)]
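A case-control study of this kind tests each SNP for a difference in allele frequency between cases and controls. The sketch below shows one simple allelic chi-square test on a 2x2 table. It is an illustration under stated assumptions, not the consortium's actual analysis, and the allele counts are hypothetical (2000 cases and 3000 controls give 4000 and 6000 alleles).

```python
from math import erfc, sqrt


def allelic_chi_square(case_a, case_b, ctrl_a, ctrl_b):
    """Pearson chi-square (1 degree of freedom) on allele counts.

    Returns (chi2, p_value). For 1 degree of freedom the upper tail
    probability is erfc(sqrt(chi2 / 2)).
    """
    n = case_a + case_b + ctrl_a + ctrl_b
    row_case, row_ctrl = case_a + case_b, ctrl_a + ctrl_b
    col_a, col_b = case_a + ctrl_a, case_b + ctrl_b
    denom = row_case * row_ctrl * col_a * col_b
    if denom == 0:
        raise ValueError("every row and column needs a non-zero total")
    chi2 = n * (case_a * ctrl_b - case_b * ctrl_a) ** 2 / denom
    return chi2, erfc(sqrt(chi2 / 2.0))


# Hypothetical allele counts for one SNP.
chi2, p_value = allelic_chi_square(2200, 1800, 3000, 3000)
```

With around 500,000 SNPs tested, a single-test p-value has to be far below 0.05 before it is taken as evidence of association.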

Exercise:
  • Brain anatomy presumably evolved step by step
  • How did neural anatomy evolve in vertebrates / mammals / primates?
  • Which neural anatomical structures correspond?
Exercise:

The future:

  • Find brain genes in other species groups which have shown apparent increase in “brain power” (e.g. song birds, dolphins)
  • Increases in brain capacity may involve convergent evolution
  • See if same trends also visible in our lineage
Exercise:

Current techniques:

  • Homology by “inspection” (phenotypical comparisons of a few anatomical traits)
  • Marker genes which are shared in the same cell types across species

The future:

  • Sequence all the active genes (mRNA) in all cell types across multiple species
  • Use the patterns of all active genes (the transcriptome)
Other areas of genomic medicine

Cancer genomics: sequence the genome of cancer cells

  • Some variants are associated with high morbidity
  • A few genes are highly associated with increased risk of cancers e.g. BRCA1
  • Some variants may be associated with increased response to chemotherapy
  • However, apart from a few solid tumours, most cancer cells appear to harbour a huge number of changes and rearrangements
  • It may be impossible to identify causative / facilitative / therapeutic candidates
Other areas of genomic medicine

Pharmaceutical genomics

  • Differential drug efficacy: response to treatment varies within the population, e.g. 15% of breast cancers have copy number amplification of HER2 and are thus candidates for Herceptin
  • Differential side-effects: e.g. 1 in 300 patients have a lethal haematopoietic adverse response to mercaptopurine for acute lymphoblastic leukemia, linked to mutations in thiopurine S-methyltransferase
  • Differential prognosis: carriers of the CCR5 mutation are either HIV-1 resistant or have much slower progression of AIDS
Coming changes
  • Major reduction in cost: 100-1000x
  • Major increase in throughput: $100k per mammalian genome to $1,000 per resequenced human
  • Large scale studies to gather phenotypic differences and associate with genomic variation
  • Most labs will start doing some sort of genomics
What is the future of genomics?
  • We will be awash with data and genomic variations
  • Which of these variations correspond to:

(a) disease causation?

(b) natural phenotypic differences?

Must use evolutionary signal.

For protein coding genes, that requires constructing family trees: orthology will continue to be central