Orthology predictions for whole mammalian genomes. Leo Goodstadt, MRC Functional Genomics Unit, Oxford University. Mammalian Genomes. How does our genome, and how do our genes, differ from those of other mammals and other vertebrates? Great Expectations.
How does our genome, and how do our genes, differ from those of other mammals and other vertebrates?
We did not appreciate how hard it would be to ‘read off’ functions from the human genome.
We had no idea that individual human genomes can differ so much!
So why is it taking so long to understand a simple genome?
10–20% of genes are lineage-specific, depending on which species are compared
Family trees for genes:
Human-specific genes missing from mouse (8%). In many cases, more distantly related mouse genes (homologues) can be found.
1 to 1
Gene families shared with mouse but which have expanded in human (9%)
(present as a single gene in the common ancestor of human and mouse)
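The categories above (lineage-specific, 1 to 1, expanded in one lineage) can be illustrated with a minimal Python sketch. It assumes each family has already been defined as the descendants of a single gene in the human-mouse common ancestor, which in practice requires building and reconciling gene trees; counting members per species is only a simplification. The function name, labels and gene identifiers are hypothetical.

```python
from collections import Counter

def classify_family(genes, species_a="human", species_b="mouse"):
    """Label one gene family by its copy number in two species.

    genes: iterable of (gene_id, species) pairs. The family is assumed
    to descend from a single ancestral gene (an assumption, not checked).
    """
    counts = Counter(species for _, species in genes)
    a, b = counts[species_a], counts[species_b]
    if a == 0 and b == 0:
        return "absent"
    if b == 0:
        return f"{species_a}-specific"
    if a == 0:
        return f"{species_b}-specific"
    if a == 1 and b == 1:
        return "1:1"
    if b == 1:
        return f"expanded in {species_a}"
    if a == 1:
        return f"expanded in {species_b}"
    return "many:many"

# Hypothetical identifiers, for illustration only.
family = [("hGeneA1", "human"), ("hGeneA2", "human"), ("mGeneA", "mouse")]
print(classify_family(family))
```

Tallying these labels over all families would give percentages comparable to the 8% and 9% figures quoted on the slide, provided the families were built from reliable trees.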
M. Lynch and A. Force, The probability of duplicate gene preservation by subfunctionalisation. Genetics 154 (2000), pp. 459–473
Chemosensation (OR, V1R and V2R)
Reproduction (vomeronasal receptors, lipocalins, β-microseminoprotein (12:1))
Immunity (IG chains, butyrophilins, leukocyte IG-like receptors, T-cell receptor chains and carcinoembryonic antigen-related cell adhesion molecules)
Pancreatic RNAses
Detoxification (hypoxanthine phosphoribosyltransferase homologues, nitrogen-poor diets)
KRAB Zn Fingers
Cancer-testis antigen genes (e.g. PRAMEs)
Regulate chromatin structure and therefore the timing of transcription.
Rapidly changing developmental or transcriptional regulatory genes?
Correlations with known annotations
Leo Goodstadt et al. Genome Res. 2007; 17: 969-981
Human - Chicken
Chicken - Human
Amino acid sites under positive selection in human (red), mouse (blue) and rat (purple) [or multiple species (yellow)] PRAME genes.
Androgen-binding proteins: produced by Sertoli cells in the seminiferous tubules of the testes.
Emes et al. (2004) Genome Res. 14(8):1516-29
sites subject to
sites: dark blue, ligand (glutamate) pink
peptide ligand in MHC
structure in green
Hs normal:  MAETLFWTPLLVVLLAGLGDTEAQQTTLHPLVGRVFVHTLDHETFLSLPEHVAVPPAVHI
Hs variant: MAETLFWTPLLVVLLAGLGDTEAQQTTLHLLVGRVFVHTLDHETFLSLPEHVAVPPAVHI
Mm normal:  MAAAVTWIPLLAGLLAGLRDTKAQQTTLHLLVGRVFVHPLEHATFLRLPEHVAVPPTVRL
Nick Dickens & Jörg Schultz
7293 SwissProt disease-associated variants
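The comparison in the three aligned sequences above can be sketched in Python: find where the human variant differs from the normal human sequence, then look up the mouse residue in the same column. This assumes the sequences are already aligned without gaps and are of equal length, as they are here; the function name is hypothetical.

```python
def substitutions(ref, alt):
    """Return (1-based position, ref residue, alt residue) for each mismatch.

    Assumes ref and alt are the same length and aligned without gaps.
    """
    if len(ref) != len(alt):
        raise ValueError("sequences must be the same length")
    return [(i + 1, r, a) for i, (r, a) in enumerate(zip(ref, alt)) if r != a]

hs_normal = "MAETLFWTPLLVVLLAGLGDTEAQQTTLHPLVGRVFVHTLDHETFLSLPEHVAVPPAVHI"
hs_variant = "MAETLFWTPLLVVLLAGLGDTEAQQTTLHLLVGRVFVHTLDHETFLSLPEHVAVPPAVHI"
mm_normal = "MAAAVTWIPLLAGLLAGLRDTKAQQTTLHLLVGRVFVHPLEHATFLRLPEHVAVPPTVRL"

for pos, ref, alt in substitutions(hs_normal, hs_variant):
    mouse = mm_normal[pos - 1]
    same = "matches" if mouse == alt else "differs from"
    print(f"{ref}{pos}{alt}: mouse has {mouse}, which {same} the human variant")
```

For these sequences the single difference is a proline-to-leucine change at position 30, and the normal mouse sequence carries leucine at that position.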
Amino acid sequence identity
Pairwise alignment coverage
Number of exons
Sequence length (codons)
Unspliced transcript length (bp)
G+C content at 4D sites
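As a rough illustration of the last feature in this list, G+C content at fourfold degenerate (4D) sites can be computed from a coding sequence as below. This is a minimal sketch assuming the standard genetic code and an in-frame sequence; it treats the third position of every codon in the eight fourfold degenerate families as a 4D site, whereas a cross-species analysis would usually also require the amino acid to be conserved in the aligned orthologue. The function name is hypothetical.

```python
# First two bases of the eight fourfold degenerate codon families in the
# standard genetic code: Leu (CTN), Val (GTN), Ser (TCN), Pro (CCN),
# Thr (ACN), Ala (GCN), Arg (CGN), Gly (GGN).
FOURFOLD_PREFIXES = {"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG"}

def gc4(cds):
    """Fraction of G or C at third positions of fourfold degenerate codons.

    cds: in-frame coding DNA sequence. Returns None if no 4D site is found.
    """
    cds = cds.upper()
    gc = total = 0
    # Walk complete codons only; a trailing partial codon is ignored.
    for i in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[i:i + 3]
        if codon[:2] in FOURFOLD_PREFIXES and codon[2] in "ACGT":
            total += 1
            if codon[2] in "GC":
                gc += 1
    return gc / total if total else None

print(gc4("CTGGTAGCC"))  # three 4D codons: CTG, GTA, GCC
```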
Galtier, N. et al. Genetics 2001;159:907-911
Biased Gene Conversion
In subtelomeres and X chromosomes
X chromosome / Subtelomeric regions are:
Mouse has fewer recent duplications than rat?
Number of Nodes
KS distance from current time
Differences in the number of copies of a gene
Copy Number / Structural Variation
Tuzun et al. Nature Genetics 2005
Kaessmann, H. & Pääbo, S. The genetical history of humans and the great apes. Journal of Internal Medicine 251 (1), 1-18.
Look at annotations / evolutionary history (between species and in the population) of corresponding genes (Orthology!)
Over-/under-representations of
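One common way to score over- or under-representation of an annotation in a gene set is a hypergeometric tail probability. The slides do not say which test was used, so the following is only a sketch of that standard approach; the function and parameter names are hypothetical.

```python
from math import comb

def hypergeom_tail(k, n, K, N, upper=True):
    """Hypergeometric tail probability for annotation enrichment.

    N: genes in the background, K: background genes with the annotation,
    n: genes in the set of interest, k: annotated genes in that set.
    upper=True gives P(X >= k) (over-representation);
    upper=False gives P(X <= k) (under-representation).
    """
    lo, hi = (k, min(n, K)) if upper else (0, k)
    total = comb(N, n)
    # comb() returns 0 for impossible draws, so no extra bounds check is needed.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(lo, hi + 1))
    return favourable / total

# Toy numbers: 10 background genes, 5 annotated, all 5 selected genes annotated.
print(hypergeom_tail(5, 5, 5, 10))
```

When many annotation terms are tested at once, the resulting p-values would also need correcting for multiple testing.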
The Wellcome Trust Case Control Consortium:
7 diseases × 2000 samples per disease
Chromosome 9 (Mb)
Cancer genomics: sequence the genome of cancer cells
(a) disease causation?
(b) natural phenotypic differences?
Must use evolutionary signal.
For protein coding genes, that requires constructing family trees: orthology will continue to be central