Explore the evolutionary relationships and functional similarities among model organisms like fruit flies, worms, and yeast in the context of genomic comparisons. Learn about gene families, orthologs, and conserved domains through methods like the BLAST algorithm. Discover how these findings shed light on the genetic connections between distant species.
 
                
Comparative Genomics of the Eukaryotes
Ishay Ben-Zion
A paper by: Rubin, Yandell, Wortman,…
Motivation
• Evolution – Charles Darwin (1838)
  Similarity between different species
  Model organisms
• A human shares 50% of his genes with a banana. How?
• Humans and bananas are both multicellular
• Other similarities
  Humans share 23% of their genes with yeast
• Could a banana be a good model organism?
Model Organisms
• Heavily studied – used as examples for other species
• Important requirements:
  Size
  Generation time (for genetic research)
  Manipulation (genetic and not)
  Little “junk DNA” (easy for sequencing)
  Money
• Once an organism has been studied enough, it is a good candidate
This paper describes:
A comparison between the genomes of 3 eukaryotes
(Eukaryote – a cell with membrane-bound inner structures, such as a nucleus)
1) A fruit fly – Drosophila melanogaster
2) A worm – C. elegans
3) Yeast – S. cerevisiae
Other model organisms: E. coli, mouse, zebrafish, Arabidopsis
Taxonomic classification
Cellular life
Domain: Bacteria, Archaea, Eukaryota
Kingdom (within Eukaryota): Animalia, Protista, Plantae, Fungi
Species: H. influenzae (Bacteria); fly and worm (Animalia); yeast (Fungi)
Drosophila melanogaster
• Popular model organism (for developmental biology)
• A trial run for the human genome (sequenced in 2000)
• Mutations are easy to induce
Caenorhabditis elegans
• Transparent, 1 mm long
• Simple – 959 somatic cells in the adult hermaphrodite (302 of them neurons)
• Eat, sleep & have sex (or self-fertilize)
• Hermaphrodites – 99.95%, Males – 0.05%
Caenorhabditis elegans
Good as a model organism for:
• Genetics: first multicellular organism with a sequenced genome
• Developmental biology: cell fate mapping
• Neurobiology: neuron connectivity map
Saccharomyces cerevisiae
• Also called baker’s yeast
• Single-celled
• Diameter: 5-10 μm
• Popular model organism
• Simplest eukaryote
• First eukaryote with a sequenced genome
The 1st comparison
• Instead of counting genes – count gene families
• What are gene families? Sets of paralogs
  Paralogs = highly similar proteins in the same genome
  Similar functionality – but not always
• Remark: proteins = genes (the two terms are used interchangeably here)
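One way to picture "sets of paralogs" is as connected groups: if two proteins in the same genome are highly similar, they land in the same family, and a protein with no paralog forms a family of size one. The sketch below is only an illustration under that assumption (single-linkage grouping with a union-find structure); the slides do not say how the paper built its families, and the protein names and the pair list are hypothetical.

```python
def build_families(proteins, paralog_pairs):
    """Group proteins into families: two proteins share a family when a
    chain of paralog pairs connects them (single linkage, an assumption)."""
    parent = {p: p for p in proteins}

    def find(p):
        # Follow parent links to the family representative,
        # shortening the path as we go.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for a, b in paralog_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two families

    families = {}
    for p in proteins:
        families.setdefault(find(p), set()).add(p)
    return list(families.values())


# Hypothetical genome with five proteins and two paralog relations.
proteins = ["g1", "g2", "g3", "g4", "g5"]
paralog_pairs = [("g1", "g2"), ("g2", "g3")]
families = build_families(proteins, paralog_pairs)
print(len(proteins), "genes,", len(families), "families")
```

Here five genes collapse into three families, which is the shift in counting that the comparison relies on.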
Findings • Size of a family: one or more • No. of families – not a good measure for complexity
The 2nd comparison
• Pool the genes of large families from the 3 species
• For each protein – search for orthologs
• Orthologs = similar proteins in other species
• Among families found in flies and worms (but not yeast): responsible for multicellular development
• Among families found only in flies: responsible for immune response and fly-specific functions
Methods – BLAST algorithm
• Basic Local Alignment Search Tool
• For comparing biological sequences (to find homology)
  Example: proteins, DNA sequences
• A query is compared against a library of sequences
  (In the library – sequences of different lengths)
• In the paper: paralogy and orthology are kinds of homology
BLAST – Step 1
• Separate the query into k-letter words
Example: proteins – letters are amino acids (L = Leucine)
Query sequence: RPPQGLF (k=3)
3-letter words: RPP PPQ PQG QGL GLF
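Step 1 amounts to sliding a window of width k along the query. A minimal sketch, using the slide's own query and k = 3 (the function name is made up for illustration):

```python
def split_into_words(query, k=3):
    """Return every overlapping k-letter word of the query, left to right."""
    if k <= 0:
        raise ValueError("k must be positive")
    return [query[i:i + k] for i in range(len(query) - k + 1)]


words = split_into_words("RPPQGLF", k=3)
print(" ".join(words))  # RPP PPQ PQG QGL GLF
```

A query of length n yields n - k + 1 words, so a query shorter than k yields none.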
BLAST – Step 2
• Take one k-letter word – PQG
• Search the library for similar words – found in LGMCPQA, DPPEGVV
• Define similarity: use a scoring matrix for two k-letter words
  A high score suggests the 2 words have a common ancestor
  PQG – PQA : 12
  PQG – PEG : 15
• Save similar words above a threshold T (save positions)
• Repeat for all k-letter words in the query
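The two word scores on this slide are sums of per-letter substitution scores (7 + 5 + 0 = 12 and 7 + 2 + 6 = 15). The sketch below scans each library sequence for windows that score at least T against the query word. It is a simplification: the table holds only the letter pairs needed for the slide's example, every other pair gets a flat penalty (an assumption), T = 12 is chosen just so both example words pass, and real BLAST implementations organise this search differently for speed.

```python
# Substitution scores for the letter pairs in the slide's example only.
# A real search would use a full scoring matrix.
PAIR_SCORES = {
    ("P", "P"): 7, ("Q", "Q"): 5, ("G", "G"): 6,
    ("G", "A"): 0, ("Q", "E"): 2,
}
UNLISTED = -4  # assumption: flat penalty for pairs missing from the table


def pair_score(a, b):
    """Score one letter pair, in either order."""
    return PAIR_SCORES.get((a, b), PAIR_SCORES.get((b, a), UNLISTED))


def word_score(w1, w2):
    """Score two equal-length words letter by letter."""
    return sum(pair_score(a, b) for a, b in zip(w1, w2))


def find_hits(word, library, threshold):
    """Return (sequence name, position, score) for every library window
    whose score against `word` is at least `threshold`."""
    hits = []
    for name, seq in library.items():
        for pos in range(len(seq) - len(word) + 1):
            score = word_score(word, seq[pos:pos + len(word)])
            if score >= threshold:
                hits.append((name, pos, score))
    return hits


library = {"seq1": "LGMCPQA", "seq2": "DPPEGVV"}  # names are hypothetical
hits = find_hits("PQG", library, threshold=12)
print(hits)  # [('seq1', 4, 12), ('seq2', 2, 15)]
```

The saved positions are what the next step uses as starting points for extension.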
BLAST – Step 3
• Align at saved positions:
  Query:    - - - R P P Q G L F - - -
  Library:  - - - D P P E G V V - - -
  Scores:          -2 7 7 2 6 1 -1
• Extend the match right and left as long as the score stays positive
  Total: 15 + 7 + 1 = 23 (the word PQG/PEG scores 15, P/P on the left adds 7, L/V on the right adds 1)
• New pairs are called High-scoring Segment Pairs (HSP)
• Save significant HSPs (above a threshold S)
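The extension on this slide can be sketched as a loop that grows the matched word one letter pair at a time in each direction and stops at the first pair that does not add a positive score. That stopping rule is a simplification taken from the slide's wording; production BLAST uses a more tolerant drop-off rule. The score table contains only the pairs shown on the slide, and the function name is made up for illustration.

```python
# Per-letter scores copied from the slide's alignment row.
PAIR_SCORES = {
    ("R", "D"): -2, ("P", "P"): 7, ("Q", "E"): 2,
    ("G", "G"): 6, ("L", "V"): 1, ("F", "V"): -1,
}


def pair_score(a, b):
    """Look up one letter pair in either order (KeyError if unlisted)."""
    if (a, b) in PAIR_SCORES:
        return PAIR_SCORES[(a, b)]
    return PAIR_SCORES[(b, a)]


def extend_seed(query, subject, q_pos, s_pos, k):
    """Extend an ungapped k-letter match left and right while each newly
    added letter pair scores above zero. Returns both segments and the score."""
    score = sum(pair_score(query[q_pos + i], subject[s_pos + i]) for i in range(k))

    left = 0
    while q_pos - left - 1 >= 0 and s_pos - left - 1 >= 0:
        gain = pair_score(query[q_pos - left - 1], subject[s_pos - left - 1])
        if gain <= 0:
            break
        score += gain
        left += 1

    right = 0
    while q_pos + k + right < len(query) and s_pos + k + right < len(subject):
        gain = pair_score(query[q_pos + k + right], subject[s_pos + k + right])
        if gain <= 0:
            break
        score += gain
        right += 1

    length = left + k + right
    q_seg = query[q_pos - left:q_pos - left + length]
    s_seg = subject[s_pos - left:s_pos - left + length]
    return q_seg, s_seg, score


# Seed: PQG at query position 2 matched PEG at library position 2.
hsp = extend_seed("RPPQGLF", "DPPEGVV", q_pos=2, s_pos=2, k=3)
print(hsp)  # ('PPQGL', 'PPEGV', 23)
```

The R/D pair (-2) on the left and the F/V pair (-1) on the right are where the extension stops, which gives the total of 23.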
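BLAST – Step 4
• Align saved HSPs (with gaps)
  Example: 2 sequences with 2 HSPs – insert a gap to join them
• Compute the total score (involves gap penalties)
• Report all matches that pass a significance threshold E (in BLAST, E is an expect value, so lower means more significant)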
BLAST – Whole process
• Separate the query into k-letter words
• Search the library for similar k-letter words and save them
• Extend to HSPs and save
• Align whole sequences and compute the total score
• Return sequences that pass the significance threshold E
These are homologous to the query
The 3rd comparison
• Compare all genes of the three species with a length limitation (the similarity must cover 80% of the length)
• 20% of the fly proteins appear in both worm and yeast
  They perform functions common to all eukaryotic cells
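The length limitation can be read as a coverage filter applied to each reported match. The slide does not say which length the 80% refers to, so the sketch below assumes the longer of the two proteins; the function name and the example numbers are hypothetical.

```python
def passes_length_limit(aligned_len, len_a, len_b, min_fraction=0.8):
    """Keep a match only if the aligned region covers at least
    `min_fraction` of the longer protein (which length to use is an assumption)."""
    longer = max(len_a, len_b)
    return aligned_len / longer >= min_fraction


# A 450-residue alignment between proteins of 500 and 480 residues passes;
# a 120-residue alignment (for example, a single shared domain) does not.
print(passes_length_limit(450, 500, 480))  # True
print(passes_length_limit(120, 500, 480))  # False
```

A filter like this separates matches along most of a protein from matches to one short region.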
The 4th comparison
• Compare all genes of the three species to mammalian sequences (without length limitation)
• 50% of the fly proteins appear in mammals
• 36% of the worm proteins appear in mammals
  So the fly is closer to mammals
• Most of the mammalian sequences used here were short
  So the similarities reflect conserved domains
What are conserved domains?
• Domains – independent parts that make up proteins
• Appear in different combinations in different proteins (e.g. proteins ABC and ADEG share domain A)
• Similarity to short sequences points to conserved domains, which point to closeness in evolution
To conclude
• Significant similarity between genomes of “distant” species (Man – Yeast 23%)
• Similarity increases for taxonomically close species
• No. of genes or gene families – a bad measure for complexity. Why?
  There is more information that is not encoded in the genome (protein interactions – e.g. physical proximity of genes)
• How to define complexity?