Genome Analysis II: Comparative Genomics
Jiangbo Miao, Apr. 25, 2002
CISC889-02S: Bioinformatics
Why Comparative Genomics?
It tells us what is common and what is unique between different species at the genome level.
– e.g., to distinguish orthologs from paralogs
(BLAST, WU-BLAST or FASTA)
: A conservative cutoff E value : 10e-6
This comparison distinguishes unique proteins from proteins that have arisen from gene duplication, and also reveals the # of gene families.
Significantly matched pairs of protein sequences may be paralogs.
(# of amino acid changes between the aligned seq.)
: This algorithm favors the selection of proteins with the same domain structure, since such proteins are most probably paralogs
(the same procedure as that used to make a phylogenetic tree)
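The self-comparison and grouping steps above can be sketched as follows. This is a minimal illustration, not the procedure used in the lecture: it assumes the all-against-all search has already produced `(protein, protein, E value)` triples, reads the cutoff as 1e-6, and swaps in simple single-linkage grouping (union-find) for the distance-based clustering the slides describe. The names `cluster_families`, `E_CUTOFF`, and the toy hits are hypothetical.

```python
from collections import defaultdict

E_CUTOFF = 1e-6  # assumption: one reading of the conservative cutoff on the slide


def cluster_families(proteins, hits, e_cutoff=E_CUTOFF):
    """Group proteins into families by single linkage over significant hits.

    hits is an iterable of (protein_a, protein_b, e_value) triples, e.g. parsed
    from an all-against-all search of one proteome against itself.
    """
    parent = {p: p for p in proteins}

    def find(p):
        # Follow parent links to the family representative, compressing the path.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for a, b, e_value in hits:
        # Ignore self-hits and matches that are not significant.
        if a != b and e_value <= e_cutoff:
            parent[find(a)] = find(b)

    groups = defaultdict(list)
    for p in proteins:
        groups[find(p)].append(p)
    # Largest families first; ties broken alphabetically for stable output.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: (-len(g), g))


# Toy data (hypothetical): P1-P3 are candidate paralogs, P4 and P5 are unique.
proteins = ["P1", "P2", "P3", "P4", "P5"]
hits = [("P1", "P2", 1e-30), ("P2", "P3", 1e-12), ("P4", "P5", 0.5)]
families = cluster_families(proteins, hits)
singletons = [g[0] for g in families if len(g) == 1]
print(families)    # families with more than one member are candidate paralog groups
print(singletons)  # proteins without paralogs
```

Single linkage can chain unrelated proteins together through a shared domain, which is one reason a method that weighs the full domain structure may be preferred in practice.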
* In Haemophilus, 1247 out of 1709 proteins do not have paralogs
* Core proteome of the multicellular organisms is only twice that of yeast
(Y and Z are paralogs in B, X and Z are orthologs)
* The sequences also align to 80%, so they represent highly conserved sets of genes
In the above database search, a protein seq will match not only the orthologous seq in the second proteome, but also the paralogs of that orthologous seq.
Identify all matching proteins as an orthologous group related by both speciation (orthologs) and gene duplication (paralogs) events.
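One common heuristic for picking the ortholog out of a set of matches is the bidirectional (reciprocal) best hit. The sketch below assumes similarity scores between two proteomes A and B are already available as dictionaries; the function names and the toy scores are hypothetical, and this is an illustration rather than the exact procedure from the lecture.

```python
def best_hits(scores):
    """Map each query to its highest-scoring subject.

    scores maps (query, subject) pairs to a similarity score (higher is better).
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return (a, b) pairs where a's best hit is b and b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Toy data (hypothetical): X is in proteome A; Y and Z are paralogs in proteome B.
a_vs_b = {("X", "Z"): 250.0, ("X", "Y"): 180.0}
b_vs_a = {("Z", "X"): 250.0, ("Y", "X"): 180.0}
pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(pairs)  # X pairs with Z; Y matches X but is not X's best hit
```

X still matches Y significantly, so Y would stay in the same orthologous group as a paralog even though it is not paired with X here.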
COGs usually correspond to classes of metabolic function
EST seqs are usually short (the equivalent of 100-150 amino acids)
In some phylogenetically diverse groups of organisms, there are proteins or protein domains that have been conserved over long periods of evolutionary time.
e.g. 70% of prokaryotic genomes contain ACRs
the acquisition of genetic material from a different organism; this transferred material then becomes a permanent addition to the recipient
(HT is a significant source of genome variation for bacteria)
90% of E. coli genes fell into these same broad categories
Human chromosomes were cut into > 100 pieces and reassembled into a reasonable facsimile of the mouse chromosome.
Assume that those rearrangements have occurred by some transposition or recombination events
And identify the rearrangements by “undoing” those events.
The goal is to minimize the number of rearrangements, which represents a genetic distance between the two genome sequences
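Finding the true minimum number of rearrangements is a hard combinatorial problem, so the sketch below shows only a simple proxy: counting breakpoints between two gene orders. It assumes one genome's gene order is written as a permutation of 1..n relative to the other, ignores gene orientation, and considers reversals only. Since a single reversal can remove at most two breakpoints, half the breakpoint count is a lower bound on the number of reversals. The function names are hypothetical.

```python
def count_breakpoints(order):
    """Count breakpoints of a gene order relative to the reference order 1..n.

    A breakpoint is a pair of neighbours that are not consecutive in the
    reference; the order is framed with 0 and n + 1 so the ends count too.
    """
    n = len(order)
    if sorted(order) != list(range(1, n + 1)):
        raise ValueError("order must be a permutation of 1..n")
    framed = [0] + list(order) + [n + 1]
    return sum(1 for a, b in zip(framed, framed[1:]) if abs(a - b) != 1)


def reversal_lower_bound(order):
    """Lower bound on the number of reversals: ceil(breakpoints / 2)."""
    return (count_breakpoints(order) + 1) // 2


# Toy example (hypothetical): genes 2-4 are inverted as a block.
order = [1, 4, 3, 2, 5]
print(count_breakpoints(order))     # breakpoints at 1|4 and 2|5
print(reversal_lower_bound(order))  # at least this many reversals are needed
```

In this example the bound is tight: reversing the block 4, 3, 2 restores the reference order in a single step.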
[Overbeek et al. (1999)]
: 40% of these pairs with higher scores correspond to proteins that are known to act in a common metabolic pathway.
A significant proportion of the pairs of PCBBH correspond to genes that have a related function and lie on the same pathway.
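The idea behind PCBBH can be sketched as follows: take two genes that lie close together in one genome and check whether their bidirectional best hits also lie close together in the other genome. This is a simplified illustration, not the implementation of Overbeek et al. It assumes the bidirectional best hits are already computed, treats "close" as same contig, same strand, and a gap no larger than `MAX_GAP`, and uses a hypothetical threshold value and toy coordinates.

```python
from itertools import combinations

MAX_GAP = 300  # hypothetical gap threshold in base pairs; tune for real data


def is_close(gene1, gene2, max_gap=MAX_GAP):
    """Genes are (contig, strand, start, end) tuples.

    Close means same contig, same strand, and an intergenic gap <= max_gap.
    """
    if gene1[0] != gene2[0] or gene1[1] != gene2[1]:
        return False
    first, second = sorted((gene1, gene2), key=lambda g: g[2])
    return second[2] - first[3] <= max_gap


def find_pcbbh(genes_a, genes_b, bbh, max_gap=MAX_GAP):
    """Return pairs of bidirectional best hits whose genes are close in both genomes.

    bbh maps a gene in genome A to its bidirectional best hit in genome B.
    """
    found = []
    for x1, x2 in combinations(sorted(bbh), 2):
        y1, y2 = bbh[x1], bbh[x2]
        if is_close(genes_a[x1], genes_a[x2], max_gap) and is_close(
            genes_b[y1], genes_b[y2], max_gap
        ):
            found.append(((x1, y1), (x2, y2)))
    return found


# Toy data (hypothetical): a1/a2 are neighbours in A, and so are b1/b2 in B.
genes_a = {
    "a1": ("chrA", "+", 100, 900),
    "a2": ("chrA", "+", 950, 1800),
    "a3": ("chrA", "+", 9000, 9900),
}
genes_b = {
    "b1": ("chrB", "-", 5000, 5800),
    "b2": ("chrB", "-", 5900, 6700),
    "b3": ("chrB", "-", 200, 1000),
}
bbh = {"a1": "b1", "a2": "b2", "a3": "b3"}
pcbbh = find_pcbbh(genes_a, genes_b, bbh)
print(pcbbh)  # gene pairs whose neighbourhood is conserved in both genomes
```

Pairs found this way are candidates for functional coupling; scoring them across many genome pairs, as the slides describe, is not covered by this sketch.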