Bioinformatics: Applying the Concept of Information in Biology


PowerPoint Slideshow about 'Bioinformatics Applying the Concept of Information in Biology' - daniel_millan

Presentation Transcript


Applying the Concept of Information in Biology

The theory of evolution is the conceptual framework of biology and medicine, and bioinformatics is the tool used to analyze and quantify evolutionary relationships at every level of investigation: molecular, physiological, or ecological.


Diagrammatic view of metabolic pathways showing the major functional interactions among synthesis and degradation of nutrients from different food groups

(from KEGG;


Macromolecular crowding in bacterial cytoplasm

Hoppert & Mayer, 1999, Prokaryotes. Am. Sci. 87:518

Ellis, R.J., 2001, Macromolecular crowding. TIBS 26:597


What makes a scientific discipline? A look at the history of biochemistry.

Knowing where we come from helps us understand where we are going. Novel ways of curing diseases and fighting off infections will include individualized prescription drug regimens, gene therapy, and the development of new generations of antibiotics. These changes are no less sweeping and broad than those brought to biology by the chemists and physicists of the 1920s, 30s, and 40s, who were attracted to the most obvious problem in biology at that time: the staggering lack of an atomistic understanding of genetics.


1926, 1930

First accounts that proteins convey enzymatic activity (urease, pepsin) in cellular metabolism; an important step to demonstrate that proteins catalyze chemical reactions and are not only structural components of cells


First successful X-ray study of the globular protein pepsin by Bernal and Crowfoot; it does not show high-resolution details, but demonstrates a water-covered protein surface



Citric acid cycle described by Hans Krebs; this is the central energy-yielding pathway in all organisms; complete biochemical pathway reactions could be elucidated in the absence of any protein structure information (kinetic data represent the macroscopic behavior of enzymes)



'One gene, one enzyme' hypothesis by Beadle and Tatum


DNA is the carrier of genetic information in bacteria (Oswald Avery)


First complete amino acid content of a protein is published (not its sequence, however)



First complete amino acid sequence published of the protein hormone insulin by Fred Sanger

Proposed model for the alpha helix and beta sheet and the importance of so-called hydrogen bonds in protein structures (Pauling and Corey)


DNA structure at atomic resolution by Crick, Watson, and Wilkins; they propose a model for DNA replication based on the structural information; the concept of structure-function relationship has been successfully used to solve a major problem in biology



High resolution structure of myoglobin at 2 Angstrom confirms for the first time the existence of alpha helix structures in proteins (Perutz and Kendrew)

The structure of the enzyme lysozyme with a bound inhibitor molecule is solved at 2 Angstrom resolution, giving the first structural insight into enzyme-substrate interaction and Koshland's induced fit theory


Genetic code solved; links DNA sequence to amino acid sequence in proteins (Holley, Khorana, Nirenberg)

Database structures
  • Sequences
  • Structures
  • Pathways
  • Analysis tools
  • Prediction tools
  • Functional categories & interactivity
  • PubMed

Integrated database retrieval system, GenomeNet, Japan


KEGG: Kyoto Encyclopedia of Genes and Genomes

Analysis, Prediction, Data Mining
  • Similarity searches
  • Structure prediction
  • Gene prediction
  • Pathway reconstruction
  • Visualization and Modeling
  • Pattern recognition
  • Clustering
  • Annotation

Prediction of relationship among sequences

Cluster of Orthologous Groups at NCBI

Principal component analysis of variability found in whole genome databases


Clusters of orthologous groups

(sequences of individual proteins or protein families represented in at least 3 species (currently microorganisms only), thus corresponding to an ancient conserved domain)





Phylogenetic analyses indicate that R. prowazekii is more closely related to mitochondria than is any other microbe studied so far.


Rickettsia prowazekii

Obligate intracellular parasite, the causative agent of epidemic typhus. The functional profiles of these genes show similarities to those of mitochondrial genes: no genes required for anaerobic glycolysis are found in either R. prowazekii or mitochondrial genomes, but a complete set of genes encoding components of the tricarboxylic acid cycle and the respiratory-chain complex is found in R. prowazekii. In effect, ATP production in Rickettsia is the same as that in mitochondria. Many genes involved in the biosynthesis and regulation of biosynthesis of amino acids and nucleosides in free-living bacteria are absent from R. prowazekii and mitochondria. Such genes seem to have been replaced by homologues in the nuclear (host) genome. (Nature 1998 Nov 12;396(6707):133-40)


Glycolysis pathway map from KEGG

Escherichia coli

K-12 MG1655

Rickettsia prowazekii


What kind of information can be obtained using the COG database?

1. Annotation of proteins. Known functions (and two- or three-dimensional structures) of one COG member can often be directly attributed to the other members of the COG. Caution must be used here, however, since some COGs contain paralogs whose function may not precisely correspond to that of the known protein.

2. Phylogenetic patterns. These show the presence or absence of proteins from a given organism in a specific COG. Used systematically, such patterns can be used to identify whether a particular metabolic pathway exists in an organism.

3. Multiple alignments. Each COG page includes a link to a multiple alignment of COG members, which can be used to identify conserved sequence residues and analyze evolutionary relationships between member proteins.
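
Point 2 can be illustrated with a minimal sketch. The COG labels, organism names, and memberships below are all invented for illustration; they are not taken from the COG database, and the pathway definition is a hypothetical stand-in for a real list of required functions.

```python
# Toy data: which organisms have a member in each (hypothetical) COG.
# Labels and memberships are invented for illustration only.
cog_members = {
    "COG_A": {"org1", "org2", "org3"},
    "COG_B": {"org1", "org3"},
    "COG_C": {"org1", "org2", "org3"},
    "COG_D": {"org1"},
}

# Hypothetical pathway: the COGs whose functions the pathway requires.
pathway_cogs = ["COG_A", "COG_B", "COG_C"]

organisms = ["org1", "org2", "org3"]


def phylogenetic_pattern(cog, organisms, cog_members):
    """Return a presence/absence string such as '+-+' for one COG."""
    return "".join("+" if org in cog_members[cog] else "-" for org in organisms)


def missing_steps(organism, pathway_cogs, cog_members):
    """List the pathway COGs that have no member in the given organism."""
    return [cog for cog in pathway_cogs if organism not in cog_members[cog]]


patterns = {cog: phylogenetic_pattern(cog, organisms, cog_members)
            for cog in cog_members}
for cog, pattern in patterns.items():
    print(cog, pattern)

for org in organisms:
    gaps = missing_steps(org, pathway_cogs, cog_members)
    status = "complete" if not gaps else "missing " + ", ".join(gaps)
    print(org, status)
```

A gap found this way is only a hint: as noted under point 1, paralogs can blur the assignment of function, so a missing or present COG should be checked against the alignment before drawing conclusions about the pathway.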


Hierarchical cluster analysis of DNA microarrays

Eisen et al. (1998) PNAS 95:14863


Hierarchical clustering and factor analysis of DNA microarrays

Factor analysis (and principal component analysis) demonstrates three independent factors (eigenvectors) accounting for 99.5% of the variability of the array data (6 arrays; three conditions; each condition repeated once). Factor one (F1) accounts for the variability in hybridization strength. Factor two (F2) accounts for gene-specific differences in hybridization strength that distinguish Va2 from both Vb5 and control (see diagram F2-F1). Factor three shows that there are general condition-specific differences that distinguish control from Vb5 and from Va2 but are highly reproducible when repeated by labeling cDNA from the same RNA samples. The dendrogram obtained from hierarchical clustering of the six arrays shows the same relationship as determined by the second factor (F2) from factor analysis.
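
How a dendrogram of six arrays comes about can be sketched in a few lines. This is a rough illustration, not the analysis behind the slide: the expression values are invented (constructed so that Va2 stands apart), and the distance measure (1 minus Pearson correlation) and average linkage are assumptions, since the slide specifies neither.

```python
from itertools import combinations
from math import sqrt

# Invented toy values (5 genes x 6 arrays), NOT data from the slides.
arrays = {
    "control_1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "control_2": [1.1, 2.0, 2.9, 4.1, 5.0],
    "Va2_1": [5.0, 1.0, 4.0, 2.0, 3.0],
    "Va2_2": [5.1, 0.9, 4.0, 2.1, 3.0],
    "Vb5_1": [1.0, 3.0, 2.0, 5.0, 4.0],
    "Vb5_2": [1.0, 3.1, 2.0, 4.9, 4.1],
}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def average_linkage(profiles):
    """Agglomerative clustering with distance 1 - r and average linkage.

    Returns the merges in order as (members_a, members_b, distance) tuples.
    """
    names = list(profiles)
    dist = {
        frozenset((a, b)): 1.0 - pearson(profiles[a], profiles[b])
        for a, b in combinations(names, 2)
    }

    def cluster_distance(pair):
        # Mean of all pairwise distances between the two clusters.
        a, b = pair
        total = sum(dist[frozenset((i, j))] for i in a for j in b)
        return total / (len(a) * len(b))

    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=cluster_distance)
        merges.append((a, b, cluster_distance((a, b))))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


merges = average_linkage(arrays)
for a, b, d in merges:
    print(f"{' + '.join(a)}  <->  {' + '.join(b)}  (distance {d:.3f})")
```

With these toy values the replicate pairs merge first, then control joins Vb5, and Va2 is attached last. The merge order read from bottom to top is what a dendrogram draws.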


One can ask any biologically interesting question concerning relationships between database entries, e.g.:

How many genes in the human genome?

Minimal gene set theory!

Evolutionary psychology: Explaining behavioral traits.


Minimal gene set theory!

A minimal gene set can be defined through knock-outs: any knock-out that does not kill the organism proves that the organism has more genes than it needs for survival. A minimal gene set is therefore one in which every single-gene knock-out results in a non-viable clone.
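
The definition translates directly into a test. The sketch below assumes a viability oracle, here the hypothetical function `toy_viable`, which stands in for an actual knock-out experiment; the gene names are invented.

```python
def is_minimal_gene_set(genes, is_viable):
    """True if the full set is viable and every single-gene knock-out is not."""
    genes = frozenset(genes)
    if not is_viable(genes):
        return False
    return all(not is_viable(genes - {gene}) for gene in genes)


def dispensable_genes(genes, is_viable):
    """Genes whose individual knock-out still leaves a viable organism."""
    genes = frozenset(genes)
    return sorted(gene for gene in genes if is_viable(genes - {gene}))


def toy_viable(genes):
    """Hypothetical rule: needs 'rep' and 'tln' plus at least one of two
    redundant energy genes. Stands in for a real knock-out experiment."""
    return {"rep", "tln"} <= genes and bool({"energy1", "energy2"} & genes)


genome = {"rep", "tln", "energy1", "energy2"}
reduced = {"rep", "tln", "energy1"}

print(is_minimal_gene_set(genome, toy_viable))   # False: redundancy remains
print(dispensable_genes(genome, toy_viable))     # ['energy1', 'energy2']
print(is_minimal_gene_set(reduced, toy_viable))  # True
```

In this toy case either energy gene can be dropped, but not both, so a minimal gene set need not be unique.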

The smallest gene set found so far (besides large viral genomes with >200 genes) has 467 genes, in Mycoplasma genitalium. The latter can hardly be considered a free-living organism.

Autonomous (neither symbiotic nor parasitic) species do not tend to have minimal gene sets. Chemotrophs, for which there are only archaea known, usually have genomes with more than 2,000 open reading frames, up to four times the minimal gene set found in eubacteria.

Phototrophs produce, besides bacterial species, some of the largest life forms (trees) containing some of the largest genomes.


Genome “size” (number of proteins) of some microorganisms

Name                                    Proteins   In COGs
Methanococcus jannaschii                1786       1330
Methanobacterium thermoautotrophicum    1873       1388
Saccharomyces cerevisiae                5955       2290
Escherichia coli K12                    4275       3414
Escherichia coli O157                   5315       3662
Helicobacter pylori                     1576       1096
Rickettsia prowazekii                   835        697
Mycoplasma pneumoniae                   689        425
Mycoplasma genitalium                   484        381
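
The two columns are easier to compare as a ratio. The short sketch below takes the numbers from the table above and computes the fraction of each organism's proteins that fall into COGs; the helper name `cog_coverage` is introduced here for illustration only.

```python
# (organism, total proteins, proteins in COGs), copied from the table above.
genomes = [
    ("Methanococcus jannaschii", 1786, 1330),
    ("Methanobacterium thermoautotrophicum", 1873, 1388),
    ("Saccharomyces cerevisiae", 5955, 2290),
    ("Escherichia coli K12", 4275, 3414),
    ("Escherichia coli O157", 5315, 3662),
    ("Helicobacter pylori", 1576, 1096),
    ("Rickettsia prowazekii", 835, 697),
    ("Mycoplasma pneumoniae", 689, 425),
    ("Mycoplasma genitalium", 484, 381),
]


def cog_coverage(total, in_cogs):
    """Fraction of an organism's proteins that are assigned to a COG."""
    return in_cogs / total


coverage = {name: cog_coverage(total, in_cogs) for name, total, in_cogs in genomes}

# Print organisms from highest to lowest coverage.
for name in sorted(coverage, key=coverage.get, reverse=True):
    print(f"{name:38s} {coverage[name]:6.1%}")
```

On these numbers Rickettsia prowazekii has the highest share of proteins in COGs and Saccharomyces cerevisiae the lowest.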


How many genes in the human genome?

Bets: 165

Mean: 61,710

Lowest: 27,462

Highest: 153,478

Assessment of the gene number will occur at the 2003 Cold Spring Harbor Laboratory Genome Meeting

Source: Sanger Institute