
Bioinformatics: Applying the Concept of Information in Biology


Presentation Transcript

Slide 2


Applying the Concept of Information in Biology

The theory of evolution is the conceptual framework of biology and medicine, and bioinformatics is the tool used to analyze and quantify evolutionary relationships at every level of investigation – molecular, physiological, or ecological.

Slide 3

Diagrammatic view of metabolic pathways showing the major functional interactions between synthesis and degradation of nutrients from different food groups

(from KEGG)

Slide 4

Macromolecular crowding in bacterial cytoplasm

Hoppert & Mayer, 1999, Prokaryotes. Am. Sci. 87:518

Ellis, R.J., 2001, Macromolecular crowding. TIBS 26:597

Slide 5

What makes a scientific discipline? A look at the history of biochemistry.

To know where we come from helps us understand where we are going. Novel ways of curing diseases and fighting off infections will include individualized prescription drug regimens, gene therapy, and the development of new generations of antibiotics. These changes are no less sweeping than those brought to biology by the chemists and physicists of the 1920s, 1930s, and 1940s, who were attracted to the most obvious problem in biology at that time: the staggering lack of an atomistic understanding of genetics.

Slide 6

1926, 1930

First accounts that proteins carry enzymatic activity (urease, pepsin) in cellular metabolism; an important step in demonstrating that proteins catalyze chemical reactions and are not only structural components of cells

First successful X-ray study of the globular protein pepsin by Bernal and Crowfoot; it does not show high-resolution details, but demonstrates a water-covered protein surface

Slide 7

1937

Citric acid cycle described by Hans Krebs; this is the central energy-yielding pathway in aerobic organisms; complete biochemical pathway reactions could be elucidated in the absence of any protein structure information (kinetic data represent the macroscopic behavior of enzymes)

Slide 8

1941

'One gene, one enzyme' hypothesis by Beadle and Tatum


DNA is the carrier of genetic information in bacteria (Oswald Avery)


First complete amino acid content of a protein is published (not its sequence, however)

Slide 9

1951

First complete amino acid sequence of a protein, the hormone insulin, published by Fred Sanger

Model proposed for the alpha helix and beta sheet, highlighting the importance of hydrogen bonds in protein structures (Pauling and Corey)

DNA structure at atomic resolution by Crick, Watson, and Wilkins; they propose a model for DNA replication based on the structural information; the concept of a structure-function relationship is used successfully to solve a major problem in biology

Slide 10

1962

High-resolution structure of myoglobin at 2 Å confirms for the first time the existence of alpha helix structures in proteins (Perutz and Kendrew)

Structure of the enzyme lysozyme with a bound inhibitor molecule solved at 2 Å resolution, giving the first structural insight into enzyme-substrate interaction and Koshland's induced-fit theory


Genetic code solved; links DNA sequence to amino acid sequence in proteins (Holley, Khorana, Nirenberg)

Slide 11

Database structures

  • Sequences

  • Structures

  • Pathways

  • Analysis tools

  • Prediction tools

  • Functional categories & interactivity

  • PubMed

Slide 12

Integrated database retrieval system, GenomeNet, Japan

Slide 13

KEGG: Kyoto Encyclopedia of Genes and Genomes

Analysis, Prediction, Data Mining

  • Similarity searches

  • Structure prediction

  • Gene prediction

  • Pathway reconstruction

  • Visualization and Modeling

  • Pattern recognition

  • Clustering

  • Annotation

Slide 16

Prediction of relationships among sequences

Clusters of Orthologous Groups (COGs) at NCBI

Principal component analysis of variability found in whole genome databases

Slide 17

Clusters of orthologous groups

(sequences of individual proteins or protein families represented in at least three species (currently microorganisms only), and thus corresponding to an ancient conserved domain)




Slide 18

Phylogenetic analyses indicate that R. prowazekii is more closely related to mitochondria than is any other microbe studied so far.


Rickettsia prowazekii

Obligate intracellular parasite, the causative agent of epidemic typhus. The functional profiles of its genes show similarities to those of mitochondrial genes: no genes required for anaerobic glycolysis are found in either R. prowazekii or mitochondrial genomes, but a complete set of genes encoding components of the tricarboxylic acid cycle and the respiratory-chain complex is found in R. prowazekii. In effect, ATP production in Rickettsia is the same as that in mitochondria. Many genes involved in the biosynthesis and regulation of biosynthesis of amino acids and nucleosides in free-living bacteria are absent from R. prowazekii and mitochondria. Such genes seem to have been replaced by homologues in the nuclear (host) genome. (Nature 1998 Nov 12;396(6707):133-40)

Slide 19

KEGG – pathway maps

Slide 20

Glycolysis pathway map from KEGG

Escherichia coli K-12 MG1655

Rickettsia prowazekii

Slide 21

Rickettsia prowazekii

Slide 22

Escherichia coli

Slide 23

Homo sapiens

Slide 24

What kind of information can be obtained using the COG database?

1. Annotation of proteins. Known functions (and two- or three-dimensional structures) of one COG member can often be directly attributed to the other members of the COG. Caution must be used here, however, since some COGs contain paralogs whose function may not precisely correspond to that of the known protein.

2. Phylogenetic patterns. These show the presence or absence of proteins from a given organism in a specific COG. Applied systematically, such patterns can indicate whether a particular metabolic pathway exists in an organism.

3. Multiple alignments. Each COG page includes a link to a multiple alignment of COG members, which can be used to identify conserved sequence residues and analyze evolutionary relationships between member proteins.
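
The phylogenetic-pattern idea in item 2 can be sketched in a few lines of Python. The COG identifiers and membership data below are made-up placeholders, not entries from the COG database, and the assumption that each pathway step maps to exactly one COG is a simplification. The sketch only shows how presence/absence patterns might be used to look for gaps in a pathway.

```python
# Hypothetical data: which organisms have a protein in each COG.
# Identifiers and memberships are illustrative, not real COG entries.
cog_members = {
    "COG_A": {"E. coli", "R. prowazekii", "H. pylori"},
    "COG_B": {"E. coli", "H. pylori"},
    "COG_C": {"E. coli", "R. prowazekii", "H. pylori"},
}

def phylogenetic_pattern(cog_id, organisms):
    """Return a presence/absence string ('+' or '-') for one COG."""
    members = cog_members[cog_id]
    return "".join("+" if org in members else "-" for org in organisms)

def missing_steps(pathway_cogs, organism):
    """List the COGs of a pathway that have no member in the organism."""
    return [cog for cog in pathway_cogs if organism not in cog_members[cog]]

organisms = ["E. coli", "R. prowazekii", "H. pylori"]
pathway = ["COG_A", "COG_B", "COG_C"]  # assumed: one COG per pathway step

patterns = {cog: phylogenetic_pattern(cog, organisms) for cog in pathway}
gaps = missing_steps(pathway, "R. prowazekii")

for cog, pattern in patterns.items():
    print(cog, pattern)
print("Steps without a COG member in R. prowazekii:", gaps)
```

An empty list of gaps suggests, but does not prove, that the pathway is present. A missing COG is likewise only a hint, since an unrelated protein may carry out the same step.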

Slide 25

Hierarchical cluster analysis of DNA microarrays

Eisen et al. (1998) PNAS 95:14863

Slide 26

Hierarchical clustering and factor analysis of DNA microarrays

Factor analysis (and principal component analysis) yields three independent factors (eigenvectors) that account for 99.5% of the variability in the array data (6 arrays; three conditions; each condition repeated once). Factor one (F1) accounts for the variability in hybridization strength. Factor two (F2) accounts for gene-specific differences in hybridization strength that distinguish Va2 from both Vb5 and control (see diagram F2-F1). Factor three shows general condition-specific differences that distinguish control from Vb5 and from Va2 and that are highly reproducible when the experiment is repeated by labeling cDNA from the same RNA samples. The dendrogram obtained from hierarchical clustering of the six arrays shows the same relationship as the second factor (F2) from the factor analysis.
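
A minimal sketch of hierarchical clustering of six arrays follows. The expression values are invented for illustration and are not the data behind the slide. The choice of average linkage with 1 minus the Pearson correlation as the distance is an assumption; the slide does not say which linkage or distance was used.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

def average_linkage(profiles):
    """Agglomerative clustering; returns merges as (cluster_a, cluster_b, distance)."""
    # Distance between two individual arrays: 1 - Pearson r (assumed metric).
    dist = {frozenset(pair): 1 - pearson(profiles[pair[0]], profiles[pair[1]])
            for pair in combinations(profiles, 2)}

    def linkage(a, b):
        # Average of all array-to-array distances between two clusters.
        return sum(dist[frozenset((i, j))] for i in a for j in b) / (len(a) * len(b))

    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((a, b, linkage(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges

# Invented log-ratio profiles for six genes on six arrays (two per condition).
arrays = {
    "control_1": [1.0, 1.0, -1.0, -1.0, 0.0, 0.0],
    "control_2": [1.1, 0.9, -1.0, -0.9, 0.1, -0.2],
    "Vb5_1": [1.0, 0.5, -1.0, -0.5, 0.8, -0.8],
    "Vb5_2": [0.9, 0.6, -1.1, -0.4, 0.8, -0.8],
    "Va2_1": [1.0, -1.0, 1.0, -1.0, 0.0, 0.0],
    "Va2_2": [1.1, -1.0, 0.9, -1.0, 0.1, -0.1],
}

merges = average_linkage(arrays)
for a, b, d in merges:
    print(a, "+", b, f"distance={d:.3f}")
```

With these toy values the replicate arrays pair up first and the Va2 pair joins the rest last. The order of the merges is what a dendrogram draws.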

Slide 27

One can ask any biologically interesting question concerning relationships between database entries, e.g.:

How many genes in the human genome?

Minimal gene set theory!

Evolutionary psychology: Explaining behavioral traits.

Slide 28

Minimal gene set theory!

By definition, any knock-out that does not kill the organism proves that the organism carries more genes than it needs for survival. A minimal gene set is therefore one in which every single-gene knock-out results in a non-viable clone.
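
The definition above translates directly into a check over all single-gene knock-outs. The sketch below is only an illustration: the viability rule (the organism survives if at least one gene remains for each essential function) is a toy assumption, and the gene names are placeholders rather than a real essential-gene list.

```python
def is_minimal_gene_set(genes, is_viable):
    """True if the full set is viable and every single-gene knock-out is not."""
    genes = frozenset(genes)
    if not is_viable(genes):
        return False
    return all(not is_viable(genes - {g}) for g in genes)

def dispensable_genes(genes, is_viable):
    """Genes whose individual knock-out still leaves a viable organism."""
    genes = frozenset(genes)
    return sorted(g for g in genes if is_viable(genes - {g}))

# Toy viability model (assumption): each essential function needs at
# least one of the genes that can provide it.
ESSENTIAL_FUNCTIONS = {
    "replication": {"dnaA"},
    "translation": {"rpsA"},
    "energy": {"pgk", "atpA"},  # two redundant routes in this toy model
}

def toy_viable(genes):
    return all(genes & providers for providers in ESSENTIAL_FUNCTIONS.values())

genome = {"dnaA", "rpsA", "pgk", "atpA"}
reduced = {"dnaA", "rpsA", "pgk"}

print(is_minimal_gene_set(genome, toy_viable))   # redundant energy genes
print(dispensable_genes(genome, toy_viable))
print(is_minimal_gene_set(reduced, toy_viable))
```

In this toy model either energy gene can be removed on its own, so the four-gene genome is not minimal, while the three-gene set is. Single knock-outs do not reveal that the two dispensable genes cannot both be removed.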

The smallest gene set found so far (apart from large viral genomes with >200 genes) is the 467 genes of Mycoplasma genitalium, which can hardly be considered a free-living organism.

Autonomous (neither symbiotic nor parasitic) species do not tend to have minimal gene sets. Chemotrophs, of which only archaea are known, usually have genomes with more than 2,000 open reading frames, up to four times the minimal gene set found in eubacteria.

Phototrophs include, besides bacterial species, some of the largest life forms (trees), which contain some of the largest genomes.

Slide 29

Genome “size” (number of proteins) of some microorganisms

Name                                   Proteins   In COGs
Methanococcus jannaschii                   1786      1330
Methanobacterium thermoautotrophicum       1873      1388
Saccharomyces cerevisiae                   5955      2290
Escherichia coli K12                       4275      3414
Escherichia coli O157                      5315      3662
Helicobacter pylori                        1576      1096
Rickettsia prowazekii                       835       697
Mycoplasma pneumoniae                       689       425
Mycoplasma genitalium                       484       381
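
The two columns can be combined into a single figure per genome: the fraction of proteins that fall into COGs. The numbers below are copied from the table above. Reading the second number as the count of proteins assigned to COGs is an assumption based on the column header.

```python
# (name, total proteins, proteins in COGs), copied from the table above.
genomes = [
    ("Methanococcus jannaschii", 1786, 1330),
    ("Methanobacterium thermoautotrophicum", 1873, 1388),
    ("Saccharomyces cerevisiae", 5955, 2290),
    ("Escherichia coli K12", 4275, 3414),
    ("Escherichia coli O157", 5315, 3662),
    ("Helicobacter pylori", 1576, 1096),
    ("Rickettsia prowazekii", 835, 697),
    ("Mycoplasma pneumoniae", 689, 425),
    ("Mycoplasma genitalium", 484, 381),
]

# Fraction of each proteome that is covered by COGs.
coverage = {name: in_cogs / proteins for name, proteins, in_cogs in genomes}

# Rank genomes from highest to lowest COG coverage.
ranked = sorted(coverage.items(), key=lambda item: item[1], reverse=True)

for name, fraction in ranked:
    print(f"{name:38s} {fraction:6.1%}")
```

On these numbers Rickettsia prowazekii has the highest coverage and Saccharomyces cerevisiae the lowest.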

Slide 30

How many genes in the human genome?

Bets: 165

Mean: 61,710

Lowest: 27,462

Highest: 153,478

Assessment of the gene number will take place at the 2003 Cold Spring Harbor Laboratory Genome meeting

Source: Sanger Institute
