Basic Bioinformatics and Gene Effects on Growth Rate in Bacteria

Basic Bioinformatics and Gene Effects on Growth Rate in Bacteria Dr. Md. Aminul Hoque Date: 24-03-2012 E-mail: mdaminulh@gmail.com

Outlines • Bioinformatics Definition • Short History of Bioinformatics • Branch of Bioinformatics • Bioinformatics in Research World • Our Position and Level • Bacteria Culture • Bacterial growth under natural Condition • Effect of Gene on Bacterial Growth • Dynamic nature of metabolites • Metabolic Flux and Metabolic Control Analysis • SimulationResults • Referenceｓ

What is Bioinformatics? The Statistical, Mathematical, and Computing methods that aim to solve biological problems using DNA andamino acid sequencesand related information. `The marriage between computer science and molecular biology`

Biology (Bio)chemistry Computer Science Statistics Bioinformatics !!! Bioinformatics

Goal of Bioinformatics The primary goal of bioinformatics is to increase the understanding of biological processes. Major research efforts in the field include sequence alignment, gene findings, genome assembly, drug design, drug discovery, protein structure prediction, protein structure alignment, gene expression, protein-protein interaction, genome-wide association studies, data mining ,machine learning, algorithms, visualization and the modeling of Evolution.

Our Goal! To Familiar Bioinformatics to the audience, researchers (From other discipline), students and interested group. And organize people to make a Bioinformatics roadmap for Bangladesh

21ST centaury Genome Transcriptome Proteome Central Dogma in Molecular Biology Gene (DNA) mRNA Protein

From DNA to Genome First protein sequence Watson and Crick DNA model 1955 1960 First protein structure 1965 1970 1975 1980 1985

1990 First bacterial genome Hemophilus Influenzae 1995 Yeast genome First human genome draft 2001

Turning Point of Bioinformatics * Pre Genome Sequence Era * Post Genome Sequence Era The Haemophilus influenzeagenome (1.8 Mb) is sequenced 1995 that was the first free living organism to have its genome completely sequenced. In 1996, Saccharomyces cerevisiae (Beaker `s yeast) was the first eukaryote genome sequence released. In 1998 the first genome sequence for a multicellular eukaryote, Caenorhabditis elegans is sequenced. Source: ttp://en.wikipedia.org/wiki/List_of_sequenced_eukaryotic_genomes

The “post-genomics” era Genomes sequenced, What’s Next ? Comparative genomics Structural genomics Functional genomics Annotation Goal: to understand the living cell

Identify the genes within a given sequence of DNA Identify the sites Which regulate the gene Annotation Predict the function

How do we identify a gene in a genome? A gene is characterized by several features (promoter, ORF…) some are easier and some harder to detect…

Using Bioinformatics approaches for Gene hunting Relative easy in simple organisms (e.g. bacteria) Very hard for higher organism (e.g. humans)

Comparative genomics

Perhaps not surprising!!! How humans are chimps? Comparison between the full drafts of the human and chimp genomes revealed that they differ only by 1.23%

So where are we different ?? Human ATAGCGGGGGGATGCGGGCCCTATACCC Chimp ATAGGGG - - GGATGCGGGCCCTATACCC Mouse ATAGCG - - - GGATGCGGCGC -TATACCA

Functional genomics

TO BE IN NOT ENOUGH In any time point a gene can be functional or not

Structural Genomics

protein complexes Evolutionary relationship fold Biologic processes Protein-ligand complexes Shape and electrostatics Active sites Functional sites The protein three dimensional structure can tell much more then the sequence alone

DNA Sequencing • Bioinformatics is based on the fact that DNA sequencing is cheap, and becoming easier and cheaper very quickly. • the Human Genome Project cost roughly $3 billion and took 12 years (1991-2006). • Sequencing James Watson’s genome in 2007 cost $2 million and took 2 months • Today, we could get our genome sequenced for about $100,000 and it would take a month. • The Archon X prize: you win $10 million if you can sequence 100 human genomes in 10 days, at a cost of $10,000 per genome. • It is realistic to envision $100 per genome within 10 years: everyone’s genome could be sequenced if they wanted or needed it.

Why it’s useful! • All of the information needed to build an organism is contained in its DNA. If we could understand it, we would know how life works. • Preventing and curing diseases like cancer (which is caused by mutations in DNA) and inherited diseases. • Curing infectious diseases (everything from AIDS and malaria to the common cold). If we understand how a microorganism works, we can figure out how to block it. • Understanding genetic and evolutionary relationships between species • Understanding genetic relationships between humans. Projects exist to understand human genetic diversity. Also, sequencing the Neanderthal genome. • Ancient DNA: currently it is thought that under ideal conditions (continuously kept frozen), there is a limit of about 1 million years for DNA survival. So, Jurassic Park will probably remain fiction.

From DNA to Gene • But: extracting that information is difficult. How to convert a string of ACGT’s into knowledge of how the organism works is hard. • Most of the work is on the computer, with key confirming experiments done in the “wet lab”. • The sequence below contains a gene critical for life: the gene that initiates replication of the DNA. Can you spot it? • We are now going to spend some time on what genes look like and how we can find them. TTGGAAAACATTCATGATTTATGGGATAGAGCTTTAGATCAAATTGAAAAAAAATTAAGCAAACCTAGTTTTGAAACCTG GCTCAAATCGACAAAAGCTCATGCTTTACAAGGAGACACGCTCATTATTACTGCACCTAATGATTTTGCACGGGACTGGT TAGAATCTAGGTATTCTAATTTAATTGCTGAAACACTTTATGATCTTACGGGGGAAGAGTTAGATGTAAAATTTATTATT CCTCCTAACCAGGCCGAGGAAGAATTCGATATTCAAACTCCTAAAAAGAAAGTCAATAAAGACGAAGGAGCAGAATTTCC TCAAAGCATGCTAAATTCGAAGTATACCTTTGATACATTTGTTATCGGATCTGGAAATCGGTTTGCGCATGCAGCTTCTT TAGCAGTAGCAGAAGCGCCGGCTAAAGCGTATAATCCGCTTTTTATTTACGGGGGAGTAGGATTAGGCAAAACACACTTA ATGCACGCCATAGGCCACTATGTGTTAGATCATAATCCTGCCGCGAAAGTCGTGTACTTATCATCTGAAAAATTCACAAA CGAGTTTATTAACTCTATTCGTGACAATAAAGCAGTAGAATTCCGCAACAAATACCGTAATGTAGATGTTTTACTGATTG ATGATATTCAATTCTTAGCAGGTAAAGAGCAGACACAAGAAGAATTTTTCCATACGTTTAATACGCTTCACGAAGAAAGC AAGCAGATTGTCATCTCAAGTGATCGACCGCCGAAAGAAATTCCTACACTTGAAGATCGACTTCGCTCTCGCTTTGAATG GGGCCTTATTACAGACATCACACCACCAGATTTGGAAACACGAATTGCTATTTTGCGTAAAAAAGCCAAAGCGGACGGCT TAGTTATTCCAAATGAAGTTATGCTTTATATCGCCAATCAGATTGATTCAAATATTAGAGAATTAGAAGGCGCACTTATT

Chromosomes and Genes • each chromosome is a long piece of DNA • B. megaterium genome is a circle (like most bacteria) of about 5 million bases. • Human chromosomes are 100-200 million bases long. We have 46 chromosomes (2 sets of 23, one set from each parent). • genes are just regions on that DNA. It is not obvious where genes are if you look at a DNA sequence. • there is a lot of DNA that is not part of genes: in humans only 2% at most of the DNA is part of any gene. • Bacteria use more of their DNA: 80% of the B. meg chromosome is genes. • B. meg has about 1 gene per 1000 base pairs (bp) of DNA. About 5000 genes • Humans have about 25,000 genes. • We are far more complicated than bacteria: regulation of the genes is very complicated in humans • We use the same gene in different ways in different tissues

The Genetic Code • Proteins are long chains of amino acids. • There are 20 different amino acids coded in DNA • There are only 4 DNA bases, so you need 3 DNA bases to code for the 20 amino acids • 4 x 4 x 4 = 64 possible 3 base combinations (codons) • Each codon codes for one amino acid • Most amino acids have more than one possible codon • Genes start at a start codon and end at a stop codon. • 3 codons are stop codons: all genes end at a stop codon. • Start codons are a bit trickier, since they are used in the middle of genes as well as at the beginning • in eukaryotes, ATG is always the start codon, making Methionine (Met) the first amino acid in all proteins (but in many proteins it is immediately removed). • In prokaryotes, ATG, GTG, or TTG can be used as a start codon. B. meg prefers ATG, but about 30% of the genes start with GTG or TTG. In bioinformatics, we generally ignore the fact that RNA uses the base uracil (U) in place of T.

Bioinformatics Tools • * Using OMIM (Online Mendelian Inheritance in Man) to learn about genes and genetic disease • Finding a gene on a chromosome map using NCBI Map Viewer • Accessing records in NCBI's sequence databases • Sequence-similarity searching using NCBI BLAST - Covers obtaining sequence data in FASTA format, submitting sequence data as a query, and understanding BLAST results • Examining protein structures from the Protein Data Bank -

Genome Browsers • UCSC Genome Browserhttp://genome.ucsc.edu/ • Ensembl Genome Browser(http://www.ensembl.org) • WormBase:http://www.wormbase.org/ • AceDB:http://www.acedb.org/ • Comprehensive Microbial Resource:http://www.tigr.org/tigr-scripts/CMR2/CMRHomePage.spl • FlyBase:http://flybase.bio.indiana.edu/

Bioinformatics Deals With • There are two fundamental ways of modeling a Biological system (e.g., living cell) both coming under Bioinformatics approaches. • Static • Sequences – Proteins, Nucleic acids and Peptides • Structures – Proteins, Nucleic acids, Ligands (including metabolites and drugs) and Peptides • Interaction data among the above entities including microarray data and Networks of proteins, metabolites • Dynamic • Systems Biology comes under this category including reaction fluxes and dynamic nature of metabolite concentrations • Multi-Agent Based modeling approaches capturing cellular events such as signaling, transcription and reaction dynamics • A broad sub-category under bioinformatics is structural bioinformatics.

Profile of E.coli Escherichia coli (E. coli) are members of a large group of bacterial germs that inhabit the intestinal tract of humans and other warm-blooded animals (mammals, birds). Newborns have a sterile alimentary tract, which within two days becomes colonized with E. coli.

Central Carbon Metabolism of Wild type Escherichia coli NADP+ NADPH NADPH NADP+ Pts CO2 RU5P G6P 6PG Glucose G6PDH 6PGDH PEP PYR Pgi Rpi Rpe X5P F6P R5P ATP Ald TkA TkB ADP DHAP GAP E4P S7P ADP NAD+ Tal Eno NADH ATP ADP ATP PYR Pyruvate (Ext.) PEP Pyk CO2 NAD+ CO2 PDH NADH Ppc NADH, CO2 AcCoA Acetate (Ext.) PckA Mez sfc NAD(P)H OAA CS MDH ICT NAD(P) NADP+ MAL GLYOXY ICDH NADPH CO2 FADH SDH AKG SUC AKGDH NADH CO2

Central Carbon Metabolism of Wild type Escherichia coli NADP+ NADPH NADPH NADP+ Pts CO2 RU5P G6P 6PG Glucose G6PDH 6PGDH PEP PYR Pgi Rpi Rpe X5P F6P R5P ATP Ald TkA TkB ADP EMP DHAP GAP E4P S7P ADP NAD+ Tal Eno NADH ATP ADP ATP PYR Pyruvate (Ext.) PEP Pyk CO2 NAD+ CO2 PDH NADH Ppc NADH, CO2 AcCoA Acetate (Ext.) PckA Mez sfc NAD(P)H OAA CS MDH ICT NAD(P) NADP+ MAL GLYOXY ICDH NADPH CO2 FADH SDH AKG SUC AKGDH NADH CO2

Central Carbon Metabolism of Wild type Escherichia coli NADP+ NADPH NADPH NADP+ Pts CO2 RU5P G6P 6PG Glucose G6PDH 6PGDH PPP PEP PYR Pgi Rpi Rpe X5P F6P R5P ATP Ald TkA TkB ADP DHAP GAP E4P S7P ADP NAD+ Tal Eno NADH ATP ADP ATP PYR Pyruvate (Ext.) PEP Pyk CO2 NAD+ CO2 PDH NADH Ppc NADH, CO2 AcCoA Acetate (Ext.) PckA Mez sfc NAD(P)H OAA CS MDH ICT NAD(P) NADP+ MAL GLYOXY ICDH NADPH CO2 FADH SDH AKG SUC AKGDH NADH CO2

Central Carbon Metabolism of Wild type Escherichia coli NADP+ NADPH NADPH NADP+ Pts CO2 RU5P G6P 6PG Glucose G6PDH 6PGDH PEP PYR Pgi Rpi Rpe X5P F6P R5P ATP Ald TkA TkB ADP DHAP GAP E4P S7P ADP NAD+ Tal Eno NADH ATP ADP ATP PYR Pyruvate (Ext.) PEP Pyk CO2 NAD+ CO2 PDH NADH Ppc NADH, CO2 AcCoA Acetate (Ext.) PckA Mez sfc NAD(P)H OAA CS MDH ICT NAD(P) NADP+ MAL GLYOXY ICDH TCA NADPH CO2 FADH SDH AKG SUC AKGDH NADH CO2

Central Carbon Metabolism of Pgi mutant Escherichia coli NADP+ NADPH NADPH NADP+ Pts CO2 RU5P G6P 6PG Glucose G6PDH 6PGDH Edd PEP PYR Pgi Rpi Rpe X5P F6P R5P ED ATP KDPG Ald TkA TkB ADP DHAP GAP E4P S7P Eda ADP NAD+ Tal Eno NADH ATP ADP ATP PYR Pyruvate (Ext.) PEP Pyk CO2 NAD+ CO2 PDH NADH Ppc NADH, CO2 AcCoA Acetate (Ext.) PckA Mez sfc NAD(P)H OAA CS MDH ICT NAD(P) NADP+ MAL GLYOXY ICDH NADPH CO2 FADH SDH AKG SUC AKGDH NADH CO2

Target and Pathway Methodology Theoretical Experimental MFA & MCA Dynamic Simulation Batch Culture Continuous Culture Sampling Steady State Substrate Pulse OD OUR,CER Extracellular Metabolite Conc. Rapid Sampling Extraction Quenching Analysis, Intracellular metabolites

Experimental Work E. coli Culture • Batch Cultures • Feed Batch Culture • Continuous Cultures Conditions: Strains:Escherichia coli BW25113 pykA, zwf and pgi gene knockout of BW25113 strains Carbon Source: glucose, acetate or pyruvate (4g/l) Substrate Limited: NH3 limited

Bacterial Growth: Batch Culture (h) 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30

Factors Affecting Microbial Growth • * Microorganism • Nutrients • Temperature • pH • Oxygen • Water availability

Batch Culture A large-scale closed system culture in which cells are grown in a fixed volume of nutrient culture medium under specific environmental conditions (e.g. nutrient type, temperature, pressure, aeration, etc.) up to a certain density in a tank or airlift fermentor, harvested and processed as a batch, especially before all nutrients are used up.

Fed-Batch Culture A fed-batch is a biotechnological batch process which is based on feeding of a growth limiting nutrient substrate to a culture. The fed-batch strategy is typically used in bio-industrial processes to reach a high cell density in the bioreactor. The controlled addition of the nutrient directly affects the growth rate of the culture and helps to avoid overflow metabolism (formation of side metabolites, such as acetate for Escherichia coli, lactic acid in cell cultures, ethanol in Saccharomyces cerevisiae), oxygen limitation (anaerobiosis).

Continuous Culture A chemostat involves adding fresh medium to a culture, mixing, and then allowing an equal volume of culture to drain from the vessel; this is typically done continuously (i.e., a steady stream of fresh medium is added)

Culture Steps Pre-culture: L-tube or Agar Plate Culture 8~12h Flask Culture: 10~12 h Bioreactor Culture: Depends on our requirement

Pre-culture (Flask Culture)

Reactor Culture

Automatic Gas Analyzer

Samples (Tubes)

Growth Compares in E.coli

Basic Bioinformatics and Gene Effects on Growth Rate in Bacteria