Biosimplicity engineering simple life
Download
1 / 42

Biosimplicity: Engineering Simple Life - PowerPoint PPT Presentation


  • 96 Views
  • Uploaded on

Biosimplicity: Engineering Simple Life. Tom Knight MIT Computer Science and Artificial Intelligence Laboratory. Simplicity is the ultimate sophistication -- Leonardo da Vinci. Measuring Complexity. Number of components Size of description Size of functional model.

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'Biosimplicity: Engineering Simple Life' - fola


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
Biosimplicity engineering simple life

Biosimplicity: Engineering Simple Life

Tom Knight

MIT Computer Science and

Artificial Intelligence Laboratory

Simplicity is the ultimate sophistication

-- Leonardo da Vinci


Measuring complexity
Measuring Complexity

  • Number of components

  • Size of description

  • Size of functional model

Have you ever thought... about whatever man builds, that all of man's

industrial efforts, all his calculations and computations, all the

nights spent over working draughts and blueprints, invariably

culminate in the production of a thing whose sole and guiding

principle is the ultimate principle of simplicity?

In any thing at all, perfection is finally attained, not when there is

no longer anything to add, but when there is no longer anything to

take away.

-- Antoine de Sainte Exupery, in "Wind, Sand, and Stars"


Engineered simple organisms
Engineered Simple Organisms

  • modular

  • understood

  • malleable

  • low complexity

  • Start with a simple existing organism

  • Remove structure until failure

  • Rationalize the infrastructure

  • Learn new biology along the way

The chassis and power supply for our computing


Some history
Some history…

  • Confusion over what PPLO/Mycoplasma were

    • “The Microbe of pleuorpneumonia” Nocard 1896

  • 1932 isolation of “PPLO.” Koch postulates.

  • 1958 Klieneberger-Nobel identifies them as free living bacterial species

  • Morowitz 1962 SciAm: “the smallest living cell”

  • 1980 Gilbert effort to sequence M. capricolum

  • 1982 Morowitz “complete understanding of life”

  • 1996 Fraser et al. M. genitalium sequence

  • 1999 Hutchison et al. Minimal genome set for M. genitalium


Relative complexity
Relative Complexity

Mycoplasma genitalium (580 kB)

Mesoplasma florum (793 kB)

S. Cerevisiae (12 MB)

T7 Phage (36 kB)

Human (3.3 GB)

E. Coli (4.6 MB)

Lilly (12 GB)

Plasmid

Gene

100

1K

10K

100K

10M

100M

1G

10G

100G

1M

Alive

Autotroph

Log Genome Size, base pairs



Choosing an organism
Choosing an organism

  • Safe

    • BSL-1 organism -- insect commensal

  • Un-regulated

    • Not a crop plant or domesticated animal pathogen

  • Fast growing

    • 40 minute doubling time

    • vs. six hours for M. genitalium

  • Convenient to work with

    • Facultative anaerobe

  • Small genome

  • Known sequence

  • Complete annotation


The mollicute bibliome

The Mollicute Bibliome

  • Complete collection of mycoplasma related papers:

  • 6,418 and counting

  • All books and book chapters also

  • Endnote & Refworks

  • Downloaded .pdfs for articles > 1995

  • Scanned articles and books, OCR with Abbyy Finereader

  • Plans for a Google appliance search engine

  • Collaboration for “shallow semantic” understanding

  • people.csail.mit.edu/tk/mfpapers/ user=meso, pass=meso



Tomographic tem
Tomographic TEM

Courtesy

Jensen Lab,

Caltech


Culture medium 1161
Culture Medium 1161

  • Beef Heart infusion

  • 4% Sucrose

  • Fresh yeast culture broth

  • 20% horse serum

  • Penicillin

  • Phenol red


Defined medium
Defined medium

Amino acids (minus asp, glu)

Guanine

Uracil

Thymine

Adenine

Glucose

BSA

Palmitic acid

Oleic acid

Sodium phosphate

KCl

Magnesium sulfate

Glycerol

Spermine

Nicotinic acid

Thiamine

Riboflavin

Pyridoxamine

Thioctic acid

Coenzyme A


Synthesis vs import
Synthesis vs. Import

  • Mycoplasma import virtually all small biochemical molecules

  • Each import is done with a specific membrane protein – some are capable of importing a class

  • Complexity is reduced if the import is simpler than the synthesis

  • Example of the opposite:

    • Glutamine  Glutamic acid

    • Asparagine  Aspartic acid


Sequencing the genome
Sequencing the genome

  • First sequenced small portions of the genome to test that we had the correct species

    • Compared the results to Genbank entries

    • Sequenced PTS system gene, identical to reported sequence

    • Sequenced 16S rRNA (unreported)

    • Discovered identical to Mesoplasma entomophilum 16S rRNA sequence – probably the same species

      • Genbank entry

  • Measured genome size with pulse field gels

  • Sequenced 12% of genome to see what we were up against


Pfge of mesoplasma genomic dna
PFGE of Mesoplasmagenomic DNA

y

yeast marker

l

lambda marker

me

Mesoplasma entomophilum

mf

Mesoplasma florum

ml

Mesoplasma lactucae

1.2% agarose 9C 6V/cm Ramped 90 - 120 sec 48 hours


I ceui digestion
I-CeuI digestion

  • Special restriction enzymes cut only at 23S rRNA sites

    5´ ..TAACTATAACGGTC CTAA^GGTAGCGA..3´

    3´ ..ATTGATATTGCCAG^GATT CCATCGCT..5´

    Calculate the number of rRNA sites in the genome from the number of cut fragments



Rrna sequences
rRNA sequences

  • Degenerate primers

  • The two rRNA sequences are different:


Library creation
Library creation

  • Randomly cut genomic DNA with EcoRI

  • Shotgun cloned into pUC18 vector

  • Sequenced the inserts (0.1 – 8 Kb)

  • Sheared the genomic DNA with a needle

  • End repaired

  • Cloned into defective lambda phage vector

  • Packaged vector into phage heads

  • Infected E. coli cells with phage

    • ~ 40 Kb inserts


What we learned from partial sequence
What we learned from partial sequence

  • Almost all “old friends”

  • Little or no extra junk

  • Inter-gene sequences small (-4 to 30 bp)

  • Little transcriptional control

  • High AT vs. GC content – 27% average GC

    • 20% in typical genes, even less in control regions

    • 40% in rRNA regions



Whitehead agreement
Whitehead Agreement

  • Whitehead agreed in January 2002 to sequence the organism

  • Estimated to take about two hours of time on their sequencers

    • “Sure, we can do it Tom, but what do we do with the rest of the day after the coffee break?”

  • “How many other organisms like this are there?”

    • 300

  • “Why don’t we sequence them all?”

  • Good draft available November 2002

  • Gaps closed July 2003 -- final November 2003


Gap closure
Gap closure

  • 9 gaps remained

  • Long range PCR

  • Primer walking

  • One difficult sequence

    • Poly A region 16-17 bp long

    • Sequencing stuttered

    • Reprime with aaaaaaaaaaaaaaaag

  • Repeat region 186 bp in surface lipoprotein

    • Give up on accurate sequence, PCR for length

  • Final assembly verification


Genome characteristics
Genome characteristics

  • 793281 base pairs

  • 26.52% G + C

  • 682 protein coding regions

    • UGA for tryptophan

    • No CGG codon or corresponding tRNA

    • Classic circular genome

    • oriC, terminator region, gene orientation

  • 39 stable RNAs

    • 29 tRNAs

    • 2x 16S, 23S, 5S

    • RNAse-P, tmRNA, SRP


Standard motifs
Standard Motifs

  • -10 present usually very highly conserved

    • Often preceded by a “TG” 1-2 bp upstream

  • Seldom a conserved -35 region

  • RBS is standard Shine-Dalgarno

  • Alternate RBS matches complementary region of 16S rRNA UAACAACAU (Loechel 91)

  • Standard stem-loop terminators with loop TTAA

    • 6-8 poly T tail in forward direction

  • dnaA box TTATCCACA

  • Four ribo-box sequences (Thi, Ile, Val, Guanine)


Understand the metabolism
Understand the metabolism

  • Identify major metabolic pathways by finding critical genes coding for known enzymes

  • Predict necessary enzymes which may not have been found

  • Evaluate the list of unknown function genes for candidates

  • Build the major metabolic pathway map of the organism

  • Consider elimination of entire pathways


Biosimplicity engineering simple life

G. Fournier

02/23/04

PTS II System

Mfl519, Mfl565

sucrose

trehalose

xylose

beta-glucoside

glucose

unknown

ribose

ABC transporter

Mfl516, Mfl527,

Mfl187

Mfl500

Mfl669

Mfl009, Mfl033,

Mfl318, Mfl312

fructose

Mfl214, Mfl187

Mfl619, Mfl431,

Mfl426

ATP Synthase Complex

Mfl181

Mfl497

Mfl515, Mfl526

Mfl499

Mfl317?, Mfl313?

Mfl009, Mfl011, Mfl012,

Mfl425, Mfl615, Mfl034,

Mfl617, Mfl430, Mfl313?

?

Mfl109, Mfl110, Mfl111,

Mfl112, Mfl113, Mfl114,

Mfl115, Mfl116

Mfl666, Mfl667,

Mfl668

glucose-6-phosphate

chitin degradation

Mfl347,

Mfl558

ATP

ADP

sn-glycerol-3-phosphate

ABC transporter

Pentose-Phosphate Pathway

Glycolysis

Mfl023, Mfl024,

Mfl025, Mfl026

L-lactate,

acetate

Mfl223, Mfl640, Mfl642, Mfl105, Mfl349

glyceraldehyde-3-phosphate

Mfl254, Mfl180, Mfl514, Mfl174, Mfl644, Mfl200, Mfl504, Mfl578, Mfl577, Mfl502, Mfl120, Mfl468, Mfl175, Mfl259

Mfl039, Mfl040, Mfl041, Mfl042, Mfl043, Mfl044, Mfl596, Mfl281

Lipid Synthesis

unknown substrate

transporters

Mfl384, Mfl593,

Mfl046, Mfl052

fatty acid/lipid

transporter

ribose-5-phosphate

acetyl-CoA

Mfl230, Mfl382, Mfl286, Mfl663, Mfl465, Mfl626

Mfl590, Mfl591

Mfl099, Mfl474,Mfl315,

Mfl325,Mfl482

x13+

PRPP

cardiolipin/

phospholipids

membrane synthesis

Purine/Pyrimidine Salvage

phospholipid

membrane

Identified Metabolic Pathways in Mesoplasma florum

Mfl074, Mfl075, Mfl276, Mfl665, Mfl463, Mfl144, Mfl342, Mfl343,

Mfl170, Mfl195, Mfl372

Mfl419, Mfl676, Mfl635, Mfl119, Mfl107, Mfl679, Mfl306, Mfl648,

Mfl143, Mfl466, Mfl198, Mfl556, Mfl385

Mfl076, Mfl121, Mfl639, Mfl528, Mfl530, Mfl529, Mfl547, Mfl375

niacin?

Mfl063,

Mfl065,

Mfl038,

Mfl388

xanthine/uracil

permease

Pyridine Nucleotide Cycling

variable surface

lipoproteins

Mfl413, Mfl658

Mfl444, Mfl446,

Mfl451

Mfl340, Mfl373, Mfl521, Mfl588

Mfl583, Mfl288, Mfl002,

Mfl678, Mfl675, Mfl582,

Mfl055, Mfl328

Mfl150, Mfl598, Mfl597,

Mfl270, Mfl649

hypothetical

lipoproteins

DNA

Polymerase

RNA

Polymerase

x22

competence/

DNA transport

Mfl047, Mfl048, Mfl475

Electron Carrier Pathways

DNA

RNA

K+, Na+

transporter

Mfl027, Mfl369

Flavin Synthesis

Mfl064, Mfl178

Nfl289, Mfl037, Mfl653, Mfl193

NAD+

Mfl165, Mfl166

ribosomal RNA

transfer RNA

degradation

FMN, FAD

Mfl193

Mfl563, Mfl548, Mfl088,

Mfl258, Mfl329, Mfl374,

Mfl541, Mfl005, Mfl647,

Mfl231, Mfl209

Mfl029, Mfl412, Mfl540,

Mfl014, Mfl196,Mfl156,

Mfl282, Mfl387, Mfl682,

Mfl673, Mfl077, rnpRNA

Mfl283, Mfl334

malate

transporter?

hypothetical

transmembrane

proteins

NADP

Mfl378

x57

Ribosome

metal ion

transporter

Signal Recognition Particle (SRP)

riboflavin?

Mfl356, Mfl496,

Mfl217

messenger RNA

tRNA

aminoacylation

NADPH

NADH

protein secretion

(ftsY)

srpRNA, Mfl479

23sRNA, 16sRNA, 5sRNA,

Mfl122, Mfl149, Mfl624, Mfl148, Mfl136, Mfl284, Mfl542, Mfl132, Mfl082,

Mfl127, Mfl561, Mfl368.1, Mfl362.1, Mfl129, Mfl586, Mfl140, Mfl080,

Mfl623, Mfl137, Mfl492, Mfl406

Mfl608, Mfl602, Mfl609, Mfl493, Mfl133, Mfl141, Mfl130, Mfl151, Mfl139,

Mfl539, Mfl126, Mfl190, Mfl441, Mfl128, Mfl125, Mfl134, Mfl439, Mfl227,

Mfl131, Mfl123, Mfl638, Mfl396, Mfl089, Mfl380, Mfl682.1, Mfl189,

Mfl147, Mfl124, Mfl135, Mfl138, Mfl601, Mfl083, Mfl294, Mfl440?

cobalt

ABC transporter

Mfl237

Mfl152, Mfl153,

Mfl154

proteins

Formyl-THF Synthesis

Export

Mfl613, Mfl554, Mfl480, Mfl087, Mfl651, Mfl268, Mfl366,

Mfl389, Mfl490, Mfl030, Mfl036, Mfl399, Mfl398, Mfl589,

Mfl017, Mfl476, Mfl177, Mfl192, Mfl587, Mfl355

Mfl086, Mfl162, Mfl163, Mfl161

phosphonate

ABC transporter

met-tRNA formylation

Mfl571, Mfl572

Mfl060, Mfl167, Mfl383, Mfl250

protein translocation

complex (Sec)

Mfl057, Mfl068,

Mfl142,Mfl090,

Mfl275

Mfl409, Mfl569

phosphate

ABC transporter

Mfl233, Mfl234,

Mfl235

degradation

THF?

Mfl186

formate/nitrate

transporter

amino acids

intraconversion?

Mfl094, Mfl095,

Mfl096, Mfl097,

Mfl098

Mfl418, Mfl404, Mfl241, Mfl287, Mfl659, Mfl263, Mfl402,

Mfl484, Mfl494, Mfl210, tmRNA

Mfl509, Mfl510,

Mfl511

spermidine/putrescine

ABC transporter

oligopeptide

ABC transporter

Mfl016, Mfl664

putrescine/ornithine

APC transporter

Mfl015

Mfl182, Mfl183,

Mfl184

Mfl019

Mfl605

arginine/ornithine

antiporter

Mfl557

Mfl652

unknown amino acid

ABC transporter

glutamine

ABC transporter

lysine

APC transporter

alanine/Na+

symporter

glutamate/Na+

symporter

Amino Acid Transport


How simple is this
How Simple is this?

  • Missing cell wall, outer membrane

  • Missing TCA cycle

  • Missing amino acid synthesis

  • Missing fatty acid synthesis

  • One sigma factor

  • Small number of dna binding proteins

  • One insertion sequence, probably not active

  • One restriction system (Sau3AI-like)

  • CTG/CAG methylation (function?)

  • Evidence for shared protein function

    • MDH/LDH (Pollack 97 Crit rev microbiol 23:269)


Minimal is not always simple
Minimal is not always simple

  • Shared function of parts

  • Overlapped genes

  • Tradeoff of import vs. synthesis

  • Example:

    • Television set design

    • Shared deflection coil, high voltage power supply, isolated filament supply

    • Three functions, a single circuit, a difficult engineering, modeling, debugging, and repair task

  • how many genes have multiple functions


Dna methylation
DNA Methylation

  • Bisulfite conversion of genomic DNA

  • Sequencing of converted DNA to identify methylated C positions

  • Results: GATC sequences, as expected

  • Unexpected: CAG and CTG sequences


Current work
Current work

  • Array experiments

    • Close species

    • Transcriptional units, pseudo-genes

    • TRASH

  • Protein species by LC/LC/MS/MS

  • Elimination of the restriction system

  • Plasmid system

    • pBG7AU based

  • Recombination system

    • Positive/negative selection

  • Yeast chromosome transfer

  • Genome edits to reduce size

  • Genome edits to modularize

  • Genome edits to eliminate complexity

  • Use as a construction chassis


Reconstruct the genome
Reconstruct the Genome

  • Use recombination techniques to edit the genome

  • Eliminate unnecessary genes

  • Remove overlaps

  • Standardize promoters, ribosomal binding sites

  • Identify transcriptional and translational regulators

  • Recode proteins to use a reduced portion of the coding space

The code is 4 billion years old,

it’s time for a rewrite


Yac mutagenesis

Bring up YAC technology

Spheroplasts, old YAC plasmid sequencing

Triple transform with MF chromosome and

pRML1 (Spencer 92)

pRML2

Genome inactive except for ARS, telomeres, selection markers

Use yeast recombination systems for genome editing

Isolate and recircularize YAC to form a new genome

Lipid encapsulate genome into vesicles

Fuse vesicles with genome-killed wild type cells

YAC mutagenesis


Proteome

Collaboration with Steve Tannenbaum / Yingwu Wang

2-D gels + MS spot ID

LC/LC/MS/MS ID of trypsin digests

Proteome


Riboswitch analysis
Riboswitch Analysis

  • Collaboration with Ron Breaker / Adam Roth

  • Discovery of unique riboswitches specific for GTP rather than dGTP

  • Found in no other sequenced genomes

  • Analysis of close relatives under way


Engineer plasmids
Engineer plasmids

  • No known plasmids for this class of organism

  • Renaudin has made plasmids using the chromosomal OriC as the replicative element

    • Lartigue 03

  • M. mycoides has pADB201 and pBG7AU rolling circle plasmids similar to pE194 (1082 bp)

  • We know the antibiotic sensitivities and have working resistance genes


Kit part the genome
Kit Part the genome

  • Make Biobrick parts from each gene, tRNA, promoter, other part-like genome element

  • Attempt to develop techniques for recombining parts into coherent modules

  • Develop techniques for assembling and modeling the resulting structures


Thanks to

Harold Morowitz

Greg Fournier

Gail Gasparich

Bob Whitcomb

Eric Lander

Bruce Birren

Nicole Stange-Thomann

George Church

Roger Brent

Grant Jensen

Yingwu Wang

Nick Papadakis

Ron Weiss

Drew Endy

Randy Rettberg

Austin Che

Reshma Shetty

MIT Synthetic biology working group

DARPA, NTT, NSF, Microsoft

Thanks to…


Synthetic biology
Synthetic Biology

  • An alternative to understanding complexity is to remove it

  • This complements rather than replaces standard approaches

  • Engineering synthetic constructs will be easier

    • Enabling quicker more facile experiments

    • Enabling deeper understanding of the basic mechanisms

    • Enabling applications in nanotechnology, medicine and agriculture

Simplicity is the ultimate sophistication

-- Leonardo da Vinci