Genome Function Project

Genome Function Project. UCSC George Church 24 Aug 2001. We thank for support: Government and private grant agencies: NHLBI, NSF, ONR, DOE, DARPA, HHMI, Lipper, Armenise Corporate collaborators & sponsors: Affymetrix, GTC, Mosaic, Aventis, Dupont. Post-Structural Genomics Data.


Genome Function Project

UCSC

George Church 24 Aug 2001

We thank for support:

Government and private grant agencies: NHLBI, NSF, ONR, DOE, DARPA, HHMI, Lipper, Armenise

Corporate collaborators & sponsors:

Affymetrix, GTC, Mosaic, Aventis, Dupont


Post-Structural Genomics Data

gcggatttagctcagttgggag agcgc

cagact gaaga

tttgga ggtcctgtgtt cgatc

cacagaattcgcacca


Post-300 Genome Sequences

0.5 to 7 Mbp

10 Mbp to 1000 Gbp



Function Genomics Measures & Models

Environment

Metabolites

Interactions

RNA

DNA

Protein

Growth rate

Expression


Exponential technologies

1993 first browser

1994 commercial www


Agenda

1. mapping human variation (haplotype map)

2. obtaining a complete and validated set of human genes including

- multiple alleles, transcripts, protein or structural RNA products

- regulatory elements

3. understanding the diversity of life through genomic analysis of many organisms, and understanding how one organism works by comparative genomics with others

- how genomes evolved

4. creating a new quantitative systems biology, beyond drawing circles and arrows on paper and labeling them with names nobody can remember

- mapping the key interactions

- mathematical/computational models of pathways and systems

- dealing with multiple levels from atoms to cells


In vitro minigenome

Steve Blackwell, HMS: pure IF, EF

Tony Forster, BWH: tRNAs & modified bases

Manz Ehrenberg, Dieter Soll : tRNA-synthetases

Josh LaBaer, HMS-HIP: Expression constructs

Jingdong Tian, HMS: Protein synthesis

Rob Mitra & Xiaohua Huang, HMS: Polymerases, RCA

Gloria Culver, Iowa State: ribosomal proteins & rRNA

Harry Noller, UCSC: ribosomes


In vitro minigenome

A) From atoms to evolving minigenomes and cells.

This could improve in vitro transcription/translation/replication systems and conceptually link atomic (mutational) changes via molecular and systems modeling to population evolution. The synthesis of pure systems of proteins with natural or novel modifications would be of great significance. This could give an incredible focus to structural genomics.

B) From cells to tissues.

Modeling the effects of combinations of membrane signals and genome-programming on RNA and protein expression profiles would allow, among other things, manipulating stem-cell fate and stability. Stability would be key to both cell culture and to long-term avoidance of cancerous stem-cell proliferation. The ability of "programmed" cells to replace or augment small molecule drugs could be rigorously assessed.

C) From tissues to systems

Computational programming of cell and tissue morphology can develop quantitative concepts in complexity, chaos, robustness, evolvability to engineer useful models such as sensor-effector neural feedback systems where macro aspects of the system determine the past (Darwinian) or future (prosthetic) function of the altered genomes.


Grand Challenges: goals (& details)

  • The Manhattan Project ’43-45: Nuclear chain reaction (without igniting the atmosphere)

  • The Apollo Project ’62-69: Send a person to the moon (& back)

  • The Smallpox Eradication ’66-77: from the whole globe (including freezers)

  • The Human Genome Project ’90-05: 3 billion bases (at 99.99% accuracy & searchable)


Grand Challenges: goals (& details)

  • The Manhattan Project ’43-45: Nuclear chain reaction (without igniting the atmosphere)

  • The Apollo Project ’62-69: Send a person to the moon (& back)

  • The Smallpox Eradication ’66-77: from the whole globe (including military freezers?)

  • The Human Genome Project ’90-05: 3 billion bases (at 99.99% accuracy with comparisons)

  • The BioSystems Project ’02- ??


Potential BioSystems Project Challenges

Programming smart biomaterials:

1. 0.1 nanometer positioning at 1 kHz in a 50 nm cube (Foresight Feynman Challenge)

2. I/O to sub-nano memory in DNA

Programming cells & populations:

3. 10 sec. mini-cell cycle, 85 kbp genome

4. Bioremediation microbial populations

Programming ourselves:

5. Drug structure-activity prioritization

6. Universal, non-aging human stem cells


Why the genome project worked

Sequence searching: Ulam’61-74, Staden’79, Lipman’87, Myers’87, Green’93...

Polymer synthesis & sequencing: Hood’75-00, Hunkapiller’77-00, Carruthers’79...

Chemistry: Tabor’93, Karger’94, Mathies’96, Mullis’84...

Shotgun & mapping: Sanger’77, Brenner’72-02, Sulston’90, Olson’80-00...

Infrastructure: Wada’82, DeLisi’84, Gilbert’87, Watson’88, Venter’91...


Metrics for structural & functional data

Data type           Automated   Data quality           Model quality                     Similarity search
X-ray diffraction   1960        resolution < 0.2 nm    |o-c|/o, R < 0.2                  DALI, etc.
Sequence (bp)       1988        discrepancy < 0.01%    conserved proteins                BLAST
Expression          1999        cc, t-test             shared motifs, shared function    Biclustering
Interact/growth                 outliers               optimality                        as above?


Types of Systems Interaction Models

Model                         Description
Quantum Electrodynamics       subatomic
Quantum mechanics             electron clouds
Molecular mechanics           spherical atoms (nm-fs)
Master equations              stochastic single molecules
Fokker-Planck approx.         stochastic
Macroscopic rates ODE         Concentration & time (C,t)
Flux Balance Optima           dCik/dt optimal steady state
Thermodynamic models          dCik/dt = 0, k reversible reactions
Steady State                  Σ dCik/dt = 0 (sum k reactions)
Metabolic Control Analysis    d(dCik/dt)/dCj (i = chem. species)
Spatially inhomogeneous       dCi/dx
Population dynamics           as above (km-yr)

Increasing scope, decreasing resolution
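To make the "Macroscopic rates ODE" and "Steady State" rows concrete, here is a minimal sketch that is not taken from the talk. It assumes a hypothetical two-species chain (constant influx v0 into A, A converted to B at rate k1, B removed at rate k2), integrates the rate equations with a simple Euler step, and compares the result with the analytic steady state where every dC/dt is zero. All names and parameter values are illustrative.

```python
def simulate_pathway(v0, k1, k2, dt=0.01, steps=5000):
    """Euler-integrate dA/dt = v0 - k1*A and dB/dt = k1*A - k2*B from A = B = 0."""
    a = b = 0.0
    for _ in range(steps):
        da = v0 - k1 * a       # net rate of change of A (influx minus conversion)
        db = k1 * a - k2 * b   # net rate of change of B (production minus removal)
        a += da * dt
        b += db * dt
    return a, b


def steady_state(v0, k1, k2):
    """Analytic steady state: set both derivatives to zero and solve."""
    return v0 / k1, v0 / k2


# Hypothetical parameters chosen only for illustration.
a_sim, b_sim = simulate_pathway(v0=1.0, k1=2.0, k2=0.5)
a_ss, b_ss = steady_state(v0=1.0, k1=2.0, k2=0.5)
print(a_sim, a_ss, b_sim, b_ss)
```

The simulated concentrations settle onto the values obtained by solving the steady-state condition directly, which is the shortcut that the steady-state and flux-balance rows exploit.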


Sources of Data for BioSystems Modeling:

Capillary electrophoresis (DNA Sequencing), $300,000: 0.4 Mb/day

Chromatography-Mass Spectrometry (e.g. peptide LC-ESI-MS): 20 Mb/day

Microarray scanners (e.g. RNA): 300 Mb/day

Intel CMOS microscope: $99

Reagent costs:

Electrophoresis (DNA Sequencing): 10 ul per 0.5 Kb

Microarray reactions: 10 ul per 1000 Kb
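The reagent figures on this slide can be put on a common per-kilobase footing. The sketch below only does that arithmetic; the function name is illustrative and the two input pairs are the values quoted on the slide.

```python
def microliters_per_kb(volume_ul, kilobases):
    """Reagent volume consumed per kilobase of sequence interrogated."""
    return volume_ul / kilobases


electrophoresis = microliters_per_kb(10, 0.5)   # 10 ul per 0.5 Kb
microarray = microliters_per_kb(10, 1000)       # 10 ul per 1000 Kb
fold = electrophoresis / microarray             # how much more reagent per Kb

print(electrophoresis, microarray, fold)
```

On these numbers, electrophoresis uses about 20 ul per Kb and microarray reactions about 0.01 ul per Kb, roughly a 2000-fold difference.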


RNA quantitation. Aach, Rindone, Church (2000) Genome Research 10: 431-445.

Microarrays 1 (experiment vs. control, per ORF)

  • R/G ratios

  • R, G values

  • quality indicators

Affymetrix 2 (PM and MM features, per ORF)

  • Averaged PM-MM

  • “presence”

  • feature statistics

  • 25-mers

SAGE 3 (ORF SAGE tags from concatamers)

  • Counts of SAGE 14-mers sequence tags for each ORF

1 DeRisi, et al., Science 278:680-686 (1997)

2 Lockhart, et al., Nat Biotech 14:1675-1680 (1996)

3 Velculescu, et al., Serial Analysis of Gene Expression, Science 270:484-487 (1995)
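The three per-ORF measures named on this slide can be sketched in a few lines. This is a simplified illustration, not the pipeline from the cited paper: the function names, the plain mean over probe pairs, and the tag-to-ORF lookup table are all assumptions made for the example.

```python
import math
from collections import Counter


def log2_ratio(red, green):
    """Two-color microarray: log2 of the R/G intensity ratio for one spot."""
    return math.log2(red / green)


def average_difference(pm, mm):
    """Oligo array: mean of (perfect match - mismatch) over the probe pairs of one ORF."""
    if len(pm) != len(mm) or not pm:
        raise ValueError("need equal-length, non-empty PM and MM lists")
    return sum(p - m for p, m in zip(pm, mm)) / len(pm)


def sage_counts(tags, tag_to_orf):
    """SAGE: count observed sequence tags per ORF, skipping tags with no known ORF."""
    counts = Counter()
    for tag in tags:
        orf = tag_to_orf.get(tag)
        if orf is not None:
            counts[orf] += 1
    return counts


# Hypothetical values for illustration only.
print(log2_ratio(400, 100))
print(average_difference([100, 200, 300], [50, 100, 150]))
print(sage_counts(["AAAAAAAAAAAAAA", "AAAAAAAAAAAAAA", "CCCCCCCCCCCCCC"],
                  {"AAAAAAAAAAAAAA": "orfX"}))
```

Real array software typically adds background correction, outlier handling and quality flags, which correspond to the "quality indicators" and "feature statistics" bullets.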


Array opportunities

  • 22 bp ds-RNAi array modulates single cell type

  • Drug array time-release or photo-release

  • Primer pair arrays for haplotyping

  • Gene & genome synthesis (DARPA)


Polypeptide arrays

Photo-deprotect peptides (Affymax)

Piezo or contact spotting (Harvard-CGR, Stanford)

Phage or ribosome display capture (Bulyk)

In situ ribosomal synthesis (Tian)

Harvard Inst. Proteomics, FLEXGene consortium


[Figure: polony amplification; strands labeled A, A’ and B]

Primer A has 5’ immobilizing (Acrydite) modification.

Single Molecule From Library

Primer is Extended by Polymerase

1st Round of PCR


Sequence polonies by sequential, fluorescent single-base extensions

1. Remove 1 strand of DNA.

2. Hybridize Universal Primer.

3. Add Red (Cy3) dTTP.

4. Wash; Scan Red Channel

5. Add Green (FITC) dCTP

6. Wash; Scan Green Channel

[Figure: template strands B and B’ with 3’ and 5’ ends and extended bases]


Polony Template:   3’ P’-TATTGTTAAAGTGTGTCCTTTGTCGATACTGGTA…5’

Extended strand:   5’ P-ATAACAATTTCACACAGGAAACAGCTATGACCAT

Primer Extension: 26 cycles, 34 Nucleotides

Mean Intensity: 58, 0.5; 40, 6.5; 0.3, 48; 0.4, 43

Channels: FITC (C), CY3 (T)
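The "26 cycles, 34 Nucleotides" figure can be reproduced from the template letters on this slide with a small simulation. The sketch assumes that each cycle adds a single dNTP species and that the primer extends through a whole run of identical bases within one cycle; that model and the function names are assumptions made for illustration, not the published protocol.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

# Template bases as listed on the slide, read 3' to 5' (the direction the
# polymerase traverses the template).
TEMPLATE_3_TO_5 = "TATTGTTAAAGTGTGTCCTTTGTCGATACTGGTA"


def extend(template_3_to_5, dntp_order):
    """Add one dNTP species per cycle; extend while the next template base pairs with it."""
    position = 0
    product = []
    for dntp in dntp_order:
        while (position < len(template_3_to_5)
               and COMPLEMENT[template_3_to_5[position]] == dntp):
            product.append(dntp)
            position += 1
    return "".join(product)


def minimal_cycle_order(template_3_to_5):
    """Shortest dNTP schedule that reads the whole template: one cycle per homopolymer run."""
    order = []
    for base in template_3_to_5:
        dntp = COMPLEMENT[base]
        if not order or order[-1] != dntp:
            order.append(dntp)
    return order


cycles = minimal_cycle_order(TEMPLATE_3_TO_5)
product = extend(TEMPLATE_3_TO_5, cycles)
print(len(cycles), len(product), product)
```

Under this model the 34-base template is read in 26 cycles, because runs such as AAA or TT are filled in a single cycle, and the simulated product matches the extended strand listed on the slide.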



Function Genomics Measures & Models

Environment

Metabolites

RNAi

Insertions

SNPs

RNA

DNA

Protein

Growth rate

microbes

stem cells

cancer cells

multicellular organisms


Competition among multiple mutations & multiple homologous domains

[Figure: selective disadvantage in minimal media, by gene, with numbered probes]

lysC (1, 2): 10.4

thrA (1, 2, 3): 1.1, 6.7

metL (1, 2, 3): 1.8, 1.8


Multiple mutations per gene

Correlation between two selection experiments


Comparison of selection data with FBO predictions (scale up from 79 to 488 genes)

predictions            number of genes   negatively selected   not negatively selected
essential              143               80                    63
reduced growth rate    46                24                    22
non essential          299               119                   180

> Novel duplicates?

< Position effects?

P-value Chi Square = 0.004
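The quoted p-value can be checked against the counts in this table. The slide does not say how the test was set up, so the sketch below assumes a standard chi-square test of independence on the 3 x 2 table (prediction class by selection outcome), which has 2 degrees of freedom; for 2 degrees of freedom the tail probability is exactly exp(-x/2), so no statistics library is needed.

```python
import math

# (negatively selected, not negatively selected) per prediction class, from the slide.
OBSERVED = {
    "essential": (80, 63),
    "reduced growth rate": (24, 22),
    "non essential": (119, 180),
}


def chi_square(table):
    """Pearson chi-square statistic for a contingency table given as {row: (col1, col2, ...)}."""
    rows = list(table.values())
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    total = sum(row_totals)
    stat = 0.0
    for row, row_total in zip(rows, row_totals):
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / total
            stat += (observed - expected) ** 2 / expected
    return stat


def p_value_df2(stat):
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom."""
    return math.exp(-stat / 2)


stat = chi_square(OBSERVED)
p = p_value_df2(stat)
print(round(stat, 2), round(p, 4))
```

Under that assumption the statistic is about 11.0 and the p-value about 0.004, consistent with the figure on the slide.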


Function Genomics Measures & Models

Environment

Metabolites

RNA

Protein

DNA

Expression


RNA quantitation (Frequently Asked Questions)

Is less than a 2-fold RNA-ratio ever important?

Yes; 1.5-fold in trisomies.

Why oligonucleotides rather than cDNAs?

Alternative RNAs, gene families.

Using a subset of the genome or ratios to various control RNAs?

Trouble for later (meta) analyses.


Lpp mRNA start & structure

See: Selinger et al, Nat Biotech


Oligo selection

[Flowchart]

Inputs: gene sequences; background sequences; parameters (Tm, length, ...); experimental results

1. generate candidate oligos

2. predict cross-hybridization

3. filter & select oligos (output: gene-specific oligos)

4. generate control, border oligos (output: controls, text, border oligos)

5. generate chip layout (output: chip layout)

  • PGA/Smith group already designing software for oligo selection

  • Church Lab / Lipper Center has additional tools

    • Unique oligos (cu-15s)

    • RNA string matching program

Figure courtesy of Adnan Derti
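The candidate, cross-hybridization and filter steps in this flowchart can be sketched as follows. This is a toy illustration, not the software the slide refers to: the Wallace rule is a swapped-in rough Tm estimate for short oligos, exact substring matching against the background is a crude stand-in for cross-hybridization prediction, and all function names and parameters are hypothetical.

```python
def wallace_tm(oligo):
    """Rough melting temperature in deg C via the Wallace rule: 2*(A+T) + 4*(G+C)."""
    at = oligo.count("A") + oligo.count("T")
    gc = oligo.count("G") + oligo.count("C")
    return 2 * at + 4 * gc


def candidate_oligos(gene, length):
    """Every window of the given length along the gene sequence."""
    return [gene[i:i + length] for i in range(len(gene) - length + 1)]


def select_oligos(gene, background, length, tm_min, tm_max):
    """Keep candidates inside the Tm window that do not occur in any background sequence."""
    selected = []
    for oligo in candidate_oligos(gene, length):
        if not tm_min <= wallace_tm(oligo) <= tm_max:
            continue  # fails the Tm parameter filter
        if any(oligo in other for other in background):
            continue  # would also match a background sequence
        selected.append(oligo)
    return selected


# Hypothetical toy sequences for illustration only.
picked = select_oligos("ACGTACGTGG", ["TTACGTAA"], length=5, tm_min=16, tm_max=16)
print(picked)
```

A production tool would also score near matches, reverse complements and secondary structure; the exact-match filter here only shows where such a step sits in the pipeline.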


Combinatorial arrays for binding constants

(EGR1)

HMS: Martha Bulyk, Xiaohua Wang, Martin Steffen

MRC: Yen Choo

[Figure labels: ds-DNA array; combinatorial DNA-binding protein domains; phage (pVIII, pIII); antibodies; Phycoerythrin - 2º IgG]

Martha Bulyk et al


Interactions of Adjacent Basepairs in EGR1

Zinc Finger DNA Recognition

Isalan et al., Biochemistry (‘98) 37:12026-12033


Wildtype EGR1 Microarray

high [DNA]

(+) ctrl sequence

for wt binding

etc.

alignment oligos


Motifs weight all 64 Kaapp

Wildtype

RSDHLTT

TGG 2.8 nM

GCG 16 nM

2.5 nM

TAT 5.7 nM

AAA,AAT,ACT,AGA,

AGC,AGT,CAT,CCT,

CGA,CTT,TTC,TTT

AAT 240 nM

RGPDLAR

REDVLIR

LRHNLET

KASNLVS
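To show what binding constants of this size imply, the sketch below converts a few of the triplet and nM pairs that appear together on this slide into fractional site occupancy. It assumes the nM values are apparent dissociation constants and uses the simple single-site binding isotherm; both assumptions and the chosen protein concentration are illustrative and not taken from the talk.

```python
# Triplet and value pairs as they appear together on the slide, treated here
# as apparent dissociation constants in nM (an assumption).
KD_NM = {"TGG": 2.8, "GCG": 16.0, "TAT": 5.7, "AAT": 240.0}


def fraction_bound(protein_nm, kd_nm):
    """Single-site isotherm: occupancy = [P] / ([P] + Kd)."""
    return protein_nm / (protein_nm + kd_nm)


protein_nm = 16.0  # hypothetical free protein concentration
for triplet, kd in KD_NM.items():
    print(triplet, round(fraction_bound(protein_nm, kd), 3))
```

When the protein concentration equals the dissociation constant a site is half occupied, so sites with low-nM constants are mostly bound at this concentration while the 240 nM site is mostly free.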


For more information: arep.med.harvard.edu