genomics bioinformatics systems biology l.
Download
Skip this Video
Loading SlideShow in 5 Seconds..
Genomics, Bioinformatics & Systems Biology PowerPoint Presentation
Download Presentation
Genomics, Bioinformatics & Systems Biology

Loading in 2 Seconds...

play fullscreen
1 / 67

Genomics, Bioinformatics & Systems Biology - PowerPoint PPT Presentation


  • 175 Views
  • Uploaded on

Genomics, Bioinformatics & Systems Biology. Steve Sontum & Jeremy Ward Middlebury College. What We Hope to Learn. Course structure, projects and policies Overview of Bioinformatics Computational Biology, definition, subdisciplines Genomic and Proteomic data bases

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'Genomics, Bioinformatics & Systems Biology' - pooky


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
genomics bioinformatics systems biology

Genomics, Bioinformatics & Systems Biology

Steve Sontum & Jeremy Ward

Middlebury College

what we hope to learn
What We Hope to Learn
  • Course structure, projects and policies
  • Overview of Bioinformatics
    • Computational Biology, definition, subdisciplines
  • Genomic and Proteomic data bases
    • OMICS and Molecular Biology OMES
    • Bike maintenance metaphor
    • OMIM Data base – Focus of Homework
  • Topics in Bioinformatics (Applications)
  • Read Mark Gerstein's paper

Core

mbbc324 syllabus genomics bioinformatics systems biology
MBBC324 SyllabusGenomics,Bioinformatics, & Systems Biology
  • Instructors
  • Class Time
  • Texts
  • Web Sites
  • Objectives
  • Grading
    • Homework & Notebook
    • Discussion & Preparation
    • Midterm Exam
    • Term Project
  • Etiquette
  • Project Topics
http s11 middlebury edu mbbc0324a
http://s11.middlebury.edu/MBBC0324A/

http://s11.middlebury.edu/MBBC0324A/

what is life
What is Life?

Reproduction

Metabolism

Evolution

?

Life Performs Computation

genome as program
Genome as Program

actcttctggtccccacagactcagagagaacccaccatggtgctgtctcctgccgacaagaccaacgtcaaggccgcctggggtaaggtcggcgcgcacgctggcgagtatggtgcggaggccctggagaggatgttcctgtccttccccaccaccaagacctacttcccgcacttcgacctgagccacggctctgcccaggttaagggccacggcaagaaggtggccgacgcgctgaccaacgccgtggcgcacgtgg...

A computer program for …

genome as program7

1 0 0 0 0 0 0 0 0 0 1 = $ 1025

Genome as Program

A sensitivity to small changes, a single mutation, can result in amplified large changes

Changing a single bit in your bank balance

0 0 0 0 0 0 0 0 0 0 1 = $ 1

are 30 000 human genes too few
Are 30,000 Human Genes Too Few?
  • Revealed: The secret of human behavior:Environment, not genes, key to our acts. London Observer February 11, 2001
  • Craig Venter on breaking the news that the human genome had only 30,000 genes said:“We simply do not have enough genes for the idea of biological determinism to be right. The wonderful diversity of the human species is not hard-wired in our genetic code. Our environments are critical.”
are 30 000 human genes too few9
Are 30,000 Human Genes Too Few
  • Venter’s statement was the making of a new myth, that 30,000 genes were “too few” to explain human nature.
  • How many genes (each coming in only two varieties (on or off) would we need to make each of the 7 billion people in the world unique?
are 30 000 human genes too few10
Are 30,000 Human Genes Too Few
  • Venter’s statement was the making of a new myth, that 30,000 genes were “too few” to explain human nature.
  • How many genes (each coming in only two varieties (on or off) would we need to make each of the 7 billion people in the world unique?
  • 233 = 8.6 billion !
  • Only 33 are needed !
21 000 human genes is not too few
21,000 Human Genes is Not Too Few

Proteins the products of genes are like Legos, they interact and connect in various ways. Bioinformatics deals with the information that organizes these pieces into a Human. Systems Biology deals with the interactions between these pieces. And finally, Genomics is the study of genes.

what does the dark matter do
What Does the Dark Matter Do?

Shining a light on the Genome’s Dark Matter E. Pennisi Science 330, 1614(2010)

Is it “junk” DNA

2% Translated into proteins

5% Evolutionarily conserved between Mouse and Human

80% Transcribed into non-coding RNA

what is bioinformatics

Biological

Data

Computer

Analysis

+

What is Bioinformatics

National

Center for

Biological

Information

recieves

5 terabytes/day

!!!!!!!!!!!!!!!!!!!!!!!!

Mouse Genome: 2.5 billion base pairsHuman Genome: 3 billion base pairs

what is informatics
What is “informatics”
  • Derived from the French word informatique
    • Referring to automated information processing and storage
    • The name is a contraction of “information” and “automatic”
  • Definition:Informatics is the science that deals with information, its structure, its storage, and its use
  • Tends to be associated with specific application areas
    • Medical informatics (applied to clinical medicine)
    • Bioinformatics (applied to biological research)
    • Business informatics (applied to management and information systems)

Musen

where does bioinformatics come from
Where does Bioinformatics come from?

Information Theory

Artificial Intelligence

Systems Analysis

Graph Theory

Robotics

Algorithms

Statistics

Methods

Bioinformatics

Genomics

Systems Biology

Data from the Human Genome Project has fueled the development of new bioinformatics methods

Computational Biology

definition for bioinformatics
Definition for Bioinformatics?

Core

  • (Molecular)Bio - informatics
  • One idea for a definition?Bioinformatics is conceptualizing biology in terms of molecules (in the sense of physical-chemistry) and then applying “informatics” techniques (derived from disciplines such as applied math, CS, and statistics) to understand and organize the information associated with these molecules, on a large-scale.
  • Bioinformatics is a practical discipline with many applications.

Read Mark Gerstein's paper it will help you formulate a project proposal.

what is bioinformatics17
What is Bioinformatics?

(Molecular)Bio - informatics

One idea for a definition?Bioinformatics is conceptualizing biology in terms of molecules (in the sense of physical-chemistry) and then applying “informatics” techniques (derived from disciplines such as applied math, CS, and statistics) to understand and organize the information associated with these molecules, on a large-scale.

Bioinformatics is a practical discipline with many applications.

Gerstein

what is the information molecular biology as an information science
Central Dogmaof Molecular BiologyDNA -> RNA -> Protein -> Phenotype -> DNA

Molecules

Sequence, Structure, Function, Interaction

Processes

Mechanism, Specificity, Regulation

Central Paradigmfor BioinformaticsGenomic Sequence Information -> mRNA (level) -> Protein Sequence -> Protein Structure -> Protein Function -> Protein Interaction -> Phenotype

Large Amounts of Information

Standardized and Statistical

Computer Processing

What is the Information?Molecular Biology as an Information Science

Core

Information

Flow

Data Structures Mimic Biology

(idea from D Brutlag)

molecular biology information dna
Molecular Biology Information - DNA

FASTA format

>16 dna:chromosome:NM_000558:HBA1

actcttctgg tccccacaga ctcagagaga acccaccatg gtgctgtctc ctgccgacaa

gaccaacgtc aaggccgcct ggggtaaggt cggcgcgcac gctggcgagt atggtgcgga

ggccctggag aggatgttcc tgtccttccc caccaccaag acctacttcc cgcacttcga

cctgagccac ggctctgccc aggttaaggg ccacggcaag aaggtggccg acgcgctgac

caacgccgtg gcgcacgtgg acgacatgcc caacgcgctg tccgccctga gcgacctgca

cgcgcacaag cttcgggtgg acccggtcaa cttcaagctc ctaagccact gcctgctggt

gaccctggcc gcccacctcc ccgccgagtt cacccctgcg gtgcacgcct ccctggacaa

gttcctggct tctgtgagca ccgtgctgac ctccaaatac cgttaagctg gagcctcggt

ggccatgctt cttgcccctt gggcctcccc ccagcccctc ctccccttcc tgcacccgta

cccccgtggt ctttgaataa agtctgagtg ggcggc

  • Raw DNA Sequence
    • Coding or Not?
      • Open Reading Frame
      • Promotor sites?
      • Exons
      • Introns
molecular biology information dna21
Molecular Biology Information - DNA

Raw DNA Sequence

Coding or Not? HBA1 gene

Open Reading Frame atg - taa

Promotor sites?

Exons Three

Introns none (mRNA)

>16 dna:chromosome:NM_000558:HBA1

actcttctgg tccccacaga ctcagagaga acccaccatggtgctgtctc ctgccgacaa

gaccaacgtc aaggccgcct ggggtaaggt cggcgcgcac gctggcgagt atggtgcgga

ggccctggagaggatgttcctgtccttccc caccaccaag acctacttcc cgcacttcga

cctgagccac ggctctgccc aggttaaggg ccacggcaag aaggtggccg acgcgctgac

caacgccgtg gcgcacgtgg acgacatgcc caacgcgctg tccgccctga gcgacctgca

cgcgcacaag cttcgggtgg acccggtcaacttcaagctcctaagccact gcctgctggt

gaccctggcc gcccacctcc ccgccgagtt cacccctgcg gtgcacgcct ccctggacaa

gttcctggct tctgtgagca ccgtgctgac ctccaaataccgttaagctggagcctcggt

ggccatgctt cttgcccctt gggcctcccc ccagcccctc ctccccttcc tgcacccgta

cccccgtggt ctttgaataa agtctgagtg ggcggc

  • Raw DNA Sequence
    • Parse into genes?
      • 4 bases: agct
      • ~1 Kb in a gene
      • ~2 Mb in genome
      • ~3 Gb Human
molecular biology information mrna
Molecular Biology Information - mRNA
  • Expressed Sequence Tags
    • Cloned mRNA
      • cDNA library
      • Mapped to genome
      • Expressed genes
      • Tissue specific
    • One-shot sequencing
      • 4 bases: AGCU
      • 300-800 bases
      • ~52 M ESTs

Poly-T primer + reverse transcriptase

3' EST

cDNA

the genetic code
The Genetic Code

ATG start condon ;TAG, TAA, TGA stop codons

molecular biology information protein sequence
Molecular Biology Information: Protein Sequence
  • 20 letter alphabet
    • ACDEFGHIKLMNPQRSTUVWY but not BJOXZ
  • Strings of ~300 aa in an average protein (in bacteria), ~200 aa in a domain
  • ~200 K known protein sequences

Fasta format

Insulin

> Insulin:Homo Sapiens: AAA59172

1 malwmrllpl lallalwgpd paaafvnqhl cgshlvealy lvcgergffy tpktrreaed

61 lqvgqvelgg gpgagslqpl alegslqkrg iveqcctsic slyqlenycn

Dihydrofolate Reductase

d1dhfa_ LNCIVAVSQNMGIGKNGDLPWPPLRNEFRYFQRMTTTSSVEGKQ-NLVIMGKKTWFSI

d8dfr__ LNSIVAVCQNMGIGKDGNLPWPPLRNEYKYFQRMTSTSHVEGKQ-NAVIMGKKTWFSI

d4dfra_ ISLIAALAVDRVIGMENAMPWN-LPADLAWFKRNTL--------NKPVIMGRHTWESI

d3dfr__ TAFLWAQDRDGLIGKDGHLPWH-LPDDLHYFRAQTV--------GKIMVVGRRTYESF

d1dhfa_ VPEKNRPLKGRINLVLSRELKEPPQGAHFLSRSLDDALKLTEQPELANKVDMVWIVGGSSVYKEAMNHP

d8dfr__ VPEKNRPLKDRINIVLSRELKEAPKGAHYLSKSLDDALALLDSPELKSKVDMVWIVGGTAVYKAAMEKP

d4dfra_ ---G-RPLPGRKNIILS-SQPGTDDRV-TWVKSVDEAIAACGDVP------EIMVIGGGRVYEQFLPKA

d3dfr__ ---PKRPLPERTNVVLTHQEDYQAQGA-VVVHDVAAVFAYAKQHLDQ----ELVIAGGAQIFTAFKDDV

molecular biology information macromolecular structure
Molecular Biology Information:Macromolecular Structure
  • DNA/RNA/Protein
    • Almost all protein

(RNA Adapted From D Soll Web Page, Right Hand Top Protein from M Levitt web page)

molecular biology information protein structure details

Atom # Residue Residue #

Molecular Biology Information: Protein Structure Details
  • Statistics on Number of XYZ triplets
    • 200 residues/domain -> 200 CA atoms, separated by 3.8 A
    • Avg. Residue size is Leu: 4 backbone atoms + 4 sidechain atoms, 150 cubic A or a bead of diameter 6.6 A
      • => ~1500 xyz triplets (=8x200) per protein domain

X Y Z

PDB format

ATOM 1 C ACE 0 9.401 30.166 60.595 1.00 49.88 1GKY 67

ATOM 2 O ACE 0 10.432 30.832 60.722 1.00 50.35 1GKY 68

ATOM 3 CH3 ACE 0 8.876 29.767 59.226 1.00 50.04 1GKY 69

ATOM 4 N SER 1 8.753 29.755 61.685 1.00 49.13 1GKY 70

ATOM 5 CA SER 1 9.242 30.200 62.974 1.00 46.62 1GKY 71

ATOM 6 C SER 1 10.453 29.500 63.579 1.00 41.99 1GKY 72

ATOM 7 O SER 1 10.593 29.607 64.814 1.00 43.24 1GKY 73

ATOM 8 CB SER 1 8.052 30.189 63.974 1.00 53.00 1GKY 74

ATOM 9 OG SER 1 7.294 31.409 63.930 1.00 57.79 1GKY 75

ATOM 10 N ARG 2 11.360 28.819 62.827 1.00 36.48 1GKY 76

ATOM 11 CA ARG 2 12.548 28.316 63.532 1.00 30.20 1GKY 77

ATOM 12 C ARG 2 13.502 29.501 63.500 1.00 25.54 1GKY 78

...

ATOM 1444 CB LYS 186 13.836 22.263 57.567 1.00 55.06 1GKY1510

ATOM 1445 CG LYS 186 12.422 22.452 58.180 1.00 53.45 1GKY1511

ATOM 1446 CD LYS 186 11.531 21.198 58.185 1.00 49.88 1GKY1512

ATOM 1447 CE LYS 186 11.452 20.402 56.860 1.00 48.15 1GKY1513

ATOM 1448 NZ LYS 186 10.735 21.104 55.811 1.00 48.41 1GKY1514

ATOM 1449 OXT LYS 186 16.887 23.841 56.647 1.00 62.94 1GKY1515

TER 1450 LYS 186 1GKY1516

molecular biology information whole genomes
Molecular Biology Information:Whole Genomes
  • The Revolution Driving Everything

Fleischmann, R. D., Adams, M. D., White, O., Clayton, R. A., Kirkness, E. F., Kerlavage, A. R., Bult, C. J., Tomb, J. F., Dougherty, B. A., Merrick, J. M., McKenney, K., Sutton, G., Fitzhugh, W., Fields, C., Gocayne, J. D., Scott, J., Shirley, R., Liu, L. I., Glodek, A., Kelley, J. M., Weidman, J. F., Phillips, C. A., Spriggs, T., Hedblom, E., Cotton, M. D., Utterback, T. R., Hanna, M. C., Nguyen, D. T., Saudek, D. M., Brandon, R. C., Fine, L. D., Fritchman, J. L., Fuhrmann, J. L., Geoghagen, N. S. M., Gnehm, C. L., McDonald, L. A., Small, K. V., Fraser, C. M., Smith, H. O. & Venter, J. C. (1995). "Whole-genome random sequencing and assembly of Haemophilus influenzae rd." Science 269: 496-512.

Genome sequence now accumulate so quickly that, in less than a week, a single laboratory can produce more bits of data than Shakespeare managed in a lifetime, although the latter make better reading.

-- G A Pekso, Nature401: 115-116 (1999)

  • Integrative Data

1995, HI (bacteria): 1.6 Mb & 1600 genes done

1997, yeast: 13 Mb & ~6000 genes for yeast

1998, worm: ~100Mb with 19 K genes

1999: >30 completed genomes!

2003, human: 3 Gb & $3 billion 14 years

2007, Watson: 3 Gb & $1 million 2 months

molecular biology information microarray
Molecular Biology Information - Microarray

Scan

Hybridize

cDNAs

Biotin Label

Affymetrix GeneChip®

55,000 transcripts

Gives levels of gene expression but the signal is noisy

molecular biology information epigenetics
Molecular Biology Information - Epigenetics
  • Raw DNA Sequence
    • Coding or Not?
      • Promotor sites?
      • Exons
      • Introns
    • Parse into genes?
      • 4 bases: AGCT
      • ~1 K in a gene
      • ~2 M in genome
      • ~3 Gb Human
molecular biology information other integrative data
Molecular Biology Information:Other Integrative Data
  • Information to understand genomes
    • Metabolic Pathways (glycolysis), traditional biochemistry
    • Regulatory Networks
    • Whole Organisms Phylogeny, traditional zoology
    • Environments, Habitats, ecology
    • The Literature (MEDLINE)
  • The Future....

(Pathway drawing from P Karp’s EcoCyc, Phylogeny from S J Gould, Dinosaur in a Haystack)

what is bioinformatics31
What is Bioinformatics?

(Molecular)Bio - informatics

One idea for a definition?Bioinformatics is conceptualizing biology in terms of molecules (in the sense of physical-chemistry) and then applying “informatics” techniques (derived from disciplines such as applied math, CS, and statistics) to understand and organize the information associated with these molecules, on a large-scale.

Bioinformatics is a practical discipline with many applications.

Gerstein

large scale information genbank growth33
Large-scale Information:GenBank Growth

Core

Bioinformatics

Is Essential Now

large scale information explonential growth of data matched by development of computer technology
Large-scale Information:Explonential Growth of Data Matched by Development of Computer Technology
  • CPU vs Disk & Net
    • As important as the increase in computer speed has been, the ability to store large amounts of information on computers is even more crucial
  • Driving Force in Bioinformatics

(Internet picture adaptedfrom D Brutlag, Stanford)

Internet Hosts

Core

Num.Protein DomainStructures

large scale features per slide

transistors

Features per chip

oligo features

Large-scale: Features per Slide

65,000 genes x 106 probes =65,000,000,000 features

large scale bioinformatics is born
Large-scale:Bioinformatics is born!

(courtesy of Finn Drablos)

what is bioinformatics37
What is Bioinformatics?

(Molecular)Bio - informatics

One idea for a definition?Bioinformatics is conceptualizing biology in terms of molecules (in the sense of physical-chemistry) and then applying “informatics” techniques (derived from disciplines such as applied math, CS, and statistics) to understand and organize the information associated with these molecules, on a large-scale.

Bioinformatics is a practical discipline with many applications.

Gerstein

organizing molecular biology information redundancy and multiplicity
OrganizingMolecular Biology Information:Redundancy and Multiplicity
  • Different Sequences Have the Same Structure
  • Organism has many similar genes
  • Single Gene May Have Multiple Functions
  • Genes are grouped into Pathways
  • Genomic Sequence Redundancy due to the Genetic Code
  • How do we find the similarities?....

Core

Integrative Genomics - genes  structures functionspathways expression levels  regulatory systems ….

slide41

Proteome

PubMed Hits

Core

'Omics: studying populations of molecules in a database framework

how do we organize information in an ome
How Do We Organize Information in an OME?
  • Controlled Vocabulary
  • Ranked Tree Structure
    • Biota
      • Eukarya
        • Animalia
          • Chordata
            • Mammalia
              • Primates
                • Hominidae
                  • Homo @ Homo sapiens
  • Google: Brin and Page’s (PageRank)
a parts list approach to bike maintenance44
A Parts List Approach to Bike Maintenance

How many roles can these play?

How flexible and adaptable are they mechanically?

Core

What are the shared parts (bolt, nut, washer, spring, bearing), unique parts (cogs, levers)? What are the common parts -- types of parts (nuts & washers)?

Extra

Where are the parts located?

organize encode project
Organize: ENCODE Project

The Encyclopedia of DNA Elements (ENCODE) Consortium is an international collaboration of research groups funded by the National Human Genome Research Institute (NHGRI). The goal of the consortium is to build a comprehensive parts list of the functional elements of the human genome, including elements that act at the protein level (coding genes) and RNA level (non-coding genes), and regulatory elements that control the cells and circumstances in which a gene is active. The results of ENCODE experiments, collected in the ENCODE DCC database, are displayed on the UCSC Genome Browser.

organize parts disease genes
Organize: Parts = Disease Genes

http://www.ncbi.nlm.nih.gov/OMIM/

organizing information cross references http www ncbi nlm nih gov database
Organizing Information: Cross-referenceshttp://www.ncbi.nlm.nih.gov/Database/

Core

Homework this week OMIM

slide48

End of First Lecture 2009

Remaining Slides Lecture 2

what is bioinformatics49
What is Bioinformatics?

(Molecular)Bio - informatics

One idea for a definition?Bioinformatics is conceptualizing biology in terms of molecules (in the sense of physical-chemistry) and then applying “informatics” techniques (derived from disciplines such as applied math, CS, and statistics) to understand and organize the information associated with these molecules, on a large-scale.

Bioinformatics is a practical discipline with many applications.

Gerstein

general types of informatics techniques in bioinformatics
Databases

Building, Querying

Object DB

Text String Comparison

Controlled Vocabulary

1D Alignment (Blast)

Significance Statistics

Finding Patterns

AI / Machine Learning

Statistical Models (HMM)

Clustering

Datamining

Geometry

Robotics

Graphics (Surfaces, Volumes)

Comparison and 3D Matching (Vision, recognition)

Physical Simulation

Newtonian Mechanics

Electrostatics

Numerical Algorithms

Simulation

General Types of“Informatics” techniquesin Bioinformatics
bioinformatics as new paradigm for scientific computing
Physics

Prediction based on physical principles

EX: Exact Determination of Rocket Trajectory

Emphasizes: Supercomputer, CPU

Bioinformatics as New Paradigm forScientific Computing

Core

  • Biology
    • Classifying information and discovering unexpected relationships
    • EX: Gene Expression Network
    • Emphasizes: networks, “federated” database
what is bioinformatics52
What is Bioinformatics?

(Molecular)Bio - informatics

One idea for a definition?Bioinformatics is conceptualizing biology in terms of molecules (in the sense of physical-chemistry) and then applying “informatics” techniques (derived from disciplines such as applied math, CS, and statistics) to understand and organize the information associated with these molecules, on a large-scale.

Bioinformatics is a practical discipline with many applications

Gerstein

aplications genomics
Finding Genes in Genomic DNA

introns

exons

Promotors

Characterizing Repeats in Genomic DNA

Statistics

Patterns

Expression Analysis

Time Course Clustering

Identifying regulatory Regions

Measuring Differences

Aplications -- Genomics
  • Genome Comparisons
    • Ortholog Families
    • Genome annotation
    • Evolutionary Phylogenetic trees
  • Characterizing Intergenic Regions
    • Finding Pseudo genes
    • Patterns
  • Duplications in the Genome
    • Large scale genomic alignment
example application i large scale genomic alignment
Example Application I-- Large scale genomic alignment
  • Gene Myers Celera June 23, 2000
  • (Take Home Assignment for the Weekend)
  • We’re pretty sure we’ll be done in a few weeks.
  • We’d have been done a lot sooner if the public data wasn’t so crappy.
  • We think Gene can whip Francis at ping pong.
  • What’s the hurry, we don’t think they’re done either
  • Jim Kent UC Santa Cruz June 22, 2000
  • (Gigassembler)
  • The only way we could have gotten an assembly in time was if some genius came along who could do it all by himself.
  • Jim Kent was a graduate student looking for a project.
  • In four weeks Jim wrote a program to cross-reference sequence and gene annotation data, with the first run he got 80% assembly.
application protein sequence
Sequence Alignment

non-exact string matching, gaps

How to align two strings optimally via Dynamic Programming

Local vs Global Alignment

Suboptimal Alignment

Hashing to increase speed (BLAST, FASTA)

Amino acid substitution scoring matrices

Multiple Alignment and Consensus Patterns

How to align more than one sequence and then fuse the result in a consensus representation

Transitive Comparisons

HMMs, Profiles

Motifs

Scoring schemes and Matching statistics

How to tell if a given alignment or match is statistically significant

A P-value (or an e-value)?

Score Distributions(extreme val. dist.)

Low Complexity Sequences

Evolutionary Issues

Rates of mutation and change

Application -- Protein Sequence
application protein structure
Secondary Structure “Prediction”

via Propensities

Neural Networks, Genetic Algorithm.

Simple Statistics

Trans Membrane Regions

Assessing Secondary Structure Prediction

Tertiary Structure Prediction

Fold Recognition

Threading

Ab initio

Function Prediction

Active site identification

Relation of Sequence Similarity to Structural Similarity

Application-- Protein Structure
example application iii overall genome characterization

Core

Example Application III:Overall Genome Characterization
  • Overall Occurrence of a Certain Feature in the Genome
    • e.g. how many kinases in Yeast
  • Compare Organisms and Tissues
    • Expression levels in Cancerous vs Normal Tissues
  • Databases, Statistics

(Clock figures, yeast v. Synechocystis, adapted from GeneQuiz Web Page, Sander Group, EBI)

what is bioinformatics60
What is Bioinformatics?
  • (Molecular)Bio - informatics
  • One idea for a definition?Bioinformatics is conceptualizing biology in terms of molecules (in the sense of physical-chemistry) and then applying “informatics” techniques (derived from disciplines such as applied math, CS, and statistics) to understand and organize the information associated with these molecules, on a large-scale.
  • Bioinformatics is a practical discipline with many applications.
quiz are they or aren t they bioinformatics 1
QuizAre They or Aren’t They Bioinformatics? (#1)
  • Digital Libraries
    • Automated Bibliographic Search of the biological literature and Textual Comparison
    • Knowledge bases for biological literature
  • Motif Discovery Using Gibb's Sampling
  • Methods for Structure Determination
    • Computational Crystallography
      • Refinement
    • NMR Structure Determination
      • Distance Geometry
  • Systems Biology: Metabolic Pathway networks
  • The DNA Computer
answers are they or aren t they bioinformatics 1
AnswersAre They or Aren’t They Bioinformatics? (#1)
  • (YES?) Digital Libraries
    • Automated Bibliographic Search and Textual Comparison
    • Knowledge bases for biological literature
  • (YES) Motif Discovery Using Gibb's Sampling
  • (NO?)Methods for Structure Determination
    • Computational Crystallography
      • Refinement
    • NMR Structure Determination
      • (YES) Distance Geometry
  • (YES) Metabolic Pathway networks
  • (NO)The DNA Computer
quiz are they or aren t they bioinformatics 2
QuizAre They or Aren’t They Bioinformatics? (#2)
  • Gene identification by sequence inspection
    • Prediction of splice sites
  • DNA methods in forensics
  • Modeling of Populations of Organisms
    • Ecological Modeling
  • Genomic Sequencing Methods
    • Assembling Contigs
    • Physical and genetic mapping
  • Linkage Analysis
    • Linking specific genes to various traits
answers are they or aren t they bioinformatics 2
AnswersAre They or Aren’t They Bioinformatics? (#2)
  • (YES) Gene identification by sequence inspection
    • Prediction of splice sites
  • (YES) DNA methods in forensics
  • (NO)Modeling of Populations of Organisms
    • Ecological Modeling
  • (NO?)Genomic Sequencing Methods
    • Assembling Contigs
    • Physical and genetic mapping
  • (YES) Linkage Analysis
    • Linking specific genes to various traits
quiz are they or aren t they bioinformatics 3
QuizAre They or Aren’t They Bioinformatics? (#3)
  • RNA structure predictionIdentification in sequences
  • Radiological Image Processing
    • Computational Representations for Human Anatomy (visible human)
  • Artificial Life Simulations
    • Artificial Immunology / Computer Security
    • Genetic Algorithms in molecular biology
  • Homology modeling
  • Determination of Phylogenies Based on Non-molecular Organism Characteristics
  • Computerized Diagnosis based on Genetic Analysis (Pedigrees)
answers are they or aren t they bioinformatics 3
AnswersAre They or Aren’t They Bioinformatics? (#3)
  • (YES) RNA structure predictionIdentification in sequences
  • (NO)Radiological Image Processing
    • Computational Representations for Human Anatomy (visible human)
  • (NO)Artificial Life Simulations
    • Artificial Immunology / Computer Security
    • (NO?) Genetic Algorithms in molecular biology
  • (YES) Homology modeling
  • (NO) Determination of Phylogenies Based on Non-molecular Organism Characteristics
  • (NO)Computerized Diagnosis based on Genetic Analysis (Pedigrees)
references
References
  • Dr. Mark Musen, StanfordBiomedicine 210: Introduction to Bioinformatics: Fundamental Methodshttp://scpd.stanford.edu/scpd/courses/academic/crseDesc.asp?crseID=142&sdID=11
  • Dr. Doug Brutlag, StanfordBiochemistry 218: Computational Molecular Biologyhttp://cmgm.stanford.edu/biochem218/
  • Dr. Mark Gerstein, YaleMB&B 452: Bioinformaticshttp://www.gersteinlab.org/courses/452/05-spr/bioinfo.html
  • Luscombe, Greenbaum, and GersteinWhat is bioinformatics? Proposed Definition and Overview of the Field; Methods InfMed. 2001,(40)346-358
  • DNA origami Paul Rothemundhttp://www.ted.com/talks/lang/eng/paul_rothemund_details_dna_folding.html