Data Provenance Workshop

Natalia Maltsev

MCS

Argonne National Laboratory

Why Biotechnological Revolution?

98 published genomes, 652 ongoing genomes. So much data! Hmm…
  • High-throughput technologies provide huge amounts of biological data:
    • Sequence data
    • Data describing functional Networks (Metabolism, Regulation, Gene Expression)
    • Dynamic data
  • Progress in computer science, computer technology, and bioinformatics makes it possible to analyze these data
Biology in a Nutshell (for people with little knowledge but infinite intelligence)

Figure: Genomes; Gene Products; Structure & Function; Pathways & Physiology

  • Genome (ROM): assembly code on how to build proteins
  • Instructions: A, C, T, G
  • 3 variables (one codon) → amino acid
  • Genome consists of genes
  • Gene → Protein: object description → object instantiation
  • Protein functions
  • Enzymes: proteins that catalyze biochemical reactions
  • Pathway: sequence of reactions
  • Network (directed graph): set of pathways with metabolites as vertices and enzymes as edges
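Two of the analogies in this list can be sketched in a few lines of Python. The first is codons read three "instructions" at a time. The second is a network stored as a directed graph. This is an illustrative sketch, not the author's software. `CODON_TABLE` holds only a hypothetical handful of the 64 codons of the standard genetic code. `translate`, `find_pathway`, and `NETWORK` are made-up names. The three edges are the first steps of glycolysis and are used only as sample data.

```python
from __future__ import annotations

from collections import deque

# Hypothetical SUBSET of the standard genetic code (DNA codon -> one-letter
# amino acid); a real translator needs all 64 codons.
CODON_TABLE = {
    "ATG": "M",  # methionine (start)
    "TTT": "F",  # phenylalanine
    "GGC": "G",  # glycine
    "AAA": "K",  # lysine
    "GCT": "A",  # alanine
    "TAA": "*", "TAG": "*", "TGA": "*",  # stop codons
}


def translate(gene: str) -> str:
    """Read the gene three letters at a time until a stop codon."""
    protein = []
    for i in range(0, len(gene) - len(gene) % 3, 3):
        amino_acid = CODON_TABLE[gene[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Metabolic network as a directed graph: metabolites are vertices, and each
# edge is labelled with the enzyme that catalyzes the conversion.
NETWORK = {
    "glucose": [("glucose-6-phosphate", "hexokinase")],
    "glucose-6-phosphate": [("fructose-6-phosphate", "phosphoglucose isomerase")],
    "fructose-6-phosphate": [("fructose-1,6-bisphosphate", "phosphofructokinase")],
}


def find_pathway(start: str, goal: str) -> list[str] | None:
    """Breadth-first search; returns the enzymes on a path with the fewest reactions."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, enzymes = queue.popleft()
        if node == goal:
            return enzymes
        for nxt, enzyme in NETWORK.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, enzymes + [enzyme]))
    return None


print(translate("ATGTTTGGCAAATAA"))  # MFGK
print(find_pathway("glucose", "fructose-1,6-bisphosphate"))
```

Storing enzymes on the edges means a reconstructed pathway is simply the list of edge labels along a path.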
Data Classes
  • Sequence data
    • DNA and protein sequences: NCBI, GenBank, SwissProt, TIGR, sequencing projects
  • Data describing Networks
    • Metabolic Networks (EMP database, KEGG, etc.)
    • Regulatory Networks (Sentra, TransFac, etc.)
    • Gene Expression data (Experimental)
    • Other experimental data
  • Dynamic Data (experimental and literature)
  • Organisms data
    • Phenotypic data
    • Physiological data
General Systems Biology Project Architecture
  • Stages of analysis:
    • Determine the components of the system (assign functions to the genes)
    • Establish relationships between components: reconstruct biological networks (develop a static model)
    • Develop a dynamic model of the system
Data Sources
  • Public and private databases (GenBank, SwissProt, EMP, KEGG, etc.)
  • Results of data analysis
  • Updates and versioning? (updates to data and annotations; developed models)
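One way to handle the versioning question is an append-only record. Each annotation update keeps its source and a version number, so earlier results stay traceable. The sketch below is a hypothetical design: the `Annotation` and `VersionedRecord` names and the source labels are illustrative. It does not describe how any of the databases listed above store their data.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Annotation:
    """One version of an annotation and the source that produced it."""
    value: str    # e.g. a predicted gene function
    source: str   # database or analysis step (hypothetical labels below)
    version: int


@dataclass
class VersionedRecord:
    """Append-only history: an update never overwrites earlier versions."""
    gene_id: str
    history: list[Annotation] = field(default_factory=list)

    def update(self, value: str, source: str) -> Annotation:
        ann = Annotation(value, source, version=len(self.history) + 1)
        self.history.append(ann)
        return ann

    def current(self) -> Annotation:
        return self.history[-1]

    def as_of(self, version: int) -> Annotation:
        if not 1 <= version <= len(self.history):
            raise ValueError(f"no version {version} for {self.gene_id}")
        return self.history[version - 1]


rec = VersionedRecord("Seq2")
rec.update("function unknown", source="sequencing project")
rec.update("alcohol dehydrogenase", source="similarity to Seq1")
print(rec.current().value)  # alcohol dehydrogenase
print(rec.as_of(1).source)  # sequencing project
```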
Prediction of Gene Functions
  • Gene functions are predicted by comparing an unknown sequence with sequences of genes whose functions are already established

Seq1: function alcohol dehydrogenase

Seq2: function? Alcohol dehydrogenase?

Seq1_Mus.musculus GSGITKGLGAGANPEVGRNAADEDRDALRAALEGSDMVFIAAGMGGGTGTGAAPVVAE

Seq2_Homo_sapiens GSGITKGLGAGANPEVGRNSAEEDRDALRAALDGSDMVFIAAGMGGGTGTGAAPVVAE
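The two sequences on this slide are already aligned with no gaps, so their similarity can be measured by counting identical positions. This sketch is deliberately simplified. Tools such as BLAST compute local alignments with substitution matrices and statistical scores. The 40% cut-off in the hypothetical `transfer_function` helper was chosen only for illustration.

```python
from __future__ import annotations

SEQ1 = "GSGITKGLGAGANPEVGRNAADEDRDALRAALEGSDMVFIAAGMGGGTGTGAAPVVAE"  # Mus musculus
SEQ2 = "GSGITKGLGAGANPEVGRNSAEEDRDALRAALDGSDMVFIAAGMGGGTGTGAAPVVAE"  # Homo sapiens


def mismatches(a: str, b: str) -> list[tuple[int, str, str]]:
    """1-based positions where two pre-aligned sequences differ."""
    return [(i + 1, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


def percent_identity(a: str, b: str) -> float:
    """Share of identical positions in two gap-free, equal-length alignments."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def transfer_function(identity: float, known_function: str, threshold: float = 40.0) -> str:
    # Hypothetical rule: copy the known annotation above an arbitrary threshold.
    return known_function if identity >= threshold else "function unknown"


identity = percent_identity(SEQ1, SEQ2)
print(mismatches(SEQ1, SEQ2))  # [(20, 'A', 'S'), (22, 'D', 'E'), (33, 'E', 'D')]
print(f"{identity:.1f}%")      # 94.8%
print(transfer_function(identity, "alcohol dehydrogenase"))
```

For these two sequences, 55 of 58 positions are identical (about 94.8% identity). That high identity is why transferring the alcohol dehydrogenase annotation to Seq2 looks plausible.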

Example 1: Gene Function Assignments

Figure: a query sequence (function unknown!) is analyzed by bioinformatics tools (BLAST, InterPro, Blocks) against a knowledge base; their results (F1, F2, F3) feed a voting algorithm, which outputs F1 with probability P1 and F2 with probability P2.
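The slide does not say how the voting algorithm combines the tool results. One simple possibility is a weighted vote, sketched here as an assumption. Each tool adds a hypothetical reliability weight to the function it predicts. The normalized totals then play the role of the probabilities P1, P2, and so on. The weights and the `vote` function are illustrative only.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical reliability weight per tool; the slide gives no values.
TOOL_WEIGHTS = {"BLAST": 0.5, "InterPro": 0.25, "Blocks": 0.25}


def vote(predictions: dict[str, str]) -> list[tuple[str, float]]:
    """Weighted vote over one predicted function per tool.

    Returns (function, probability) pairs, most probable first.
    """
    totals: dict[str, float] = defaultdict(float)
    for tool, function in predictions.items():
        totals[function] += TOOL_WEIGHTS[tool]
    norm = sum(totals.values())
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return [(function, weight / norm) for function, weight in ranked]


print(vote({"BLAST": "F1", "InterPro": "F2", "Blocks": "F1"}))
# [('F1', 0.75), ('F2', 0.25)]
```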

An Example of Pathway Reconstruction

How reliably can we predict this pathway? What approach will increase our confidence the most?
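Both questions can be made concrete under an assumption: each predicted step is independent and has a known confidence (the numbers below are hypothetical). Multiplying the step confidences gives the pathway's confidence. We can then ask which single experimental confirmation would raise that product the most. The step names and the functions are illustrative.

```python
from __future__ import annotations

from math import prod

# Hypothetical confidence that each predicted enzymatic step is correct.
STEPS = {"step 1": 0.95, "step 2": 0.60, "step 3": 0.85}


def pathway_confidence(steps: dict[str, float]) -> float:
    """Assuming independent steps, the pathway holds only if every step does."""
    return prod(steps.values())


def best_step_to_verify(steps: dict[str, float]) -> str:
    """Step whose wet-lab confirmation (confidence set to 1.0) gains the most."""
    baseline = pathway_confidence(steps)

    def gain(name: str) -> float:
        return pathway_confidence({**steps, name: 1.0}) - baseline

    return max(steps, key=gain)


print(round(pathway_confidence(STEPS), 4))  # 0.4845
print(best_step_to_verify(STEPS))           # step 2
```

Under the independence assumption, the answer is always the least certain step. This matches the idea of testing "weak" facts first.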

Another Problem: Control of Data Flow
  • Data acquisition: how reliable?
  • Data analysis: how reliable?
  • Data storage: how reliable?
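Reliability can be tracked across acquisition, analysis, and storage if each data item carries a provenance record. The record holds the stage that produced the item, the items it was derived from, and a reliability score. The sketch below assumes a weakest-link rule: an item is no more reliable than its least reliable ancestor. The `Item` class, the scores, and that rule are all illustrative assumptions rather than the workshop's method.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Item:
    name: str
    stage: str           # "acquisition", "analysis" or "storage"
    reliability: float   # hypothetical score in [0, 1] for this step alone
    inputs: list[Item] = field(default_factory=list)


def lineage(item: Item) -> list[str]:
    """Names of all upstream items this one was derived from (depth-first)."""
    seen: list[str] = []
    stack = list(item.inputs)
    while stack:
        current = stack.pop()
        if current.name not in seen:
            seen.append(current.name)
            stack.extend(current.inputs)
    return seen


def effective_reliability(item: Item) -> float:
    """Weakest-link rule: never more reliable than anything upstream."""
    return min([item.reliability] + [effective_reliability(i) for i in item.inputs])


query = Item("query sequence", "acquisition", 0.9)
ref = Item("SwissProt entry", "acquisition", 0.95)
hit = Item("BLAST annotation", "analysis", 0.8, [query, ref])
row = Item("database row", "storage", 0.99, [hit])

print(lineage(row))                # ['BLAST annotation', 'SwissProt entry', 'query sequence']
print(effective_reliability(row))  # 0.8
```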

General Systems Biology Project Architecture
  • What can provenance do?
    • Help plan experiments by suggesting “weak” facts to be tested in a wet lab
    • Find “weak” spots in a model
    • Prioritize certain steps of model building
    • Evaluate data flows