BK21 BT·IT Integrationist Program. Omics data integration & mining. The Sixth Sino-Japan-Korea Bioinformatics Training Course, Shanghai, March 27-30, 2007 (2007. 3. 29). Sangsoo Kim & KOBIC Omics Team.


BK21 BT·IT Integrationist Program

Omics data integration & mining

The Sixth Sino-Japan-Korea Bioinformatics Training Course

Shanghai, March 27-30, 2007

2007. 3. 29

Sangsoo Kim & KOBIC Omics Team

What is the goal of Biosciences?
  • Ultimately, the complete understanding of life phenomena
    • Complex organization
    • Regulatory mechanism (homeostasis)
    • Growth & development
    • Energy utilization
    • Response to the environmental stimuli
    • Reproduction (DNA guarantees exact replication)
    • Evolution (capacity of species to change over time)
Spider Silk: Stronger than Steel
  • Life’s diversity results from the variety of molecules in cells
  • A spider’s web-building skill depends on its DNA molecules
  • DNA also determines the structure of silk proteins
    • These make a spiderweb strong and resilient
The capture strand contains a single coiled silk fiber coated with a sticky fluid
  • The coiled fiber unwinds to capture prey and then recoils rapidly

Coiled fiber of silk protein

Coating of capture strand


Evidence from flagelliform silk cDNA for the structural basis of elasticity and modular nature of spider silks. J Mol Biol. 1998 Feb 6;275(5):773-84.

  • They report the cloning of substantial cDNA for flagelliform gland silk protein, which forms the core fiber of the catching spiral
  • The dominant repeat of this protein is Gly-Pro-Gly-Gly-X, which can appear up to 63 times in tandem arrays
  • They propose that the spring-like helix is the basis for the elasticity of silk
Paradigm Shift in Biosciences
  • So far, biologists have focused on certain phenotypes and hunted the responsible genes one at a time
  • The new trend is to:
    • Catalog all the parts: genes and proteins
    • Understand how each part works
    • Model & simulate the collective behavior of the parts

Genomics & Proteomics

Functional Genomics

Systems Biology

Central dogma of molecular biology: DNA → RNA → protein

Central dogma of bioinformatics and genomics: genome → transcriptome → proteome

[Figure: base pairs of DNA (billions) and sequences (millions) by year, 1982-2002]

With $1,000 genome sequencing technologies expected within 10 years, coupled with functional data, we need better IT solutions!
Proliferation of Genomics
  • Explosion of data
    • Human genes: 25,000
    • Human genome: 3 x 10^9 bp
    • DNA-protein or protein-protein interactions could increase data dramatically
  • Chimpanzee, mouse, rat, dog, cow, chicken, insects, worms, plants, fungi, algae, bacteria, archaea, viruses …
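
To give a rough sense of scale for the numbers above, here is a back-of-the-envelope sketch in Python. It rests on two simplifying assumptions that are not from the slides: one protein per gene, with every unordered pair counted as a candidate interaction, and 2 bits per base with no compression or annotation.

```python
from math import comb

HUMAN_GENES = 25_000      # gene count quoted on the slide
GENOME_BP = 3 * 10**9     # genome size quoted on the slide

# Assumption: one protein per gene; every unordered pair is a candidate interaction.
candidate_pairs = comb(HUMAN_GENES, 2)

# Assumption: 2 bits per base (A, C, G, T), no compression or metadata.
genome_bytes = GENOME_BP * 2 // 8

print(f"candidate protein pairs: {candidate_pairs:,}")
print(f"raw genome size: {genome_bytes / 10**6:.0f} MB")
```

Under these assumptions the sequence itself fits in about 750 MB, while the space of candidate pairwise interactions already exceeds 3 x 10^8 entries for a single organism.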
Genome Projects (385 finished) as of June 4, 2006
  • Ongoing projects
    • 608 eukaryotes
    • 989 prokaryotes


Top ten challenges for bioinformatics

[1] Precise models of where and when transcription will occur in a genome (initiation and termination)
[2] Precise, predictive models of alternative RNA splicing
[3] Precise models of signal transduction pathways; ability to predict cellular responses to external stimuli
[4] Determining protein:DNA, protein:RNA, protein:protein recognition codes
[5] Accurate ab initio protein structure prediction
[6] Rational design of small molecule inhibitors of proteins
[7] Mechanistic understanding of protein evolution
[8] Mechanistic understanding of speciation
[9] Development of effective gene ontologies: systematic ways to describe gene and protein function
[10] Education: development of bioinformatics curricula

Source: Ewan Birney, Chris Burge, Jim Fickett

Functional Genomics & Systems Biology
  • New data types:
    • Sequences
    • Structures
    • High-throughput expression profiles in matrix form (10,000 x 100)
    • Interactions, Pathways, Networks
  • Mathematical modeling & simulation of biological processes
    • Algorithms
    • Graphical visualization

Terminology
  • DNA: genome (genomics)
  • RNA: transcriptome (transcriptomics)
  • Protein: proteome (proteomics)
  • Metabolite: metabolome (metabolomics)
  • More than 50 "-omes", including the “Unknownome”

Omics data
  • In the omics era, we see a proliferation of genome- and proteome-wide high-throughput data available in public archives
    • Comparative genome sequences
    • Sequence variation & phenotypes
    • Epigenetics & chromatin structure
    • Regulatory elements & gene expression
    • Protein expression, modification & localization
    • Protein domain, structure, interaction
    • Metabolic, signal, regulatory pathways
    • Drug, toxicogenomics, toxicoproteomics
Joyce et al. Nature Reviews Molecular Cell Biology 7, 198–210 (March 2006) | doi:10.1038/nrm1857

As an example,
  • Suppose you are interested in how well the transcriptional control of CDK2 is conserved. You may need:
    • Orthologs in various model organisms
    • Genome alignments of promoter regions among phylogenetic cousins
      • Among mammals or vertebrates
      • Among yeast subspecies
    • A TRANSFAC-type TF binding database
    • ChIP-chip data for each organism
    • An orthology map of the TFs, and so on
    • You may add proteome and interactome data
  • Only some of these are available at NCBI
  • The rest are available in the public domain as supplementary materials or at the authors’ web sites
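
The kind of cross-resource question described in this example can be sketched as a simple join over toy records. Everything below is hypothetical: the TF names (`TF1`, `TF2`), the ChIP-chip hits, and the helper `conserved_in` are made up for illustration and do not come from any real database.

```python
# Toy, hypothetical records standing in for the separate resources listed above.
orthologs = {"human": "CDK2", "mouse": "Cdk2", "rat": "Cdk2"}  # target-gene ortholog map

# TF orthology map: reference TF name -> its name in each organism (made up).
tf_orthologs = {"TF1": {"human": "TF1", "mouse": "Tf1", "rat": "Tf1"}}

# ChIP-chip style evidence: organism -> set of (TF, bound target gene) pairs (made up).
chip_hits = {
    "human": {("TF1", "CDK2"), ("TF2", "CDK2")},
    "mouse": {("Tf1", "Cdk2")},
    "rat": set(),
}

def conserved_in(tf):
    """Return the organisms in which the TF's ortholog binds the CDK2 ortholog."""
    return sorted(
        org
        for org, gene in orthologs.items()
        if (tf_orthologs[tf].get(org), gene) in chip_hits.get(org, set())
    )

print(conserved_in("TF1"))
```

Each dictionary stands for a resource that in practice lives in a different archive or supplementary file, which is why a common identifier scheme (here, organism plus gene symbol) has to be agreed on before any such query can run.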
Integration of Omics data
  • Systematic mining
  • Cross-knowledge domain validation
  • Cross-species interpolation
  • Generation of hypotheses that can be tested
  • Biologically very interesting queries
  • Requires cross-functional knowledge
  • The way to go
Where to look
  • Nature provides an omics section
    • www.nature.com/omics
  • Science
  • Cell
  • PLoS Biology
  • Genes & Development
  • Stem Cell
  • Relevant articles (PubMed, Google Scholar)
Phase 1 of ENCODE
  • NHGRI’s ENCODE project generates such data at a pilot scale
  • The data are deposited and integrated into the UCSC Genome Browser
    • It offers data mining capability via Table Browser
    • There are no ‘biological links’ among the 3,000+ tables (Ensembl’s BioMart is more ‘biological’)
    • It is up to the users how to combine the tables
    • It is limited to genomic coordinates, not intended for proteome work
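
Because the tables share only genomic coordinates, combining two of them usually comes down to an interval-overlap join. The following is a minimal sketch, assuming BED-style 0-based, half-open coordinates. The `Interval` type, the function names, and the toy rows are hypothetical and are not part of the Table Browser.

```python
from typing import NamedTuple

class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive (half-open)
    name: str

def overlaps(a, b):
    """True if two intervals on the same chromosome share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end

def intersect(table_a, table_b):
    """Naive O(n*m) join: every (a, b) name pair whose intervals overlap."""
    return [(a.name, b.name) for a in table_a for b in table_b if overlaps(a, b)]

# Hypothetical rows from two coordinate-based tables.
genes = [Interval("chr1", 100, 500, "geneA"), Interval("chr2", 100, 500, "geneB")]
peaks = [
    Interval("chr1", 450, 600, "peak1"),  # overlaps geneA
    Interval("chr1", 500, 700, "peak2"),  # starts exactly where geneA ends: no overlap
    Interval("chr2", 0, 50, "peak3"),     # same chromosome as geneB, but disjoint
]

pairs = intersect(genes, peaks)
print(pairs)
```

The nested loop is fine for a toy example; for genome-scale tables one would typically sort by chromosome and start position or use an interval index instead.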
Application Examples

Joyce et al. Nature Reviews Molecular Cell Biology 7, 198–210 (March 2006) | doi:10.1038/nrm1857

Protein-DNA Interaction & Transcriptomics
  • Yeast rich medium gene modules network
  • ChIP-chip location and expression data
  • 106 modules containing 655 genes regulated by 68 TFs
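
To illustrate the general idea of combining location (binding) data with expression data, here is a deliberately simplified sketch. It is not the algorithm that produced the 106 modules quoted above. Genes are grouped by the exact set of TFs bound to them, and a group is kept as a module only if at least two genes are co-expressed with an arbitrarily chosen seed gene. The gene and TF names, the expression values, the seed choice, and the 0.8 threshold are all assumptions.

```python
from collections import defaultdict
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def find_modules(binding, expression, min_r=0.8):
    """Group genes by bound-TF set, then keep genes co-expressed with a seed gene."""
    groups = defaultdict(list)
    for gene, tfs in binding.items():
        if tfs:  # ignore genes with no binding evidence
            groups[frozenset(tfs)].append(gene)
    modules = {}
    for tfs, genes in groups.items():
        seed = genes[0]  # assumption: first gene seen acts as the seed
        members = [g for g in genes
                   if pearson(expression[seed], expression[g]) >= min_r]
        if len(members) >= 2:
            modules[tfs] = sorted(members)
    return modules

# Hypothetical toy data.
binding = {"g1": {"TF_A"}, "g2": {"TF_A"}, "g3": {"TF_A"}, "g4": {"TF_B"}, "g5": set()}
expression = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.1, 3.9, 6.2, 8.0],  # rises with g1
    "g3": [4.0, 3.0, 2.0, 1.0],  # bound by TF_A but anti-correlated with g1
    "g4": [1.0, 1.5, 1.2, 1.1],
    "g5": [0.5, 0.7, 0.6, 0.9],
}

modules = find_modules(binding, expression)
print(modules)
```

In this toy data, `g3` shares the binding evidence but not the expression pattern, so it is excluded; that is the kind of cross-validation between data types that motivates integrating them.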
How to participate
  • Domain knowledge group
    • Monitor papers and websites for relevant data
    • Collect the omics data and transform them into common formats
    • Develop hypotheses & mining strategies
  • Data integration group
    • Develop DB schema
    • Integration with bio-matrix & bio-engine
    • Querying biological concepts
    • Graphic visualization
Practice Session - Cytoscape
  • Installation
    • One of the most widely used and broadly accessible software packages designed to facilitate omics data integration and analysis
  • Tutorials
    • Interaction network display
    • Expression analysis
    • Literature searching
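
For the interaction network display, Cytoscape can import networks written in its Simple Interaction Format (SIF), where each line reads "source, interaction type, target". The sketch below builds such text in Python. The node names and edges are hypothetical, and the helper `to_sif` is only an illustration, not part of Cytoscape.

```python
def to_sif(edges):
    """Serialize (source, interaction_type, target) triples as tab-delimited SIF text."""
    return "\n".join(f"{src}\t{kind}\t{dst}" for src, kind, dst in edges) + "\n"

# Hypothetical toy network; "pd" (protein-DNA) and "pp" (protein-protein)
# are interaction-type labels commonly used in SIF files.
edges = [
    ("TF_A", "pd", "gene1"),
    ("TF_A", "pd", "gene2"),
    ("gene1", "pp", "gene2"),
]

sif_text = to_sif(edges)
print(sif_text, end="")
```

Saving `sif_text` to a file with a `.sif` extension should give something that can be loaded through Cytoscape's network import.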