

Bioinformatics in Cancer Biotechnology

Bob Stephens
Advanced Biomedical Computing Center
Advanced Technology Program
SAIC-Frederick, Inc.
National Cancer Institute at Frederick

April 19, 2007



Objectives

  • Overview/introduce bioinformatics concepts, applications and databases.

  • Describe interplay between bioinformatics, technologies and the web.

  • Profile importance of bioinformatics in cancer research.

Cancer Biotechnology Series



What is bioinformatics?

  • Bioinformatics is the application of computational methods to the analysis of any type of biological data.

  • Bioinformatics has become a diverse and multi-disciplined field that originally derived from computer science and biological science.


Evolution of bioinformatics

  • Rapid technological advances in sequence determination set the pace for data acquisition.

  • Similar advances in computing power, algorithmic approaches to sequence analysis, and robotics-enabled instruments.

  • Co-evolution with web browser and programming language technologies.


Bioinformatics evolution (contd.)

  • Additional high-throughput technologies are becoming available almost daily: microarrays, proteomics, population and genetic data, medical literature, etc.

  • Data volume is increasing at the same time as data complexity.

  • Data distribution/synchronization becoming an increasingly difficult task.


Interplay between technology and bioinformatics

  • New high-throughput (HT) technologies, e.g. mRNA microarrays

  • Analysis and storage software

  • Computational infrastructure

  • Data integration


Example

  • mRNA expression chip (20,000 genes x 16 probes per gene), a few MB per sample.

  • Data normalization software.

  • Exon array - multiple probes for each exon for each of the 20,000 genes - one file is about 1 GB.

  • New normalization method requires all samples to be loaded simultaneously.

  • More complex analysis reveals alternative splicing etc.
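
The storage figures above can be checked with a quick estimate. This is only a rough sketch: the 8 bytes per probe value is an assumption chosen for illustration, not a vendor specification, and the function names are hypothetical.

```python
# Back-of-the-envelope sizes for the two array types described above.
# BYTES_PER_VALUE is an assumption (one 8-byte float per probe), not a vendor figure.
BYTES_PER_VALUE = 8


def expression_chip_bytes(genes: int = 20_000, probes_per_gene: int = 16) -> int:
    """Approximate size of one mRNA expression sample, in bytes."""
    return genes * probes_per_gene * BYTES_PER_VALUE


def simultaneous_load_gb(n_samples: int, gb_per_file: float = 1.0) -> float:
    """Memory needed if every exon-array file must be resident at once."""
    return n_samples * gb_per_file


if __name__ == "__main__":
    print(expression_chip_bytes() / 1e6, "MB per expression sample")
    print(simultaneous_load_gb(100), "GB to hold 100 exon-array samples together")
```

Under these assumptions an expression sample stays in the low-megabyte range, while a normalization method that needs all exon-array samples in memory scales linearly with the number of samples.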


Interface of technologies and biology

  • Experimental design very important in HT biology

  • Experiments shaped by data access and availability

  • Re-analysis of old data with new methods important


Bioinformatics historical perspective

  • Stage 1 - bioinformatics term is coined to represent what had been DNA and protein sequence analysis (ca. 1995)

  • Stage 2 - additional disciplines become rolled into bioinformatics including literature mining, statistical analysis, and virtually anything to do with computational analysis of biological data. (ca. 2000)


Bioinformatics - historical perspective (contd)

  • Realization that bioinformatics is too broad a term; other disciplines break away, e.g. the "omics" fields (genomics, proteomics and others) (ca. 2001).

  • Still later (current), the realization that we won't be able to make sense of individual disciplines without integrating them; the term is now changed to integrative biology or systems biology (ca. 2003).


Importance of bioinformatics

  • Bioinformatics has become a major part of both the NCI 2015 directive and the NIH Roadmaps.

  • Virtually impossible to perform biological research without some form of computer aided analysis, especially in areas like genomics and proteomics.

  • Important to keep scientific community in touch with developing technologies and capabilities for highest return on research investment.


Bioinformatics infrastructures

  • Command-line implementations.

  • Primitive GUI implementations.

  • Sophisticated GUI interfaces and application packaging.

  • Web interfaces and the Java language give platform-independent access.

  • PC-based, web-based and server-based architectures.

  • Multi-tier infrastructures distribute the computational burden.


What does bioinformatics technology involve?

  • Computer readable form of some type or types of biological data (instruments)

  • Automation also requires programmable robotics capabilities (process science).

  • Computer infrastructure for storing and analyzing the data.

  • As data volume and complexity grow, the dependency on computer analysis increases.


Sources of bioinformatics technology

  • Computer science leveraged technologies including algorithms and data representation models, visualization frameworks and programming languages.

  • Web industry leveraged technologies including communication protocols, web servers and secure access.

  • Database industry derived connectivity and technologies.

  • Robotics and process engineering technologies for faster, cheaper throughput.


What can bioinformatics technology do for biological science?

  • Develop uniform data standards and controlled vocabularies to allow for integration of disparate sources/types of data.

  • Connect scientists to entire wealth of knowledge from basic science results to clinical trial data in context-sensitive manner.

  • Fully integrate worldwide volume of knowledge, for example patient information disease->treatment->outcome across multiple centers to allow for cross-comparisons.
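
A minimal sketch of how a controlled vocabulary lets records from different sources be merged. The synonym table is a small hand-written illustration (p53/TP53 and HER2/NEU/ERBB2 are well-known aliases); a real system would load an official nomenclature resource, and the function names here are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical, hand-written synonym table mapping local labels to one
# canonical gene symbol. A production system would load an official source.
SYNONYMS: Dict[str, str] = {
    "TP53": "TP53", "P53": "TP53",
    "ERBB2": "ERBB2", "HER2": "ERBB2", "NEU": "ERBB2",
}


def normalize(label: str) -> Optional[str]:
    """Return the canonical symbol for a label, or None if it is unknown."""
    return SYNONYMS.get(label.strip().upper())


def merge_records(*sources: Dict[str, float]) -> Dict[str, List[float]]:
    """Combine per-gene values from several sources under canonical symbols."""
    merged: Dict[str, List[float]] = {}
    for source in sources:
        for label, value in source.items():
            symbol = normalize(label)
            if symbol is None:
                continue  # unknown labels are skipped here; flag them in practice
            merged.setdefault(symbol, []).append(value)
    return merged


if __name__ == "__main__":
    lab_a = {"p53": 1.2, "HER2": 0.4}
    lab_b = {"TP53": 1.1, "neu": 0.5}
    print(merge_records(lab_a, lab_b))
```

Without the shared vocabulary the two labs' records would appear to describe four different genes instead of two.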


NCI Resources

  • caBIG and NCICB initiatives to develop an integrated data/tool environment.

  • Long term project requiring unprecedented cooperation, sharing.

  • Short term solutions for day-to-day problems.

  • Solution - use multiple approaches, staged implementation and layered technologies


ABCC hardware

  • 128-CPU Linux cluster (3.0 GHz processors).

  • 256-CPU Linux SMP box with 1 TB memory.

  • 64-CPU IRIX SMP box with 256 GB memory.

  • 32-CPU IBM AIX SMP computers.

  • 16-CPU IBM HPC AIX SMP computer.

  • 8 x 8-CPU IRIX computers.

  • Other miscellaneous computers, disk storage, tape backup and network connectivity.

  • Graphics visualization wall


ABCC Organization

  • Networking and Security

  • System administration

  • Scientific program development

  • Bioinformatics support

  • Staff ~ 40


ABCC Training Programs

  • Classes for NIH/NCI scientists:

    • Unix, GCG, Java, High throughput sequence analysis, Geospiza (LIMS)

    • Eudora, Advanced Eudora, Webmail

    • Homology, Docking, QSAR, Intro to Modeling, Phred, Phrap, Consed

  • One-on-one consulting services and training.

  • Organize and host vendor specific training in genomics, pathways, and modeling


ABCC Support within ATP

[Diagram. Labels in extraction order: Proteomics and Analytical Technologies (LPAT); Computational Support; Database Tools/Pathways; Mass Storage and Archive; Pattern Analysis and Clustering; Molecular Technologies (LMT); Image Analysis (IAL); Computational Support; Database Tools and LIMS; Mass Storage and Archive; Bioinformatics/Web; Pattern/SNP Analysis; ABCC; Algorithm and Software; Image Database; Mass Storage and Archive; Viz Technology Development; Gene Expression (GEL); Protein Chemistry (PCL); Software Support; Gene Assembly and Validation; Protein Expression (PEL); Animal Sciences (LASP); Mass Storage; Database; POET/Web]


ABCC applications

  • Sequence analysis - protein and nucleic acid, GCG and EMBOSS.

  • Sequence assembly, SNP detection.

  • Gene finders, analysis tools.

  • Molecular modeling, docking.

  • Molecular evolution and phylogeny.

  • Computational chemistry.

  • Linkage analysis.

  • Proteomics.

  • Classification tools (microarray and proteomics).


ABCC databases

  • GenBank and derived divisions.

  • RefSeq, WGS, UniGene divisions.

  • dbSNP, Gene, OMIM, HomoloGene.

  • UCSC, EBI and NCBI genome datasets.

  • LIMS systems, data management.

  • UniProt, PDB, PIR, iProClass, Swiss-Prot.

  • CGAP, MGC data files, pathways.

  • MEDLINE, TRANSFAC and repeats data files.


ABCC web resources

  • ABCC General information web page http://www.abcc.ncifcrf.gov

  • ABCC account application information http://www.abcc.ncifcrf.gov/apps_apply.shtml

  • ABCC Training web page http://www.abcc.ncifcrf.gov/training/courses.shtml

  • ABCC scientific applications webpage http://www.abcc.ncifcrf.gov/app/htdocs/appdb/index.php

  • ABCC GRID Database web page http://grid.abcc/ncifcrf.gov

  • ABCC Pipelines web page http://www.abcc.ncifcrf.gov/app/login/login.php


The role of bioinformatics in cancer research

  • Diagnosis - identify classifiers to better sub-divide cancer etiologies into groups. Better individual data helps match treatment to the individual.

  • Treatment - identify better methods to track treatment progress and indicate problems earlier.

  • Prevention - understand mechanisms for cancer initiation, progression and development and identify targets in this process.

  • Connect cancer patient data from geographically distributed cancer patients for more complete analysis.


Protein analysis tools

  • Protein composition, isoelectric point, molecular weight analysis tools.

  • Comparable alignment/searching tools for proteins.

  • Protein secondary structure prediction tools.

  • Protein structure modeling tools.
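
As an illustration of the first category of tool, the sketch below computes amino acid composition and an average molecular weight for a protein sequence. It uses standard average residue masses and adds one water per chain; it ignores modifications and non-standard residues, and the function names are hypothetical rather than taken from any particular package.

```python
from collections import Counter
from typing import Dict

# Average residue masses in daltons (standard values, amino acid minus water).
RESIDUE_MASS: Dict[str, float] = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # one water is added for the free N- and C-termini


def composition(seq: str) -> Dict[str, float]:
    """Fraction of each residue type in the sequence."""
    counts = Counter(seq.upper())
    total = sum(counts.values())
    return {aa: n / total for aa, n in sorted(counts.items())}


def molecular_weight(seq: str) -> float:
    """Average molecular weight of an unmodified peptide chain, in daltons."""
    return sum(RESIDUE_MASS[aa] for aa in seq.upper()) + WATER


if __name__ == "__main__":
    peptide = "MKTAYIAKQR"  # arbitrary example sequence
    print(composition(peptide))
    print(round(molecular_weight(peptide), 2), "Da")
```

Isoelectric point estimation follows the same pattern but needs a table of side-chain pKa values, which differ between published scales, so it is left out of this sketch.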


Genomics tools

  • Gene finder and general genome annotation tools.

  • Cross genome comparison tools and databases.

  • Large scale sequence assembly and polymorphism identification tools.

  • Genomic visualization tools (UCSC, NCBI, Ensembl).

  • Data cleansing tools - vector screening, repeat masking.


Gene expression tools

  • EST Clustering and differential expression analysis tools and databases.

  • SAGE Analysis tools and databases.

  • Microarray data collection, calibration and analysis tools and databases.

  • Gene clustering and visualization tools.

  • Integration tools - pathways, regulatory networks and medical literature.

  • Databases for housing and querying the data.


Proteomics tools

  • Mass spectrometry tools for peptide identification.

  • Fragment classification tools for identification of diagnostics.

  • Peptide fragment resolution tools - identification of protein mixtures from peptide sets.

  • Databases for storing and querying the data.


Inherent bioinformatics problems

  • Keeping data sources synchronized and up to date.

  • Keeping applications up to date.

  • Remaining aware of current palette of available tools and resources.

  • Separation between computer developers and biologist users of software and databases.

  • The silo concept - separate, dysfunctional units.

  • Lack of common language or database schema.


Data Analysis

  • Pathway analysis

  • Polymorphism

  • Proteomics

  • Image analysis

  • Homology Modeling

  • Live polymorphism analysis (if time permits)


Pathway Analysis

  • Identify specific requirements of individual tumor.

  • Advance from diagnosis to detection.

  • Multiple points to cause aberrations and multiple points to act to correct them.

  • Identify/characterize tissue, cell specific targets.


Pathway Gene Set Analysis

  • Many experiments result in sets of genes, e.g. microarray, proteomics, literature searches, etc.

  • Clustering genes based on expression etc. provides only first dimension.

  • View prospective pathways impacted by changes in expression, protein levels, phosphorylation etc.
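
One common way to ask whether a gene set points at a pathway is a hypergeometric over-representation test. The sketch below is a generic illustration of that idea, not the method used by the ABCC pathway tools; the function name and the example numbers are hypothetical.

```python
from math import comb


def enrichment_p_value(population: int, pathway_size: int, hits: int, overlap: int) -> float:
    """P(X >= overlap) when `hits` genes are drawn without replacement from
    `population` genes, of which `pathway_size` belong to the pathway."""
    upper = min(pathway_size, hits)
    total = comb(population, hits)
    tail = sum(
        comb(pathway_size, k) * comb(population - pathway_size, hits - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical numbers: 20,000 genes on the array, a 100-gene pathway,
    # 200 differentially expressed genes, 10 of them in the pathway.
    print(enrichment_p_value(20_000, 100, 200, 10))
```

A small p-value suggests the overlap is larger than expected by chance. When many pathways are tested, the p-values need a multiple-testing correction.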

[Figures on two slides. Sample labels: G5G8Tg1Liver, G5G8Tg2Liver, G5G8-/-1Liver, G5G8-/-2Liver, G5G8-/-3Liver]

Integrative Strategy for Microarray Analysis

[Flowchart. Box labels: Microarray Data; Clustering Analysis; Load into WPS; WSCP; Unassigned Genes; Integrate with WPS; Lists of Genes; Assign to uncharacterized pathway(s); Assign to known pathway(s); Putative Pathway; PSCP (three boxes)]

Project Goal: Integrate Biological Data and/or Information Databases into Biological Networks

[Diagram. User input: Microarray Data, Proteomics. Other labels: Protein Interaction Database (BIND, DIP, etc.); Comparative Genomics; P1; P2; Protein Modification (Phos., Glyco.); Gene regulation (Promoter, etc.); Gene Ontology; SNP & Haplotype Database (SNPinfo, etc.); Literature DB (e.g. Pubgene, ResNet); NCBI resources (OMIM, etc.); Statistical Evaluation; Network Expansion (high, low confidence)]

One example of analysis scenario

[Flowchart. Labels: Microarray data pathway analysis or clustering on local PC; Candidate gene sets; Candidate pathway sets; Pre-computed DBs or run-time computed, Internet-enabled; SNP & Haplotype data (SNPinfo; disease association); Promoter comparison (1. CGI generator, 2. CoreSearch, 3. ConsInspector); Protein interaction; Literature-based (Pubgene, NCBI OMIM, etc.); GO; Known gene training; Weighted scoring (statistical analysis, filtering); Final set of candidate genes (visualization and re-creation of the new subnetwork within the whole network); Pathway expansion]

Polymorphism Impacts

  • Variation within species as great as differences between closely related species

  • Confounds correlation analysis

  • Impacts gene structure and expression

  • Start with complete sequence for individual, obtain polymorphism data for populations/strains and breeds etc.

  • Strains/breeds allow for good start


Polymorphism Types

  • SNPs

  • Indels

    • STRs

    • Tandem

    • NonTandem (Copy number variation)

    • Retroelement

  • Complex

  • Inversion/translocation
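
To make one of these categories concrete, the sketch below scans a DNA sequence for perfect short tandem repeats (STRs) using a regular expression. It is a simplified illustration: the unit-length and copy-number thresholds are arbitrary assumptions, imperfect repeats are not detected, and a homopolymer run is reported as a repeat of a two-base unit.

```python
import re
from typing import Iterator, Tuple


def find_strs(
    seq: str, min_unit: int = 2, max_unit: int = 6, min_repeats: int = 4
) -> Iterator[Tuple[int, str, int]]:
    """Yield (start, unit, copies) for perfect tandem repeats in `seq`.

    The lazy quantifier makes the regex try the shortest repeat unit first.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        yield match.start(), unit, len(match.group(0)) // len(unit)


if __name__ == "__main__":
    for start, unit, copies in find_strs("GGCACACACACATT"):
        print(start, unit, copies)
```

Comparing the copy number found at the same locus in two strains or individuals is one simple way to flag an STR polymorphism.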


STR Polymorphism View


Strain Trace and Contig Coverage View


InDel Polymorphism Information View


Location Polymorphism Locator Query


STR Query results


Polymorphism Visualization


Proteomics Initiative: ABCC Projects

  • Disk Storage and Archiving (centralized storage)

  • LAN Support

  • Software Development

    • Spectral Filtering

    • Clustering/Biomarker Identification

  • Database Development and Update

    • Peptide identification DB

  • MS Integration with Pathways

    • ABCC Pathway tool

  • Provide Scalable Computational Resources

  • Software Optimization

    • Sequest (working with LPAT, Yates Lab, and Thermo Electron)


[Figure. Panel labels: Raw Data; Binning; Biological Marker; Clustering]


Need for effective classification schemes for correlating large amounts of data with Cancer markers

  • Large amounts of data.

  • Many features (data points) to fit but few samples

  • Problems are under-determined (far more features than samples), so models overfit easily

    • Solutions may be purely mathematical with no biological basis
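
The risk described above can be shown with a small simulation: with pure random noise, many features and few samples, some features separate the two classes perfectly by chance. This is an illustrative sketch with hypothetical parameter choices (20,000 features, 5 samples per class), not an analysis of real data.

```python
import random


def count_chance_separators(n_features: int = 20_000, n_per_class: int = 5, seed: int = 0) -> int:
    """Count random-noise features whose values split two classes perfectly."""
    rng = random.Random(seed)
    separators = 0
    for _ in range(n_features):
        group_a = [rng.random() for _ in range(n_per_class)]
        group_b = [rng.random() for _ in range(n_per_class)]
        # A "perfect marker": every value in one class lies below every value in the other.
        if max(group_a) < min(group_b) or max(group_b) < min(group_a):
            separators += 1
    return separators


if __name__ == "__main__":
    print(count_chance_separators(), "noise features separate the classes perfectly")
```

With 5 samples per class, a single noise feature separates the groups with probability 2/252, so roughly 1 in 126 features looks like a perfect marker despite carrying no biological signal. Independent validation samples are needed to tell such features from real ones.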


Image Processing

  • Confocal Microscopy & Whole Animal Imaging

    • 3D Segmentation

  • Traditional/Real-time Microscopy

    • Automated Quantitative Feature Analysis


Confocal Imaging

  • Confocal Microscopy captures 3D volumes of tissue in situ

    • Cancer appearance / development is related to the cellular neighborhood

    • Therefore, segmentation and interpretation of cellular clusters is required

    • NCI Developed Algorithms

    • Segmentation needs human review


Imaging - SGI

Imaging/Confocal Microscopy


Homology Modeling

  • Many new chemotherapeutic molecules are specific enzyme inhibitors.

  • Structural biology plays key role in design/enhancement of these compounds.

  • Identify better inhibitors, understand specific differences and mechanisms.


Homology Modeling of Cysteine Finger in 3 Human Raf Proteins

ABCC Bioinformatics Support Group

Anney Che
Jack Chen
Jin Chen
Qingrong Chen
David Liu
Uma Mudunuri
Jigui Shan
Wei Shao
Gary Smythers
Hong Mei Sun
Natalia Volfovsky
Xinyu Wen
Ming Yi
Jack Zhu


Bob Stephens, bobs@ncifcrf.gov

Query tool

GBrowse
