
Bioinformatics in Cancer Biotechnology. Bob Stephens Advanced Biomedical Computing Center Advanced Technology Program SAIC-Frederick, Inc. National Cancer Institute at Frederick April 19, 2007. Objectives. Overview/introduce bioinformatics concepts, applications and databases.

Bioinformatics in Cancer Biotechnology

Bob Stephens
Advanced Biomedical Computing Center
Advanced Technology Program
SAIC-Frederick, Inc.
National Cancer Institute at Frederick

April 19, 2007

Objectives
  • Introduce bioinformatics concepts, applications, and databases.
  • Describe the interplay between bioinformatics, technologies, and the web.
  • Profile the importance of bioinformatics in cancer research.

Cancer Biotechnology Series

What is bioinformatics?
  • Bioinformatics is the application of computational methods to the analysis of any type of biological data.
  • Bioinformatics has become a diverse, multidisciplinary field that originally derived from computer science and biological science.

Evolution of bioinformatics
  • Rapid technological advances in sequence determination set the pace for data acquisition.
  • Similar advances in computing power, algorithmic approaches for sequence analysis, and robotics-enabled instruments.
  • Co-evolution with web browser and programming language technologies.

Bioinformatics evolution (contd.)
  • Additional high-throughput technologies are becoming available almost daily: microarrays, proteomics, population and genetic data, medical literature, etc.
  • Data volume is increasing at the same time as data complexity.
  • Data distribution/synchronization is becoming an increasingly difficult task.

Interplay between technology and bioinformatics
  • New high-throughput (HT) technologies, e.g. mRNA microarrays
  • Analysis and storage software
  • Computational infrastructure
  • Data integration

  • mRNA expression chip (20,000 genes x 16 probes per gene): a few MB per sample.
  • Data normalization software.
  • Exon array (multiple probes for each exon of each of the 20,000 genes): one file is about 1 GB.
  • New normalization method requires all samples to be loaded simultaneously.
  • More complex analysis reveals alternative splicing, etc.
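
The sizes quoted on this slide can be sanity-checked with simple arithmetic. The sketch below assumes 8 bytes per stored probe value and an illustrative batch of 100 exon arrays; neither number comes from the slides, and real file formats carry extra overhead.

```python
# Back-of-the-envelope storage estimate for the arrays described above.
GENES = 20_000
PROBES_PER_GENE = 16
BYTES_PER_VALUE = 8  # assumption: one 64-bit float per probe intensity


def expression_chip_bytes(genes=GENES, probes_per_gene=PROBES_PER_GENE,
                          bytes_per_value=BYTES_PER_VALUE):
    """Bytes needed to hold the probe intensities of one sample."""
    return genes * probes_per_gene * bytes_per_value


def batch_memory_gb(file_size_gb, n_samples):
    """Memory needed when every sample must be loaded at the same time."""
    return file_size_gb * n_samples


if __name__ == "__main__":
    per_sample_mb = expression_chip_bytes() / 1e6
    print(f"expression chip: about {per_sample_mb:.2f} MB per sample")
    # Hypothetical batch size, chosen only to show how the requirement scales.
    print(f"100 exon arrays at 1 GB each: {batch_memory_gb(1, 100)} GB in memory")
```

Under these assumptions the expression chip comes to roughly 2.5 MB per sample, consistent with "a few MB", while a normalization method that needs all exon arrays in memory grows linearly with the number of samples.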

Interface of technologies and biology
  • Experimental design very important in HT biology
  • Experiments shaped by data access and availability
  • Re-analysis of old data with new methods important

Bioinformatics historical perspective
  • Stage 1 - the term "bioinformatics" is coined to describe what had been DNA and protein sequence analysis (ca. 1995).
  • Stage 2 - additional disciplines are rolled into bioinformatics, including literature mining, statistical analysis, and virtually anything to do with computational analysis of biological data (ca. 2000).

Bioinformatics - historical perspective (contd)
  • Realization that bioinformatics is too broad a term; other disciplines break away, e.g. the "omics" fields (genomics, proteomics, and others) (ca. 2001).
  • Still later (current), the realization that individual disciplines cannot be made sense of without integrating them; the term now shifts to integrative biology or systems biology (ca. 2003).

Importance of bioinformatics
  • Bioinformatics has become a major part of both the NCI 2015 directive and the NIH Roadmaps.
  • Virtually impossible to perform biological research without some form of computer-aided analysis, especially in areas like genomics and proteomics.
  • Important to keep scientific community in touch with developing technologies and capabilities for highest return on research investment.

Bioinformatics infrastructures
  • Command-line implementations.
  • Primitive GUI implementations.
  • Sophisticated GUI interfaces and application packaging.
  • Web interfaces and the Java language give platform-independent access.
  • PC-based, web-based and server-based architectures.
  • Multi-tier infrastructures distribute the computational burden.

What does bioinformatics technology involve?
  • A computer-readable form of one or more types of biological data (instruments).
  • Automation also requires programmable robotics capabilities (process science).
  • Computer infrastructure for storing and analyzing the data.
  • As data volume and complexity grow, the dependency on computer analysis increases.

Sources of bioinformatics technology
  • Technologies leveraged from computer science, including algorithms and data representation models, visualization frameworks and programming languages.
  • Technologies leveraged from the web industry, including communication protocols, web servers and secure access.
  • Connectivity and technologies derived from the database industry.
  • Robotics and process engineering technologies for faster, cheaper throughput.

What can bioinformatics technology do for biological science?
  • Develop uniform data standards and controlled vocabularies to allow for integration of disparate sources/types of data.
  • Connect scientists to the entire wealth of knowledge, from basic science results to clinical trial data, in a context-sensitive manner.
  • Fully integrate the worldwide volume of knowledge, for example patient information (disease -> treatment -> outcome) across multiple centers, to allow for cross-comparisons.
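
To make the controlled-vocabulary idea concrete, here is a minimal sketch of how free-text disease names from different centers could be mapped to one canonical term before disease -> treatment -> outcome records are pooled. The vocabulary, the record fields and the example records are all hypothetical; they are not taken from caBIG or any other standard named in this talk.

```python
# Hypothetical controlled vocabulary: free-text synonym -> canonical term.
CANONICAL_TERMS = {
    "nsclc": "non-small cell lung carcinoma",
    "non small cell lung cancer": "non-small cell lung carcinoma",
    "non-small cell lung carcinoma": "non-small cell lung carcinoma",
    "breast ca": "breast carcinoma",
    "breast cancer": "breast carcinoma",
}

# Hypothetical records as three centers might report them.
EXAMPLE_RECORDS = [
    {"center": "A", "disease": "NSCLC", "treatment": "drug X", "outcome": "response"},
    {"center": "B", "disease": "Non small  cell lung cancer",
     "treatment": "drug X", "outcome": "no response"},
    {"center": "C", "disease": "unclassified tumor", "treatment": "drug Y",
     "outcome": "response"},
]


def normalize_term(raw):
    """Return the canonical term for a free-text name, or None if unmapped."""
    key = " ".join(raw.lower().split())  # lower-case and collapse whitespace
    return CANONICAL_TERMS.get(key)


def integrate(records):
    """Group (center, treatment, outcome) by canonical disease term.

    Records whose disease name is not in the vocabulary are returned
    separately so that a curator can review them instead of losing them.
    """
    merged, unmapped = {}, []
    for rec in records:
        term = normalize_term(rec["disease"])
        if term is None:
            unmapped.append(rec)
            continue
        merged.setdefault(term, []).append(
            (rec["center"], rec["treatment"], rec["outcome"]))
    return merged, unmapped


if __name__ == "__main__":
    merged, unmapped = integrate(EXAMPLE_RECORDS)
    for disease, rows in merged.items():
        print(disease, rows)
    print("needs curation:", [rec["disease"] for rec in unmapped])
```

Once both lung cancer records share one term, outcomes for the same treatment can be compared across centers A and B; the unmapped record shows why vocabularies need ongoing curation.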

NCI Resources
  • caBIG and NCICB initiatives to develop an integrated data/tool environment.
  • Long term project requiring unprecedented cooperation, sharing.
  • Short term solutions for day-to-day problems.
  • Solution - use multiple approaches, staged implementation and layered technologies

ABCC hardware
  • 128-CPU Linux cluster (3.0 GHz processors).
  • 256-CPU Linux SMP box with 1 TB memory.
  • 64-CPU IRIX SMP box with 256 GB memory.
  • 32-CPU IBM AIX SMP computers.
  • 16-CPU IBM HPC AIX SMP computer.
  • 8 x 8-CPU IRIX computers.
  • Other miscellaneous computers, disk storage, tape backup and network connectivity.
  • Graphics visualization wall

ABCC Organization
  • Networking and Security
  • System administration
  • Scientific program development
  • Bioinformatics support
  • Staff ~ 40

ABCC Training Programs
  • Classes for NIH/NCI scientists:
    • Unix, GCG, Java, High throughput sequence analysis, Geospiza (LIMS)
    • Eudora, Advanced Eudora, Webmail
    • Homology, Docking, QSAR, Intro to Modeling, Phred, Phrap, Consed
  • One-on-one consulting services and training.
  • Organize and host vendor-specific training in genomics, pathways, and modeling.

ABCC Support within ATP

[Diagram. Box labels in extraction order: Proteomics and Analytical Technologies; Computational Support; Database Tools/Pathways; Mass Storage and Archive; Pattern Analysis and Clustering; Molecular Technologies; Image Analysis; Computational Support; Database Tools and LIMS; Mass Storage and Archive; Pattern/SNP Analysis; Algorithm and Software; Image Database; Mass Storage and Archive; Viz Technology Development; Gene Expression; Protein Chemistry (PCL); Software Support; Gene Assembly and Validation; Protein Expression; Animal Sciences (LASP); Mass Storage]


ABCC applications
  • Sequence analysis - protein and nucleic acid, GCG and EMBOSS.
  • Sequence assembly, SNP detection.
  • Gene finders, analysis tools.
  • Molecular modeling, docking.
  • Molecular evolution and phylogeny.
  • Computational chemistry.
  • Linkage analysis.
  • Proteomics.
  • Classification tools (microarray and proteomics).

ABCC databases
  • GenBank and derived divisions.
  • RefSeq, WGS, UniGene divisions.
  • dbSNP, Gene, OMIM, HomoloGene.
  • UCSC, EBI and NCBI genome datasets.
  • LIMS systems, data management.
  • UniProt, PDB, PIR, iProClass, Swiss-Prot.
  • CGAP, MGC data files, pathways.
  • Medline, TRANSFAC and repeats data files.

ABCC web resources
  • ABCC General information web page
  • ABCC account application information
  • ABCC Training web page
  • ABCC scientific applications webpage
  • ABCC GRID Database web page http://grid.abcc/
  • ABCC Pipelines web page

The role of bioinformatics in cancer research
  • Diagnosis - identify classifiers to better sub-divide cancer etiologies into groups. Better individual data to match treatment to the individual.
  • Treatment - identify better methods to track treatment progress and indicate problems earlier.
  • Prevention - understand mechanisms for cancer initiation, progression and development and identify targets in this process.
  • Connect cancer patient data from geographically distributed cancer patients for more complete analysis.

Protein analysis tools
  • Protein composition, isoelectric point, molecular weight analysis tools.
  • Comparable alignment/searching tools for proteins.
  • Protein secondary structure prediction tools.
  • Protein structure modeling tools.
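
As an illustration of the first bullet, the sketch below computes amino acid composition and average molecular weight from a one-letter protein sequence. It is a generic calculation, not the implementation used by GCG, EMBOSS or any ABCC tool; the residue masses are standard average values, and isoelectric point is left out because it depends on which pKa table is chosen.

```python
from collections import Counter

# Average residue masses in daltons (free amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # added once for the free N- and C-termini


def composition(sequence):
    """Count each residue in a one-letter protein sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    unknown = sorted(set(seq) - set(RESIDUE_MASS))
    if unknown:
        raise ValueError(f"unsupported residue codes: {unknown}")
    return Counter(seq)


def molecular_weight(sequence):
    """Average molecular weight in daltons of the unmodified peptide."""
    counts = composition(sequence)
    return sum(RESIDUE_MASS[aa] * n for aa, n in counts.items()) + WATER_MASS


if __name__ == "__main__":
    peptide = "MKKA"  # arbitrary example sequence
    print(dict(composition(peptide)))
    print(f"{molecular_weight(peptide):.2f} Da")
```

Ambiguity codes such as X or B raise an error here rather than being guessed at; a production tool would need a policy for them and for post-translational modifications.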

Genomics tools
  • Gene finder and general genome annotation tools.
  • Cross genome comparison tools and databases.
  • Large scale sequence assembly and polymorphism identification tools.
  • Genomic visualization tools (UCSC, NCBI, Ensembl).
  • Data cleansing tools - vector screening, repeat masking.

Gene expression tools
  • EST Clustering and differential expression analysis tools and databases.
  • SAGE Analysis tools and databases.
  • Microarray data collection, calibration and analysis tools and databases.
  • Gene clustering and visualization tools.
  • Integration tools - pathways, regulatory networks and medical literature.
  • Databases for housing and querying the data.

Proteomics tools
  • Mass spectrometry tools for peptide identification.
  • Fragment classification tools for identification of diagnostics.
  • Peptide fragment resolution tools - identification of protein mixtures from peptide sets.
  • Databases for storing and querying the data.

Inherent bioinformatics problems
  • Keeping data sources synchronized and up to date.
  • Keeping applications up to date.
  • Remaining aware of current palette of available tools and resources.
  • Separation between computer developers and biologist users of software and databases.
  • The silo concept - separate, dysfunctional units.
  • Lack of common language or database schema.

Data Analysis
  • Pathway analysis
  • Polymorphism
  • Proteomics
  • Image analysis
  • Homology Modeling
  • Live polymorphism analysis (if time permits)

Pathway Analysis
  • Identify specific requirements of individual tumor.
  • Advance from diagnosis to detection.
  • Multiple points at which aberrations can arise and multiple points at which to act to correct them.
  • Identify/characterize tissue- and cell-specific targets.

Pathway Gene Set Analysis
  • Many experiments result in sets of genes, e.g. microarray, proteomics, literature searches, etc.
  • Clustering genes based on expression etc. provides only a first dimension.
  • View prospective pathways impacted by changes in expression, protein levels, phosphorylation, etc.
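
One common way to ask whether a gene set points at a pathway is over-representation analysis with a hypergeometric test. The slides do not say which statistic the ABCC pathway tool uses, so the sketch below is only a generic example; the gene identifiers and the toy pathway are made up.

```python
from math import comb


def enrichment_p_value(n_background, n_pathway, n_selected, n_overlap):
    """P(overlap >= n_overlap) if n_selected genes were drawn at random.

    n_background: genes measured in the experiment
    n_pathway:    background genes annotated to the pathway
    n_selected:   genes in the experimental gene set
    n_overlap:    selected genes that belong to the pathway
    """
    upper = min(n_pathway, n_selected)
    favorable = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return favorable / comb(n_background, n_selected)


def pathway_enrichment(gene_set, pathway_genes, background):
    """Return (overlapping genes, p-value) for one gene set and one pathway."""
    background = set(background)
    selected = set(gene_set) & background
    pathway = set(pathway_genes) & background
    overlap = selected & pathway
    p = enrichment_p_value(len(background), len(pathway),
                           len(selected), len(overlap))
    return overlap, p


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only.
    background = [f"GENE{i}" for i in range(1, 1001)]
    toy_pathway = [f"GENE{i}" for i in range(1, 21)]
    gene_set = ["GENE1", "GENE2", "GENE3", "GENE500", "GENE700"]
    overlap, p = pathway_enrichment(gene_set, toy_pathway, background)
    print(sorted(overlap), f"p = {p:.3g}")
```

When many pathways are tested, the resulting p-values need a multiple-testing correction before any pathway is called significant.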

Integrative Strategy for Microarray Analysis

[Flow diagram. Surviving labels in extraction order: Microarray Data; Load into; Integrate with; Lists of Genes; Assign to; Assign to]


Project Goal: Integrate Biological Data and/or Information Databases into Biological Networks

[Diagram. Labels in extraction order]
  • User input: Data, Proteomics
  • Protein Interaction Database (BIND, DIP etc.)
  • Phos., Glyco.
  • Gene regulation (Promoter etc)
  • Gene Ontology
  • (SNPinfo etc)
  • Literature DB (e.g. Pubgene)
  • NCBI resources, OMIM etc
  • Statistical Evaluation
  • Network Expansion (high, low confidence)

One example of analysis scenario

[Diagram. Labels in extraction order]
  • Microarray data pathway analysis or clustering in local PC
  • Candidate gene sets
  • Candidate pathway sets
  • Pre-computed DBs or run-time computed
  • SNP & Haplotype data: SNPinfo; Disease association
  • 1. CGI generator; 3. ConsInspector
  • (Pubgene etc)
  • Known gene
  • Weighted scoring (statistical analysis, filtering)
  • Final set of candidate genes (visualization and re-creation of the new subnetwork within the whole network)
  • Pathway expansion

Polymorphism Impacts
  • Variation within species as great as differences between closely related species
  • Confounds correlation analysis
  • Impacts gene structure and expression
  • Start with complete sequence for individual, obtain polymorphism data for populations/strains and breeds etc.
  • Strains/breeds allow for a good start

Polymorphism Types
  • SNPs
  • Indels
    • STRs
    • Tandem
    • NonTandem (Copy number variation)
    • Retroelement
  • Complex
  • Inversion/translocation
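
The simpler polymorphism types can be told apart directly from the sequences involved. The sketch below is a simplified illustration, not the ABCC polymorphism pipeline: it labels a variant from its reference and alternate alleles and finds short tandem repeats (STRs) with a regular expression. The 2 to 6 bp motif length and the minimum of four copies are assumptions chosen for the example.

```python
import re

# Assumed STR definition for this sketch: a 2-6 bp motif repeated >= 4 times.
STR_PATTERN = re.compile(r"([ACGT]{2,6}?)\1{3,}")


def classify_variant(ref, alt):
    """Label a variant from its reference and alternate allele strings."""
    if ref == alt:
        return "identical"
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(ref) == len(alt):
        # Equal-length multi-base changes (e.g. inversions) need more
        # context than two allele strings, so they are left as "complex".
        return "complex"
    return "insertion" if len(alt) > len(ref) else "deletion"


def find_strs(sequence):
    """Return (start, motif, copies) for each tandem repeat in the sequence."""
    hits = []
    for match in STR_PATTERN.finditer(sequence.upper()):
        motif = match.group(1)
        copies = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, copies))
    return hits


if __name__ == "__main__":
    print(classify_variant("A", "G"))
    print(classify_variant("A", "ACAC"))
    print(find_strs("GAT" + "AC" * 5 + "TGA"))
```

Copy number variation, retroelement insertions and translocations cannot be recognized from short allele strings like these; they need alignment or read-depth evidence.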

STR Polymorphism View

Strain Trace and Contig Coverage View

InDel Polymorphism Information View

Location Polymorphism Locator Query

STR Query results

Polymorphism Visualization

Proteomics Initiative: ABCC Projects
  • Disk Storage and Archiving (centralized storage)
  • LAN Support
  • Software Development
    • Spectral Filtering
    • Clustering/Biomarker Identification
  • Database Development and Update
    • Peptide identification DB
  • MS Integration with Pathways
    • ABCC Pathway tool
  • Provide Scalable Computational Resources
  • Software Optimization
    • SEQUEST (working with LPAT, Yates Lab, and Thermo Electron)

Raw Data


Biological Marker


Need for effective classification schemes for correlating large amounts of data with Cancer markers

  • Large amounts of data.
  • Many features (data points) to fit but few samples.
  • Problems are under-determined (more features than samples), so models over-fit easily.
    • Solutions may be purely mathematical with no biological basis.
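
The risk of purely mathematical solutions is easy to demonstrate. In the sketch below every feature is random noise, yet with only six samples many features still split the two classes perfectly by chance. The sample count, feature count and seed are arbitrary choices for the example.

```python
import random


def perfectly_separating_features(n_samples=6, n_features=2000, seed=0):
    """Count pure-noise features that separate two classes perfectly.

    Labels are fixed (half class 0, half class 1) and every feature value
    is an independent random number, so any perfect separation is spurious.
    """
    rng = random.Random(seed)
    half = n_samples // 2
    labels = [0] * half + [1] * (n_samples - half)
    hits = 0
    for _ in range(n_features):
        values = [rng.random() for _ in range(n_samples)]
        class0 = [v for v, y in zip(values, labels) if y == 0]
        class1 = [v for v, y in zip(values, labels) if y == 1]
        # A single threshold separates the classes if their ranges are disjoint.
        if max(class0) < min(class1) or max(class1) < min(class0):
            hits += 1
    return hits


if __name__ == "__main__":
    n = perfectly_separating_features()
    print(f"{n} of 2000 noise features classify all 6 samples correctly")
```

With three samples per class, a random feature separates the classes with probability 2 x 3! x 3! / 6! = 10%, so about 200 of the 2000 noise features look like perfect markers. This is why candidate markers need validation on independent samples.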

Image Processing
  • Confocal Microscopy & Whole Animal Imaging
    • 3D Segmentation
  • Traditional/Real-time Microscopy
    • Automated Quantitative Feature Analysis

Confocal Imaging
  • Confocal Microscopy captures 3D volumes of tissue in situ
    • Cancer appearance / development is related to the cellular neighborhood
    • Therefore, segmentation and interpretation of cellular clusters is required
    • NCI Developed Algorithms
    • Segmentation needs human review
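
The NCI-developed algorithms are not described in the slides. As a stand-in, the sketch below shows the most basic form of 3D segmentation: threshold the volume, then label 6-connected groups of bright voxels. The threshold and the tiny example volume are made up, and real confocal data would need far more than this, which is consistent with the human review noted above.

```python
from collections import deque

# Face-sharing neighbors in (z, y, x) order: 6-connectivity.
NEIGHBORS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def label_volume(volume, threshold):
    """Label connected components of voxels brighter than the threshold.

    volume is a nested list indexed as volume[z][y][x]. Returns
    (labels, count), where labels has the same shape, 0 marks background
    and components are numbered 1..count in scan order.
    """
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    labels = [[[0] * nx for _ in range(ny)] for _ in range(nz)]
    count = 0
    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                if volume[z][y][x] <= threshold or labels[z][y][x]:
                    continue
                count += 1
                labels[z][y][x] = count
                queue = deque([(z, y, x)])
                while queue:  # breadth-first flood fill of one component
                    cz, cy, cx = queue.popleft()
                    for dz, dy, dx in NEIGHBORS:
                        pz, py, px = cz + dz, cy + dy, cx + dx
                        if (0 <= pz < nz and 0 <= py < ny and 0 <= px < nx
                                and volume[pz][py][px] > threshold
                                and not labels[pz][py][px]):
                            labels[pz][py][px] = count
                            queue.append((pz, py, px))
    return labels, count


if __name__ == "__main__":
    # Two z-slices of a 3 x 3 image with two separate bright regions.
    volume = [
        [[9, 9, 0], [0, 0, 0], [0, 0, 8]],
        [[9, 0, 0], [0, 0, 0], [0, 0, 8]],
    ]
    labels, count = label_volume(volume, threshold=5)
    print(count, "components")
```

Touching cells merge into one component under this scheme, which is one reason cluster-level segmentation of tissue usually needs shape-aware methods and manual correction.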

Imaging - SGI

Imaging/Confocal Microscopy

Homology Modeling
  • Many new chemotherapeutic molecules are specific enzyme inhibitors.
  • Structural biology plays key role in design/enhancement of these compounds.
  • Identify better inhibitors, understand specific differences and mechanisms.

ABCC Bioinformatics Support Group
  • Anney Che
  • Jack Chen
  • Jin Chen
  • Qingrong Chen
  • David Liu
  • Uma Mudunuri
  • Jigui Shan
  • Wei Shao
  • Gary Smythers
  • Hong Mei Sun
  • Natalia Volfovsky
  • Xinyu Wen
  • Ming Yi
  • Jack Zhu


Bob Stephens, bobs@ncifcrf.gov, www.abc.ncifcrf.gov

Query tool

