From Bio-Informatics towards e-BioScience

From Bio-Informatics towards e-BioScience. L.O. (Bob) Hertzberger Computer Architecture and Parallel Systems Group Department of Computer Science Universiteit van Amsterdam bob@science.uva.nl. Background information experimental sciences. There is a tendency to look ever deeper in:



Presentation Transcript


  1. From Bio-Informatics towards e-BioScience L.O. (Bob) Hertzberger, Computer Architecture and Parallel Systems Group, Department of Computer Science, Universiteit van Amsterdam, bob@science.uva.nl

  2. Background information: experimental sciences • There is a tendency to look ever deeper in: • Matter e.g. Physics • Universe e.g. Astronomy • Life e.g. Life sciences • Instrumental consequences are increase in detector: • Resolution & sensitivity • Automation & robotization • Therefore experiments change in nature & become increasingly more complex

  3. Impact in the life sciences • Impact of high throughput methods e.g. Omics experimentation • genome ===> genomics

  4. New technologies in Life Sciences research [Figure: methodology/technology per component of the cell: DNA (Genomics), RNA (Transcriptomics), protein (Proteomics), metabolites (Metabolomics)] University of Amsterdam

  5. Omics impact

  6. Impact in the life sciences • Impact of high throughput methods e.g. Omics experimentation • genome ===> genomics • Instrumentation being used in omics experimentation: • Transcriptomics via, among others, micro-arrays • Proteomics via, among others, Mass Spectroscopy (MS) • Metabolomics via, among others, MS & Nuclear Magnetic Resonance (NMR)

  7. Results in Paradigm shift in Life sciences • Past experiments were hypothesis driven • Evaluate hypothesis • Complement existing knowledge • Present experiments are data driven • Discover knowledge from large amounts of data

  8. Life sciences research: from gene to function [Figure: in the cell nucleus, DNA/Gene (whole-genome sequence projects); gene expression by RNA synthesis gives mRNA (genome-wide micro-array analysis); mRNA translation by protein synthesis gives Protein (“high-throughput” protein analysis); Protein function (function-1, function-2, ..., function-n): prediction by bioinformatics, proof by laboratory research]

  9. Developments towards Bio-informatics & e-Science • Experiments become increasingly more complex • Driven by increase of detector developments • Results in an increase in amount and complexity of data • Something has to be done to harness this development • Bio-informatics to translate data into useful biological, medical, pharmaceutical & agricultural knowledge

  10. The what of Bioinformatics Bioinformatics is redefining rules and scientific approaches, resulting in the ‘new biology’. Within this new paradigm the traditional scientific boundaries are blurred, leaving no clear line between ‘dry or computational’ and ‘wet-based’ approaches

  11. DNA Genomics Data generation/validation Data integration/fusion Data usage/user interfacing RNA Transcriptomics protein Proteomics metabolites Metabolomics Integrative/System Biology Role of bioinformatics Bioinformatics methodology cell

  12. Two sides of Bioinformatics • The scientific responsibility to develop the underlying computational concepts and models to convert complex biological data into useful biological and chemical knowledge • Technological responsibility to manage and integrate huge amounts of heterogeneous data sources from high throughput experimentation • Need for e-Science support

  13. Developments towards Bio-informatics & e-Science • Experiments become increasingly more complex • Driven by increase of detector developments • Results in an increase in amount and complexity of data • Something has to be done to harness this development • Bio-informatics to translate data into useful biological, medical, pharmaceutical & agricultural knowledge • Virtualization of experimental resources enabling sharing & leading to e-BioScience

  14. Life science application areas Life science/genomics research consortia and industry e-Bioscience and life science innovation domain Bioinformatics e-Bioscience & research infrastructure Generic e-Science ICT development and support e-Science & research infrastructure Grid infrastructure Network infrastructure and computing capacity

  15. Why e-BioScience • There is an increasing necessity to use results from other scientists e.g. share data & information:

  16. Re-use and sharing of biological data (2) Information content of omics data extremely high, however, • Data subject to noise, biological and technical variation • How to induce biological principles from these genome-wide data sets? Approach: develop methodology for “reverse engineering” of biological mechanisms. • Biggest challenge in bioinformatics today. Need for external data sources for in-silico experimentation • Two practices for re-use and sharing of data • Collectively compile huge amounts of relevant data and make these available to the community. Examples: Bio-banking, compendia (e.g. NIH’s Affymetrix SNP repository). • Re-use information from different and diverse experiments to discover phenomena

  17. Re-use and sharing of biological data (2) Compendium example: re-use and sharing of Huntington data • Datasets: 404 Affymetrix Gene chips of measurements on extremely rare human brain samples (Hodges et al. Hum. Mol. Genetics, 2006) • Available from NCBI GEO database (MIAME) • Goal: find genes involved in Huntington’s Disease • Approach: • Reanalyze gene expression data • Combine genotype data and clinical data (e.g. using SigWin) • Extend experiments with own ChIP on chip data

  18. Resource Identification software • Repository of relevant meta-information from: • Data warehouses e.g. GEO, ArrayExpress, Protein Interaction database • Literature (Mining of PubMed using Collexis) • Information resources specialized on diseases, genes, proteins, e.g. OMIM, GenBank, Ensembl

  19. Why e-BioScience • There is an increasing necessity to use results from other scientists e.g. share data & information: • Data repositories • Cohort studies in • Bio-banking • Biodiversity • Expensive and complex equipment • Mass Spectroscopy • MRI • Other

  20. Problems for the realization of e-BioScience • Life Science field is still in an early stage of development and: • First principles are not understood at all • As a consequence experimental methods are not well established and will not be for some time to come • Because of the new forms of omics instrumentation there is a need for design for experimentation methods • Correct logging of the conditions under which experiments are done is lacking • Large amounts of data are produced that require, among others, statistical techniques for interpretation • As a consequence results are open to multiple interpretations

  21. Problems for the realization of e-BioScience • Problems for bioinformatics & e-Bioscience: • Rationalisation at this early stage is almost impossible • Pre-standardization & standardization almost non-existent • Where there are standards they are inadequate because they are open to multiple interpretations (like MIAME for micro-arrays) • In addition there are commercial end-user products that are difficult to integrate • Users lack the training necessary to handle these complex experimental situations • Only possible solution is to create a flexible experimentation environment for the end-users

  22. Role of ICT in e-BioScience • e-Science is a new form of science methodology complementing theoretical and experimental sciences. • It is using generic methods and an ICT infrastructure to support this methodology. • Web services as a paradigm/way of using/accessing information • Grid as a method of accessing & sharing computing resources by virtualization • What is missing in e-BioScience: • Connection between biological problem & e-Bioscience • User oriented tools that can be re-used and extended • General model of ICT based integration • Semantic support • ontologies and semantic support for workflows to make user knowledge explicit

  23. Consequences for bio-informatics & e-BioScience • A considerable amount of experimentation is necessary before a well established methodology will emerge • The VL-e approach might be a good model & produces an environment in which the necessary experimentation can be realized

  24. Enhancing the scientific process: e-BioLab [Figure: basic model of problem area, with labels: small integration experiments + integration methods; readily accessible data + models; data mining; easy visualization; vague results; e-BioOperator; Biologists; e-BioScientist] Motivation: • Interacting with the problem domain requires an environment in which the domain can be opened up and ideas, hunches and notions on the data and crude models of the biology can be visualized • A tangible space in which biologists, aided by e-scientists, will have the full potential of VL-e at their disposal. An actual laboratory in which: • Problem domain experts (biologists, medical doctors) and scientists from enabling disciplines jointly and in a creative manner work on the analysis and design of –omics experiments. • Problem domain experts can focus on the biology because they are shielded from technical details by e-scientists. • Viewpoints on the research question and the data can be expressed and visualized semi-instantaneously. • Ideas and analyses can be retained and documented. • Facilities for remote collaboration are present*. Basic concept of e-BioLab: * Rauwerda et al., 2nd IEEE International Conference on e-Science and Grid Computing (submitted)

  25. Enhancing the scientific process: e-BioLab (2) Realization: • Large high resolution display (26.2 Mpixel) with high bandwidth (10 Gbit/s) connection to render cluster • Full access to computational facilities and GRID middleware of VL-e • e-whiteboards and tablet PCs to share and store ideas • High definition video cameras for remote collaboration • Highly adaptable lab configuration. Research into: • Problem Solving Environments for biology under study • formulation of scientific workflows that allow for sufficient interactivity and guarantee reproducibility • Maintaining an electronic lab journal for e-science experimentation • Methods for: • Information Management of omics data • Biological Domain Interaction / Resource Identification • Modeling of Biological Information and Knowledge • Remote scientific co-operation • Man-machine interaction

  26. High resolution displays in e-bioscience [Figure: display panels labelled video remote collaboration, remote whiteboard, gene lists, clustering, literature mining, SOM, GSEA, interesting pathways, GO categories] • Example: concurrently display in a discussion with a remote partner • Clustering results of microarray experiments • Interesting pathways that are predominant in certain clusters • Gene Ontology categories • Results from literature mining • Gene Set Enrichment of categories identified in literature mining • Notions depicted on the e-whiteboards

  27. Virtual Lab for e-Science research Philosophy • Multidisciplinary research and development of related ICT infrastructure • Generic application support • Application cases are drivers for computer & computational science and engineering research • Problem solving partly generic and partly specific • Re-use of components via generic solutions whenever possible

  28. [Figure: layered diagram between application pull and technology push, with labels: Microarray pipeline; Mass spectroscopy pipeline; Domain specific tools; Domain generic e-BioScience services; Pathway visualization; Protein annotation; Generic e-Science services; Grid Services: harness multi-domain distributed resources]

  29. [Figure: layered diagram between application pull and technology push, with labels: Micro-array Transcriptomics pipeline; Mass spectroscopy Proteomics pipeline; Domain specific tools; Domain generic e-Science services; Domain generic services; Generic e-Science services; Grid Services: harness multi-domain distributed resources]

  30. Bioinformatics methods in VL-e (1) Example 1 – An application specific method modified by e-science into a generic one: SigWin* • Starting point: Application specific method for detecting windows of increased gene expression on chromosomes** (implemented in C and Perl for SAGE technology) • Motivation: Broad interest from molecular biology in positional behaviour of any measurement data that can be mapped onto DNA sequences • SigWin e-Science version: GRID-based modular workflow for detecting windows of significance in any sequence of values • Widely applicable from gene expression to meteorology data • Modules reusable for alternative workflows, e.g. protein modification • Scalable to very large datasets * Inda et al., 2nd IEEE International Conference on e-Science and Grid Computing (submitted) ** Versteeg et al., Genome Research, 2003
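The idea of "detecting windows of significance in any sequence of values" can be illustrated with a small sketch. This is not the SigWin workflow or the RIDGE method itself; it assumes a moving-median statistic and a position-shuffling null model, and the function names, window size, permutation count and `alpha` are all hypothetical choices made for illustration.

```python
import random
from statistics import median


def sliding_medians(values, window):
    """Median of every contiguous window of `window` consecutive values."""
    return [median(values[i:i + window])
            for i in range(len(values) - window + 1)]


def significant_windows(values, window=5, n_perm=200, alpha=0.05, seed=0):
    """Return start indices of windows whose median exceeds a permutation threshold.

    Assumption: the null model is "same values, random positions", so the
    threshold is the (1 - alpha) quantile of window medians over shuffles.
    """
    rng = random.Random(seed)
    observed = sliding_medians(values, window)

    # Build the null distribution of window medians from shuffled sequences.
    null = []
    shuffled = list(values)
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null.extend(sliding_medians(shuffled, window))
    null.sort()

    # Pick the (1 - alpha) quantile of the null as the significance threshold.
    threshold = null[min(len(null) - 1, int((1 - alpha) * len(null)))]
    return [i for i, m in enumerate(observed) if m > threshold]
```

Because the function only needs an ordered list of numbers, the same sketch could be applied to expression values along a chromosome or to a temperature series, which is the kind of generality the slide describes.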

  31. Bioinformatics methods: SigWin [Figure: significant window detector, a generalisation of the RIDGE method, applied to human gene expression, DNA curvature of the Escherichia coli chromosome, and temperature in Amsterdam]

  32. Bioinformatics methods in VL-e (2) Example 2 – An application specific method composed of generic and specific modules in a workflow: OligoRAP* • Purpose: a re-annotation workflow for oligo libraries • Motivation: rapidly evolving knowledge in genome analysis requires frequent re-assessment of the molecules which are used to measure gene-expression. • OligoRAP • Uses set of application generic (BIOMOBY) BLAT and BLAST sequence alignment (web)services. • Uses application specific (BIOMOBY) annotation analysis service • BIOMOBY: de-facto standard for bio-informatics webservices. • Joint work of sequence analysis lab and micro-array lab • Workflow: • Adjustable filtering criteria make quality level of oligos explicit • Workflow provenance makes re-annotation reproducible. * P. Neerincx, H. Rauwerda, F. Verster, A. Kommadath, T.M. Breit, J.A.M. Leunissen, Poster ISMB 2006
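How adjustable filtering criteria can make the quality level of oligos explicit might be sketched as follows. This is not OligoRAP code and does not call BioMOBY, BLAT or BLAST; the `Hit` record, the `Criteria` fields, the thresholds and the three labels are hypothetical, and the alignment hits are assumed to have been produced elsewhere.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One alignment hit of an oligo against a target (hypothetical record)."""
    oligo_id: str
    target: str
    identity: float  # fraction of matching bases, 0.0 to 1.0


@dataclass(frozen=True)
class Criteria:
    """Adjustable filter settings (hypothetical defaults)."""
    min_identity: float = 0.95
    max_targets: int = 1


def classify_oligos(hits, criteria):
    """Label each oligo 'unique', 'ambiguous' or 'no_target' under the criteria."""
    targets = {}
    for hit in hits:
        accepted = targets.setdefault(hit.oligo_id, set())
        # Only hits that pass the identity threshold count as targets.
        if hit.identity >= criteria.min_identity:
            accepted.add(hit.target)

    labels = {}
    for oligo_id, accepted in targets.items():
        if not accepted:
            labels[oligo_id] = "no_target"
        elif len(accepted) <= criteria.max_targets:
            labels[oligo_id] = "unique"
        else:
            labels[oligo_id] = "ambiguous"
    return labels
```

Keeping the `Criteria` object together with the resulting labels is one simple way to record which settings produced a given annotation, in the spirit of the provenance point on the slide.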

  33. Virtual Lab for e-Science research Philosophy • Multidisciplinary research and development of related ICT infrastructure • Generic application support • Application cases are drivers for computer & computational science and engineering research • Problem solving partly generic and partly specific • Re-use of components via generic solutions whenever possible • Rationalization of experimental process • Reproducible & comparable • Two research experimentation environments • Proof of concept for application experimentation • Rapid prototyping for computer & computational science experimentation

  34. Medical Diagnosis and Imaging Problem Solving Environment Partners: Universiteit van Amsterdam (UvA), Academisch Medisch Centrum (AMC), Vrije Universiteit Medisch Centrum (VUMC), Philips Research, Philips Medical Systems, TU Delft, IBM Objective: To study the design and implementation of a PSE for medical diagnosis and imaging to support and enhance the clinical diagnostic and therapeutic decision process. Applications: • Eddy current reduction • Matched Masked Bone Elimination • Functional brain imaging, DWI and fiber tracking • MR virtual colonoscopy • Parallel MEG data analyses • Grid-based data storage, retrieval and sharing • Interactive 3D medical visualization

  35. Brain Imaging and Fiber Tractography • Diffusion Weighted Imaging (DWI) • Restricted Brownian motion results in anisotropy that can be measured • >= 6 measurements, reduced to a tensor per voxel • Eigenvector with the largest eigenvalue gives the principal diffusion direction • Whole volume fiber tracking can take many hours • Depends on size of volume and number of measurements per voxel • Suitable for parallelization • Visualization techniques
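The step from a per-voxel tensor to a diffusion direction can be sketched in a few lines. The example assumes the 3x3 symmetric diffusion tensor has already been fitted from the measurements and uses plain power iteration to find the eigenvector with the largest eigenvalue; the function name and the iteration count are illustrative choices, not taken from the presentation.

```python
import math


def principal_direction(tensor, iterations=100):
    """Return (unit eigenvector, eigenvalue) for the largest eigenvalue.

    `tensor` is a 3x3 symmetric, positive semi-definite matrix given as
    nested lists. Power iteration is used; it assumes the start vector
    (1, 1, 1) is not orthogonal to the principal eigenvector.
    """
    v = [1.0, 1.0, 1.0]
    for _ in range(iterations):
        # Multiply the tensor by the current vector and renormalise.
        w = [sum(tensor[r][c] * v[c] for c in range(3)) for r in range(3)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return None, 0.0  # zero tensor: no preferred direction
        v = [x / norm for x in w]

    # Rayleigh quotient v^T D v gives the eigenvalue for the unit vector v.
    dv = [sum(tensor[r][c] * v[c] for c in range(3)) for r in range(3)]
    eigenvalue = sum(v[r] * dv[r] for r in range(3))
    return v, eigenvalue
```

Each voxel is handled independently, which is one reason this kind of per-voxel computation is a natural candidate for parallel execution.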

  36. Medical Diagnosis and Imaging Problem Solving Environment [Figure: VL-e Environment stack with labels: Medical Applications, Virtual Laboratory, Grid Middleware, Surfnet] Application specific services: • Access to PACS, DICOM • Interfaces to medical scanners (MRI) • In-house developed algorithms: • Eddy Current Reduction • Matched Masked Bone Elimination • Patient privacy VL-e generic services: • Provides: • Scientific visualization techniques • Image processing algorithms • Uses: • Experiment editor • Parallel processing techniques Grid services: • Storage facilities (SRB) • High Performance Computing platforms • High Performance Visualization platforms

  37. Eddy current reduction • Shear, magnification and translation as a result of residual currents in DWI • 2D matching to correct • Computationally expensive • Parallelization through domain decomposition • Computing cycles via Grid • Integrated PACS solution Effects of residual eddy currents on Philips 3T Intera with DWI. Figure by Erik-Jan Vlieger, AMC.
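Parallelization through domain decomposition can be sketched as splitting the image volume into blocks of slices and correcting each block independently. The sketch below assumes the volume is a list of 2D slices; `correct_slice` is a placeholder that returns the slice unchanged and stands in for the real 2D matching, which is not described in the slides. A thread pool is used only to keep the example self-contained; on a Grid each block would go to a separate compute job.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_blocks(n_items, n_blocks):
    """Return (start, stop) pairs covering range(n_items) as evenly as possible."""
    base, extra = divmod(n_items, n_blocks)
    blocks, start = [], 0
    for b in range(n_blocks):
        stop = start + base + (1 if b < extra else 0)
        if stop > start:  # skip empty blocks when n_blocks > n_items
            blocks.append((start, stop))
        start = stop
    return blocks


def correct_slice(slice_2d):
    """Placeholder for the per-slice 2D correction (returns a copy unchanged)."""
    return [list(row) for row in slice_2d]


def correct_volume(volume, n_workers=4):
    """Correct all slices, one block of consecutive slices per worker."""
    blocks = split_into_blocks(len(volume), n_workers)

    def work(block):
        start, stop = block
        return [correct_slice(s) for s in volume[start:stop]]

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = list(pool.map(work, blocks))  # map preserves block order

    # Stitch the corrected blocks back together in the original slice order.
    return [s for part in parts for s in part]
```

The decomposition works here because each slice is corrected without reference to its neighbours; a method that needs neighbouring slices would require overlapping blocks.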

  38. Medical Diagnosis and Imaging Problem Solving Environment [Figure: VL experiment topology with stages labelled: data retrieval, acquisition; filtering, analyses, simulation; image processing, data storage; 2D/3D visualization]

  39. The situation in the Netherlands • Netherlands Bio-Informatics Center (NBIC) was set up as part of the Dutch genomics initiative, the Netherlands Genomics Initiative (NGI) • Its aim was to organize bio-informatics in the Netherlands and to generate sufficient critical mass, and also to support the other genomics initiatives as a technology center • Organizational structure: • Board of directors • Dr van Kampen scientific director • Drs R. Kok executive director • Prof. Dr. Hertzberger adjunct scientific director • Supervisory board • International Advisory board • Scientific Committee • Program Steering Group

  40. Current NBIC activities • Currently NBIC runs three programs, and it initiated and participates in three further joint activities, besides collaborations such as those with SURF (networking) and VL-e (e-Science): • NBIC programs: • BioRange: a bio-informatics research program of 25 M$ & 25 M$ matching • BioAssist: a 10 M$ support program • BioWise: a 3 M$ education program • Participation in: • Computation life sciences: a 5 M$ program with among others physics, chemistry and computational science • Pilot grid roll out: a 3 M$ Grid rollout & support with Dutch Foundation for computing (NCF) and others • BIG GRID: a 35 M$ GRID and e-Science program in the Netherlands together with NCF, physics, VL-e and others

  41. Program activities • BioRange has four program lines: • Micro array related bio-informatics • Proteomics related bio-informatics • Integrated bio-informatics • Informatics research for Bio-informatics • All program lines comprise a number of collaborative projects with participation of groups all over the Netherlands • BioAssist runs two program lines • Establishment of e-bioscience support environment • Establishment of generic e-science infrastructure • In future also an extension towards biomedical applications, as was illustrated

  42. The VL-e infrastructure [Figure: layered diagram with labels: Application specific service; Medical Application; Telescience; Bio-Informatics Applications; Application Potential; Generic service & Virtual Lab. services; Virtual Lab. rapid prototyping (interactive simulation); Virtual Laboratory; Additional Grid Services (OGSA services); Grid Middleware; Grid & Network Services; Network Service (lambda networking); Surfnet; Test & Cert. VL-software; Test & Cert. Grid Middleware; Test & Cert. Compatibility; VL-e Experimental Environment; VL-e Certification Environment; VL-e Proof of Concept Environment]

  43. [Figure: diagram with labels: Total 25 M$ support + 25 M$ matching; Total 35 M$ support; Bio Applications; Application feedback; Medical Application; Telescience; BioAssist; Rapid prototyping (interactive simulation); Virtual Laboratory; Additional Grid Services (OGSA services); Grid Middleware; Network Service (lambda networking); Surfnet; VL-e Proof of Concept Environment; VL-e Experimental Environment; Big Grid; e-Science roll out; unstable application & VL-e component; stable application & VL-e component]

  44. Conclusions • Omics experiments change the face of life sciences • Bioinformatics can be considered to be an essential enabler and is a form of e-Science • Will help to realize the necessary paradigm shift in Life Science experimentation • Better support of experimentation & optimal use of ICT infrastructure require rationalization of the experimentation process • Information management is an essential technology • Bioinformatics cannot be decoupled from e-BioScience applications • e-BioScience also has to comprise biomedical applications
