Virtual Organizations: Building Interdisciplinary Collaborations

Dan Reed

reed@renci.org

Chancellor’s Eminent Professor

Vice Chancellor for IT

University of North Carolina at Chapel Hill

Director, Renaissance Computing Institute


Acknowledgments

  • Funding agencies

    • NIH

      • Carolina Center for Exploratory Genetic Analysis (CCEGA)

    • NSF

      • TeraGrid Science Gateways

    • State of North Carolina

      • RENCI and ancillary Bioportal support

  • RENCI staff

    • Alan Blatecky, Kevin Gamiel, Xiaojun Guan

    • Clark Jefferies, Howard Lander

    • John Magee, Ruth Marinshaw, Jeff Tilson

    • Lavanya Ramakrishnan

  • And a host of others …


21st Century Challenges

  • The three fold way

    • theory and scholarship

    • experiment and measurement

    • computation and analysis

  • Supported by

    • distributed, multidisciplinary teams

    • multimodal collaboration systems

    • distributed, large scale data sources

    • leading edge computing systems

    • distributed experimental facilities

  • Socialization and community

    • multidisciplinary groups

    • geographic distribution

    • new enabling technologies

    • creation of 21st century IT infrastructure

      • sustainable, multidisciplinary communities

  • “Come as you are” response

[Figure labels: Theory, Experiment, Computation]


Exemplar 21st Century Challenges

  • Population growth in sensitive areas

    • severe weather sensitivity

      • national impact

    • geobiology and environment

    • economics and finance

    • sociology and policy

  • Economics and health care

    • longitudinal public health data

      • environmental interactions

    • genetic susceptibility

      • heart disease, cancer, Alzheimer's

    • privacy and insurance

    • public policy and coordination


Mean Onset of Alzheimer’s Disease

  • apolipoprotein (apo)

    • apoE2, apoE3 and apoE4 alleles

      • on chromosome 19

    • apoE4 allele

      • 40% to 60% of Alzheimer's patients

      • not the only cause for Alzheimer’s

  • apo gene inheritance

    • ~25% inherit 1 copy of apoE4 allele

      • Alzheimer's risk increases 4X

    • 2% inherit 2 copies of apoE4 allele

      • Alzheimer's risk increases 10X

[Figure: proportion of each genotype unaffected (0 to 1.0) versus age at onset (60 to 85), with curves for apoE genotypes 2/3, 2/4, 3/3, 3/4 and 4/4]

Source: Alan Roses, GSK


Big Questions

[Figure: biological scales labeled DNA sequence (TATA promoter, message); protein sequence and regulation; protein structure; protein/enzyme function; multi-protein machines; metabolic pathways and regulatory networks; bacteria and cells; organs, organisms and ecologies. Method labels: sequence annotation; homology based protein structure prediction; molecular simulations; data integration; network analysis; pathway simulations]


Genetics and Disease Susceptibility

[Figure labels: Phenotype 1, Phenotype 2, Phenotype 3, Phenotype 4; ethnicity, environment, age, gender; identify genes; pharmacokinetics, metabolism, endocrine, biomarker signatures, physiology, proteome, transcriptome, immune, morphometrics; predictive disease susceptibility]

Source: Terry Magnuson, UNC


PITAC Report Contents

  • Computational Science: Ensuring America’s Competitiveness

    • A Wake-up Call: The Challenges to U.S. Preeminence and Competitiveness

    • Medieval or Modern? Research and Education Structures for the 21st Century

    • Multi-decade Roadmap for Computational Science

    • Sustained Infrastructure for Discovery and Competitiveness

    • Research and Development Challenges

  • Two key appendices

    • Examples of Computational Science at Work

    • Computational Science Warnings – A Message Rarely Heeded

  • Available at www.nitrd.gov


Life Science Lessons from Astronomy

  • Historically, discoveries accrued to those

    • with access to unique data

    • who built next generation telescopes

  • Two things changed

    • growing costs and complexity of telescopes

    • emergence of whole sky surveys

  • The result – virtual astronomy

    • discovering significant patterns

      • analysis of rich image/catalog databases

    • understanding complex astrophysical systems

      • integrated data/large numerical simulations


{Inter}national Virtual Observatory

[Figure: Cluster Galaxy Morphology Analysis Portal workflow between the user’s machine (web browser), the portal, and a Morphology Calculation Service]

  1. User selects a cluster

  2. Look up cluster in internally stored catalog (clusters)

  3. X-ray and optical images retrieved via SIA interface

  4. User launches distributed analysis

  5. Initial galaxy catalog generated via Cone Search

  6. Image cutout pointers merged into catalog

  7. Morphological parameters calculated on grid for each galaxy

  8. User downloads final table and images for analysis & visualization

Services shown: Chandra SIA, Skyview SIA, DSS SIA, CNOC SIA, NED Cone Search, CADC CNOC Cone Search

Source: Ray Plante, NCSA
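
The Cone Search and SIA (Simple Image Access) interfaces named on this slide are plain HTTP query protocols, which is a large part of why a portal can chain them together. The following is a minimal sketch, not the portal's actual code: it only builds query URLs using the standard parameter names (RA, DEC and SR in decimal degrees for Cone Search; POS and SIZE for version 1 style SIA). The base URLs are hypothetical placeholders, and no network request is made.

```python
from urllib.parse import urlencode


def _append_query(base_url: str, params: dict) -> str:
    """Append URL-encoded parameters to a base URL."""
    separator = "&" if "?" in base_url else "?"
    return f"{base_url}{separator}{urlencode(params)}"


def cone_search_url(base_url: str, ra_deg: float, dec_deg: float,
                    radius_deg: float) -> str:
    """Build a Cone Search style query: sources within radius_deg of (RA, DEC)."""
    if not 0.0 <= ra_deg <= 360.0:
        raise ValueError("RA must be in [0, 360] degrees")
    if not -90.0 <= dec_deg <= 90.0:
        raise ValueError("DEC must be in [-90, 90] degrees")
    if radius_deg < 0.0:
        raise ValueError("search radius must be non-negative")
    return _append_query(base_url, {"RA": ra_deg, "DEC": dec_deg, "SR": radius_deg})


def image_access_url(base_url: str, ra_deg: float, dec_deg: float,
                     size_deg: float) -> str:
    """Build an SIA (version 1 style) query for images covering a sky region."""
    params = {"POS": f"{ra_deg},{dec_deg}", "SIZE": size_deg, "FORMAT": "image/fits"}
    return _append_query(base_url, params)


if __name__ == "__main__":
    # Hypothetical service endpoints, used only for illustration.
    print(cone_search_url("http://example.org/scs", 10.5, -20.25, 0.1))
    print(image_access_url("http://example.org/sia", 10.5, -20.25, 0.2))
```

A portal in this style would fetch each URL, parse the returned table, and merge the results into one catalog, as in steps 3, 5 and 6 of the figure.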


The Bioinformatics Challenge

  • Challenge

    • the rise of quantitative biology

      • burgeoning bioinformatics data

    • complex analysis and modeling problems

    • education and training in new technologies

  • Reality

    • diverse tools with idiosyncratic interfaces

      • steep learning curves

    • software development by diverse groups

    • distributed databases with diverse metadata

  • Need

    • integrated, easy-to-use toolset with standard interfaces

    • extensible mechanisms that hide idiosyncrasies

    • tool and bioinformatics training

  • The solution

    • bioinformatics infrastructure and coupled training


Need: Simple, Easy-To-Use Tools

“Genome. Bought the book. Hard to read.”

Eric Lander


Web and Social Processes

  • Google

    • it’s a search engine, it’s a verb, …

  • Blogs

    • published self-expression

  • Instant Messenger

    • social networks

  • Wireless messaging

    • semi-synchronous

  • Internet commerce

    • the dot.com boom/bust

    • eBay, Amazon

  • Spam, phishing, …

    • anti-social behavior


Benefits of Standards

  • Interoperability

  • Separation of concerns

  • Reuse

  • Independence

  • Dependability

  • Sharing

  • Commonality

  • Shared knowledge base

    • knowledge reuse

    • simplification (one hopes)


Grids of All Flavors


What’s A Grid/Web Service?

It’s been 12 years!

Web: Uniform access to documents (http://)

Grid/Web Services: Flexible, high-performance access to resources and services for distributed communities

[Figure labels: software catalogs, computers, sensors and instruments, colleagues, data archives]


Grid History: I-Way at SC’95

  • A prototype national infrastructure

    • 17 sites, connected by

      • vBNS and six other ATM networks

    • 60 applications

  • Features

    • I-POPs for site access

    • Kerberos authentication

    • manual scheduling

    • distributed communication libraries

  • Experiences

    • led to Globus Grid toolkit

  • Concurrent industry needs

    • led to web services for B2B interoperation


Web Services: “Commercial Grids”

  • From browser-centric to service-centric

    • from human-computer to computer-computer

    • structured negotiation and response

  • Workflow creation and management

    • end-to-end service negotiation

    • inter-organizational interaction

  • Prerequisites

    • metadata standard for service descriptions

    • standard communication mechanisms

    • resource discovery and registration


eBay Web Services Architecture

  • Over 40% of eBay's listings are now via API calls

Source: IBM


Web Services: A Definition

A web service is … designed to support interoperable machine-to-machine interaction over a network. It has an interface described in a machine-processable format (specifically WSDL). Other systems interact … [using] its description using SOAP-messages, … using HTTP with an XML serialization ....

W3C Working Draft, August 2003

[Figure: Service Provider, Service Broker and Service Consumer connected by Publish, Locate and Invoke, annotated with SOAP, WSDL and UDDI]

  • SOAP (Simple Object Access Protocol)

  • WSDL (Web Services Description Language)

  • UDDI (Universal Description, Discovery and Integration)
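
To make the "XML serialization" in the definition concrete, here is a minimal sketch of what a service consumer puts on the wire when it invokes an operation. It builds a SOAP 1.1 envelope with Python's standard library only. The operation name `GetSequence`, its `accession` parameter and the `urn:example:bioportal` namespace are hypothetical and do not come from any real service; in practice the WSDL description would dictate them.

```python
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope namespace.
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"


def build_soap_request(operation: str, namespace: str, params: dict) -> str:
    """Serialize one operation call as a SOAP 1.1 envelope string."""
    ET.register_namespace("soap", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    # The operation element and its arguments live in the service's own namespace.
    call = ET.SubElement(body, f"{{{namespace}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{namespace}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical operation and namespace, for illustration only.
    print(build_soap_request("GetSequence", "urn:example:bioportal",
                             {"accession": "X00001"}))
```

The resulting string would normally be sent as the body of an HTTP POST to the provider's endpoint, which the consumer would have located through a broker such as a UDDI registry.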


Technology Push

Source: Gartner Group


European myGrid Architecture

Source: www.mygrid.org


The Bioinformatics Challenges

  • Complex, multilevel models

    • integration and in silico designs

  • Information visualization

    • complexity and scale

  • Data models and ontologies

    • community definition

  • Data federation, storage and management

    • shared access and support

  • User access portals

    • web-based tool and service interfaces

  • Packaging, distribution and deployment

    • community building


Multilevel Cellular Models

  • Signaling networks

    • environmental triggers and behavior

      • e.g., cell lifecycle

    • different pathways in each tissue type

  • Metabolic networks

    • measurable products in pathway

    • many systems are steady state

    • negative feedback leads to stabilization

  • Protein interaction networks

    • localization of proteins that interact for function

    • protein-protein interactions for specific actions

  • Gene regulatory networks

    • many things affect gene product concentration

    • nucleic-nucleic, protein-nucleic interactions

  • Computing, physics, engineering and biology

    • control theory, mathematical models, phase spaces

    • from biological cartoons to predictive models

      • e.g., microRNAs and gene expression controls


Biological Models

  • Simulation and prediction

    • structures and dynamics

  • Reasoning and discovery

    • reverse engineering

[Figure: log-scale axes. Temporal (seconds): 10^-12, 10^-9, 10^-6, 10^-3, 10^0, 10^3, 10^6, with labels bond motion, catalysis, diffusion, transcription, translation, growth & division. Spatial (nm^3): 10^0, 10^2, 10^4, 10^6, 10^8, 10^10, 10^12, with labels metabolites, proteins, ribosomes, prokaryotes, eukaryotes]


Biophysical and Environmental Modeling

[Figure: Disease, Environment and Medicine. Labels: airway/flow, mucus, cilia, cell biochemistry and structure, proteomics, genomics]

Source: Ric Boucher, UNC


Data Heterogeneity and Complexity

Genomic, proteomic, transcriptomic, metabolomic, protein-protein interactions, regulatory bio-networks, alignments, disease, patterns and motifs, protein structure, protein classifications, specialist proteins (enzymes, receptors), …

[Figure labels: disease, gene sequence, genome sequence, gene expression, phenotype, clinical trial, drug, protein, protein structure, protein sequence, homology, P-P interactions, proteome]

Source: Carole Goble (Manchester)


Sensor Data Overload

Source: Chris Johnson, Utah

Art Toga, UCLA

Source: Robert Morris, IBM

  • High resolution brain imaging

    • 4.5 petabytes (PB) per brain


RENCI: What Is It?

  • Statewide objectives

    • create broad benefit in a competitive world

    • engage industry, academia, government and citizens

  • Four target areas

    • public benefit

      • supporting urban planning, disaster response, …

    • economic development

      • helping companies and people with innovative ideas

    • research engagement across disciplines

      • catalyzing new projects and increasing success

      • building multidisciplinary partnerships

    • education and outreach

      • providing hands-on experiences and broadening participation

  • Mechanisms and approaches

    • partnerships and collaborations

    • infrastructure as needed to accomplish goals


Carolina Center for Exploratory Genetic Analysis (CCEGA)

[Figure labels: Virtuous Cycle; Interdisciplinary Research & Education; interoperable data management; faculty, staff & students; driving problems; promoting mutual awareness; Experimental Genetics Portal; analysis techniques; statistical & computational techniques; extant data models]


CCEGA Participants

  • Coordination team

    • Dan Reed, RENCI

    • Terry Magnuson, CCGS

    • Alan Blatecky, RENCI

    • Kirk Wilhelmsen, CCGS

  • Eleven departments/institutes

    • Biostatistics

    • Cancer Center

    • Genetics

    • Computer Science

    • Epidemiology

    • Genetics

    • Health Science Library

    • Information and Library Science

    • Pharmacy

    • RENCI

    • Statistics

  • Campus wide support from many sources

  • Project participants

    • Brad Hemminger, Information & Library Science

    • James Evans, Genetics

    • Kevin Gamiel, RENCI

    • Xiaojun Guan, RENCI

    • Barrie Hays, Health Science Library

    • Clark Jefferies, RENCI

    • Ethan Lange, Genetics

    • Andrew Nobel, Statistics

    • Karen Mohlke, Genetics

    • Kari North, Epidemiology

    • Susan Paulsen, Computer Science

    • Fernando Manuel Pardo, Genetics

    • Charles Perou, Cancer Center

    • Lavanya Ramakrishnan, RENCI

    • Jan Prins, Computer Science

    • Patrick Sullivan, Genetics

    • Lisa Susswein, Cancer Center

    • David Threadgill, Genetics

    • Alexander Tropsha, Pharmacy

    • K.T.L. Vaughan, Health Science Library

    • Fred Wright, Biostatistics

    • Wei Wang, Computer Science

    • Fei Zou, Biostatistics


Data: From Lab and Clinic to Analysis

  • Independent data management

    • data security

    • version control

    • redundancy

    • controlled access

  • NIH CCEGA

    • Carolina Center for Exploratory Genetic Analysis

[Figure labels: Laboratory, Clinical, ELSI, Analysis, Integration & Informatics]

Source: Brad Hemminger, UNC


Data Management and Information Viz

[Figure labels: GenBank; published domain literature; taxonomy annotation; ontology annotation; DB schema ontology annotation; annotated domain literature; information mining module; information visualization module]


From SNPs to HapMap

  • Single Nucleotide Polymorphisms (SNPs)

    • one in ~1200 bases differ across individuals

    • SNPs act as markers to locate genes

  • Common groups of SNPs are shared

    • i.e., form a haplotype

  • HapMap data sources

    • 90 Yoruba individuals (30 trios) from Nigeria (YRI)

    • 90 individuals (30 trios) of European descent from Utah (CEU)

    • 45 Han Chinese individuals from Beijing (CHB)

    • 45 Japanese individuals from Tokyo (JPT)

  • ~3,500,000 SNPs typed

    • basis for association studies for disease identification
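
As a rough illustration of "common groups of SNPs are shared", the sketch below counts how often each combination of alleles occurs across a set of phased chromosomes. The positions and alleles are invented toy data, not HapMap records, and real haplotype inference (phasing, block detection) is considerably more involved.

```python
from collections import Counter

# Toy phased chromosomes: SNP position -> observed allele (invented values).
CHROMOSOMES = [
    {101: "A", 250: "G", 612: "T"},
    {101: "A", 250: "G", 612: "T"},
    {101: "C", 250: "T", 612: "T"},
    {101: "A", 250: "G", 612: "T"},
    {101: "C", 250: "T", 612: "C"},
    {101: "C", 250: "T", 612: "T"},
]
SNP_POSITIONS = [101, 250, 612]


def haplotype_counts(chromosomes, snp_positions):
    """Count each allele combination (haplotype) over the given SNP positions."""
    return Counter(
        "".join(chromosome[position] for position in snp_positions)
        for chromosome in chromosomes
    )


if __name__ == "__main__":
    for haplotype, count in haplotype_counts(CHROMOSOMES, SNP_POSITIONS).most_common():
        print(haplotype, count)
```

In this toy data only three of the eight possible allele combinations occur, which is the property that lets a few marker SNPs stand in for a whole group in association studies.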


CCEGA HapMap Simulator

  • Synthetic data

    • disease models

    • model testing

    • mining bakeoffs


Carolina Bioportal

  • Three overlapping target groups

    • undergraduate education

    • graduate education and research

    • academic/industrial research

  • Features

    • access to common bioinformatics tools

    • extensible toolkit and infrastructure

      • OGCE and NSF Middleware Initiative (NMI)

      • leverages emerging international standards

    • remotely accessible or locally deployable

    • packaged and distributed with documentation

  • National reach and community

    • TeraGrid deployment

      • science gateway

  • Education and training

    • hands-on workshops

      • clusters, Grids, portals and bioinformatics


Distributed Grid and Web Services

[Figure: layered architecture with the following labels]

  • Grid Portals: launch, configure and control

  • Workflow service and application instances

  • Open Grid Service Architecture Layer

    • data management service, registries and name binding, security, policy, logging, accounting service, administration & monitoring, reservations and scheduling, Grid orchestration, event/message service

  • Open Grid Service Infrastructure (web service component model)

  • Resource Layer (from PCs to supercomputers), online instruments

Source: Dennis Gannon, Indiana


Bioportal Architecture

  • www.ncbioportal.org

[Figure labels: Bioportal; interface generator; PISE application XML description; HTML files; Velocity files; application processing; user profile; job submission; remote file access; job records; authentication, Grid credential; application databases; command files; job history database; OGCE user databases; MyProxy; GridFTP; Gatekeeper; local cluster]

  • OGCE toolkit

    • used by cyberinfrastructure projects

      • LEAD, NEES, PACI, DOE, TeraGrid …
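
The architecture pairs an XML description of each application with generated interfaces and command files. The sketch below shows the general idea of turning a declarative tool description plus user input into a command line. The XML format, the tool name `exampletool` and its parameters are invented for illustration; this is not PISE's actual schema or the Bioportal's code.

```python
import xml.etree.ElementTree as ET

# Hypothetical application description (not the real PISE format).
DESCRIPTION = """
<application name="exampletool">
  <parameter name="input" flag="-i" type="file" required="true"/>
  <parameter name="evalue" flag="-e" type="float" default="10.0"/>
  <parameter name="verbose" flag="-v" type="switch"/>
</application>
"""


def build_command(description_xml: str, values: dict) -> list:
    """Translate user-supplied values into an argument list for the described tool."""
    app = ET.fromstring(description_xml.strip())
    argv = [app.get("name")]
    for param in app.findall("parameter"):
        name = param.get("name")
        value = values.get(name, param.get("default"))
        if param.get("type") == "switch":
            # Switches contribute only their flag, and only when enabled.
            if value in (True, "true"):
                argv.append(param.get("flag"))
            continue
        if value is None:
            if param.get("required") == "true":
                raise ValueError(f"missing required parameter: {name}")
            continue
        argv.extend([param.get("flag"), str(value)])
    return argv


if __name__ == "__main__":
    print(build_command(DESCRIPTION, {"input": "seqs.fasta", "verbose": True}))
```

Because the same description could also drive a generated web form, adding a tool becomes a matter of writing one description file rather than new portal code, which is one way to hide idiosyncratic command-line interfaces behind a standard one.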


Putting the Technologies Together

[Figure: software stack with the following labels]

  • NC Bioportal

  • OGCE Toolkit (Grid middleware)

  • PISE (XML wrapper)

  • Tomcat (Apache servlet container)

  • Chef (collaboration/standard portlets)

  • Jakarta Jetspeed (enterprise portal)

  • Turbine (web app framework)

  • Velocity (template engine)

  • Grid portlets, CoG

  • VMC

  • Bio applications

  • Databases


Community Software Toolkit: Lessons

  • NSF PACI Alliance “In a Box” toolkits

    • cluster software (aka OSCAR)

    • Grid infrastructure (aka NMI)

    • Access Grid for distributed collaboration

    • tiled display walls for visualization

  • Distribution materials

    • software and training materials

      • CDs and web

  • Community workshops and training

    • Linux Clusters Institute

    • MSI HPC workshops

    • hands-on training

  • Lowering the entry barrier

    • usage and deployment

  • Bioportal distribution

    • workshops, tutorials

    • training materials

    • road shows



NC Bioportal: What’s Next

  • Engagement

    • workshops, experiences and deployments

  • Infrastructure

    • dynamic job scheduling across multiple sites

    • migration to OGCE 2.0

    • fully automated database updates

    • workflow construction and processing

  • Portal tool suite

    • expanded applications and databases

      • phylogeny, morphology, microarray analysis, …

  • Training materials

    • additional modules based on user feedback

    • workshop materials packaged for self-study

  • Leverage national presence

    • TeraGrid/NCSA bioinformatics portal


The Vision of Grid/Web Services

“… Behold, the people is one, and they have all one language; and this they begin to do: and now nothing will be restrained from them, which they have imagined to do.”

  • Book of Genesis

Pieter Bruegel

The Tower of Babel (1563)

We're Not There Yet ...


Interdisciplinary Collaborations

  • Appropriate reward structures

    • well-matched time constants

  • Intellectual equality

    • balanced recognition of contributions

  • Research/infrastructure distinctions

    • timelines and people needs differ

  • Confidentiality and openness

    • academic/industry collaboration perspectives

  • Intellectual property

    • background IP and differential disciplinary models


Some Thoughts on the Future

  • Grids/web services are not a panacea

    • we have seen this movie before

      • standards debates can be endless

      • make new mistakes, not the same old ones

    • code is shifted from modules to interfaces

  • Danger of “Death by CS Abstraction”

    • “all problems can be solved by another level of indirection”

  • Appropriate decomposition is a challenge

    • performance, usability, flexibility

  • Generality and extensibility really matter

    • incremental aggregation and interoperability

    • data management and federation

  • Better questions, not just private capabilities

    • limited by creativity not resources


The Cambrian Explosion

  • Most phyla appear

    • sponges, archaeocyathids, brachiopods

    • trilobites, primitive mollusks, echinoderms

  • Indeed, most appeared quickly!

    • Tommotian and Atdabanian

    • as little as five million years

  • Lessons for computing

    • it doesn’t take long when conditions are right

      • raw materials and environment

    • leave fossil records if you want to be remembered!