Presentation Transcript
The iPlant Collaborative Cyberinfrastructure, aka Development of Public Cyberinfrastructure to Support Plant Science

Nirav Merchant

University of Arizona

PowerPoint Does Rocket Science--and Better Techniques for Technical Reports

Essay by Edward Tufte

What is iPlant?
  • iPlant’s mission is to build the cyberinfrastructure (CI) to support plant biology’s Grand Challenge solutions
  • Grand Challenges were not defined in advance, but identified through engagement with the community
  • A virtual organization with Grand Challenge teams relying on national cyberinfrastructure
  • Long-term focus on sustainable food supply, climate change, biofuels, ecological stability, etc.
  • Hundreds of participants globally… Working group members at >50 US institutions, USDA, DOE, etc.
Brief History
  • Formally approved by National Science Board – 12/2007
  • Funding by NSF – February 1st, 2008
  • iPlant Kickoff Conference at CSHL – April 2008
    • ~200 participants
  • Grand Challenge Workshops – Sept-Dec 2008
  • CI workshop – Jan 2009
  • Grand Challenge White Paper Review – March 2009
  • Project Recommendations – March 2009
  • Project Kickoffs – May 2009 & August 2009
  • First Release of Discovery Environments – April 2010
The paradigm shift
  • Classic paradigm: you produce data, analyze it, and interpret it (end to end)
  • Conventional paradigm: consortia/centers produce data and you consume it
  • New paradigm: consortia/centers have produced data and are creating “cyberinfrastructure” to tackle the “grand challenges”
GC Projects Recommended by the iPlant Board of Directors, March 2009

Initial Projects:
  • Plant Tree of Life – iPToL – May ‘09
    • + Taxonomic Intelligence + APWeb2 + Social Networking Website
  • Genotype to Phenotype – iPG2P – Aug ‘09
    • + Image Analysis Platform

iPlant Tree of Life Working Groups

Trait Evolution, Brian O’Meara

Post-tree analysis and mapping of ancestral traits

Tree Reconciliation, Todd Vision

Large-scale reconciliation of gene trees, co-evolving parasites, etc., with species trees

Big Trees, Alexandros Stamatakis

HPC Phylogenetic inference with 500K taxa

Tree Visualization, Michael Sanderson, Karen Cranston

Cross-cutting group for the visualization needs of all

Data Integration, Val Tannen, Bill Piel

Cross cutting group for the data integration needs of all

Data Assembly, Doug Soltis, Pam Soltis, Michael Donoghue

Community and network building, data assembly

iPlant Genotype to Phenotype Working Groups

NextGen Sequencing

Establishing an informatics pipeline that will allow the plant community to process NextGen sequence data

Statistical Inference

Developing a platform using advanced computational approaches to statistically link genotype to phenotype

Modeling Tools

Developing a framework to support tools for the construction, simulation and analysis of computational models of plant function at various scales of resolution and fidelity

Visual Analytics

Generating, adapting, and integrating visualization tools capable of displaying diverse types of data from laboratory, field, in silico analyses and simulations

Data Integration

Investigating and applying methods for describing and unifying data sets into virtual systems that support iPG2P activities

What is Cyberinfrastructure? (Originally about TeraGrid)

It was six men of Indostan,

To learning much inclined,

Who went to see the elephant,

(Though all of them were blind),

That each by observation

Might satisfy his mind.

WWW.TERAGRID.ORG

It’s a Grid!

It’s a Network!

They are HPC Centers!

It’s a Common Software Environment!

And more:

  • Visualization
  • Facilities
  • Data collections

It’s Apps and Support!

It’s Storage!

The iPlant Cyberinfrastructure

  • User
  • iPlant Discovery Environments
    • Grand Challenge Workflows, iPlant Interfaces
    • Third Party Tools, iPlant-built Tools, Community Contributed Tools and Data
  • iPlant Middleware
    • Job Submission, Workflow Management, Service/Data APIs
    • iRODS, Grid Technologies, Condor, RESTful Services
  • Physical Infrastructure
    • Compute, Storage, Persistent Virtual Machines
    • TeraGrid, Open Science Grid, UA/ASU/TACC

Build a CI that’s robust, leverages national infrastructure, and can grow through community contribution!
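The layering on this slide can be sketched as a small lookup structure. This is an illustration only: the layer and component names are taken from the slide, while `LAYERS`, `layer_of`, and `path_from_user` are hypothetical names introduced here and are not iPlant code.

```python
# Illustrative only: names of layers and components come from the slide;
# the data structure and helpers below are assumptions made for this sketch.
LAYERS = [
    ("iPlant Discovery Environments",
     ["Grand Challenge Workflows", "iPlant Interfaces", "Third Party Tools",
      "iPlant-built Tools", "Community Contributed Tools and Data"]),
    ("iPlant Middleware",
     ["Job Submission", "Workflow Management", "Service/Data APIs",
      "iRODS", "Grid Technologies", "Condor", "RESTful Services"]),
    ("Physical Infrastructure",
     ["Compute", "Storage", "Persistent Virtual Machines",
      "TeraGrid", "Open Science Grid", "UA/ASU/TACC"]),
]


def layer_of(component):
    """Return the name of the layer that lists `component`, or None."""
    for name, components in LAYERS:
        if component in components:
            return name
    return None


def path_from_user(component):
    """List the layers a user request passes through to reach `component`."""
    target = layer_of(component)
    if target is None:
        raise KeyError(component)
    path = ["User"]
    for name, _ in LAYERS:
        path.append(name)
        if name == target:
            break
    return path
```

For example, `path_from_user("Condor")` stops at the middleware layer, while a request for `"TeraGrid"` passes through all three layers.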

Open Source Philosophy, Commercial Quality Process
  • iPlant is open in every sense of the word:
    • Open access to source
    • Open API to build a community of contributors
    • Open standards adopted wherever possible
    • Open access to data (where users so choose)
  • iPlant code design, implementation, and quality control will be based on best industrial practice
Portfolio of Activities
  • Maintaining a balance of “past, present, future” strategies
    • “Past”: make services, systems, and support available to existing bioinformatics projects, either to enhance them or simply to make critical tools more widely available.
    • “Present”: build the best bioinformatics software tools that today’s technologies can provide.
    • “Future”: track emerging technologies and, where appropriate, stimulate research into the creation and use of those technologies.
Portfolio of Activities
  • In a nutshell:
    • 12 working groups in the two grand challenges, each of which is defining requirements for Discovery Environment (DE) development.
      • Each group not only holds discussions that lead to final projects, but also spawns prototyping efforts, technology evaluation projects, tool support projects, etc.
    • Services group: provides cycles, storage, hosting, etc. to users.
    • A comprehensive technology evaluation program to find, borrow, or build relevant technologies, headlined by the semantic web effort.
    • A number of ancillary projects related to the grand challenges, e.g. APWeb and high-throughput image analysis.
    • The core development/integration effort.
Systems and Services
  • Provide access for problems like these on large scale systems
  • Provide the storage infrastructure for biological data (again, in support of existing projects)
  • Provide cloud style VM infrastructure for service hosting.
iPlant: Connecting Users, Ideas and Resources

The core foundation comprises four layers:

  • Data layer
  • Registry and Integration layer
  • Compute and Analysis layer
  • Interaction and Collaboration layer
iPlant: Using proven technologies
  • Data layer: providing access to raw and ingested data sets, including high-throughput data transfers
      • iRODS
      • GridFTP, Aspera
      • DSpace (DuraSpace), Open Archives Initiative
      • Content Distribution Networks (CDN)
      • High-performance storage at TACC (Lustre)
      • MySQL and PostgreSQL database clusters
      • Connection to established data sources (NCBI, TAIR, Gramene)
      • Connection to DataONE, DataNet initiatives
      • Cloud-style storage (similar to Amazon S3 and Walrus)
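One concern behind high-throughput transfers of large data sets is checking that what arrived matches what was sent. The sketch below shows a generic per-chunk checksum manifest using only the Python standard library. It is an assumption-laden illustration, not the protocol used by iRODS, GridFTP, or Aspera; `chunk_checksums` and `corrupted_chunks` are hypothetical helper names.

```python
import hashlib


def chunk_checksums(data: bytes, chunk_size: int):
    """Split `data` into fixed-size chunks and return (offset, sha256 hex) pairs."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        (offset, hashlib.sha256(data[offset:offset + chunk_size]).hexdigest())
        for offset in range(0, len(data), chunk_size)
    ]


def corrupted_chunks(received: bytes, manifest, chunk_size: int):
    """Return offsets of chunks whose checksum differs from the sender's manifest."""
    actual = dict(chunk_checksums(received, chunk_size))
    return [offset for offset, digest in manifest if actual.get(offset) != digest]
```

With a manifest like this, only the chunks reported by `corrupted_chunks` would need to be sent again, rather than the whole file.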
iPlant: Using proven technologies
  • Registry and Integration layer: connecting services, data, and metadata elements with semantic understanding
      • Metadata catalog management
      • Provenance tracking (W7 model)
      • Integrated registry and service discovery servers
      • Data Client and Data Provider Ontology Development Kit
      • Semantic architecture (OWL-based SSWAP)
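The W7 model describes provenance through seven questions: what, when, where, how, who, which, and why. A minimal sketch of a record with those seven fields is shown below. The class name, the helper, and the field comments are assumptions made for illustration; the slides do not describe how iPlant stores provenance.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ProvenanceRecord:
    """One provenance entry with the seven W7 fields (hypothetical layout)."""
    what: str   # event that affected the data
    when: str   # time of the event (an ISO 8601 string in this sketch)
    where: str  # location of the event
    how: str    # action that led to the event
    who: str    # agent involved in the event
    which: str  # instrument or software used
    why: str    # reason for the event


def missing_fields(record):
    """Return the names of W7 fields that were left blank, in declaration order."""
    return [name for name, value in asdict(record).items() if not value.strip()]
```

A catalog could reject or flag records for which `missing_fields` is non-empty, so that every stored data set answers all seven questions.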
iPlant: Using proven technologies
  • Compute and Analysis layer: connecting tasks with scalable platforms and algorithms
      • Virtualization (Xen clusters)
      • High Performance Computing at TACC and TeraGrid
      • Grid (Condor, BOINC, Gearman)
      • Cloud (Eucalyptus, Nimbus)
      • Reconfigurable hardware (GPGPU, FPGA)
      • Checkpoint and restart (DMTCP)
      • Scaling and parallelizing code (MPI)
      • Workflow engines (DAGMan, Pegasus, Kepler)
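Workflow engines of the kind listed here typically treat an analysis as a directed acyclic graph (DAG) of steps and run each step only after its dependencies have finished. The sketch below shows that ordering idea with Python's standard `graphlib` module (Python 3.9 or later). It does not use DAGMan, Pegasus, or Kepler, and the step names in `workflow` are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def run_order(dependencies):
    """Return one valid execution order for a workflow DAG.

    `dependencies` maps each step to the set of steps it depends on.
    Raises CycleError if the steps depend on each other in a loop.
    """
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical phylogenetics-style workflow: each key waits for its values.
workflow = {
    "align": {"fetch_sequences"},
    "build_tree": {"align"},
    "render_tree": {"build_tree"},
    "summarize": {"align"},
}
```

Steps with no dependency between them, such as `build_tree` and `summarize` above, may appear in either order, which is what allows an engine to run them in parallel.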
iPlant: Using proven technologies
  • Interaction and Collaboration layer: providing end-user access to unified services and data, from API to large-scale visualization
      • Google Web Toolkit (GWT-driven front end)
      • Messaging bus (Mule, RabbitMQ, XMPP/Jabber)
      • RESTful web services (web API access)
      • Single sign-on/identity management (Shibboleth; OAuth?)
      • Transparent HPC integration (TeraGrid science gateway and TACC resources)
      • Integration with desktop applications (via web services)
      • Collaboration platforms (OpenMeetings, WebEx, wiki, Mailman)
      • Shared analysis (shared workflows, desktop view)
      • Sharing data (DOI, persistent URL, CDN, social networks)
      • Large-scale visualization (Large Tree, ParaView, SAGE)
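To make the idea of RESTful web API access concrete, the sketch below models a job resource with the usual verbs and status codes, entirely in memory. The `/jobs` path, the `JobService` class, and the JSON fields are hypothetical and do not describe the actual iPlant API.

```python
import itertools
import json


class JobService:
    """In-memory stand-in for a REST-style job submission API (illustrative)."""

    def __init__(self):
        self._jobs = {}
        self._ids = itertools.count(1)

    def handle(self, method, path, body=None):
        """Dispatch one request and return (status_code, json_text)."""
        parts = [p for p in path.split("/") if p]
        if parts == ["jobs"]:
            if method != "POST":
                return 405, json.dumps({"error": "method not allowed"})
            try:
                spec = json.loads(body or "")
            except json.JSONDecodeError:
                return 400, json.dumps({"error": "body must be JSON"})
            job_id = str(next(self._ids))
            self._jobs[job_id] = {"id": job_id, "status": "queued", "spec": spec}
            return 201, json.dumps(self._jobs[job_id])
        if len(parts) == 2 and parts[0] == "jobs":
            if method != "GET":
                return 405, json.dumps({"error": "method not allowed"})
            job = self._jobs.get(parts[1])
            if job is None:
                return 404, json.dumps({"error": "no such job"})
            return 200, json.dumps(job)
        return 404, json.dumps({"error": "no such resource"})
```

A client would `POST` a job description to `/jobs`, read the returned `id`, and then `GET /jobs/<id>` to check its status; a desktop application or web front end could use the same two calls.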
Storage Services
  • We have also begun offering storage to a number of projects connected to the grand challenges in some way, as well as to internal iPlant work.
    • iRODS interface
    • Corral at TACC and a local storage array at UA
  • Data are arriving now for the 1KP project and the Gates C3/C4 project.
Cloud Services
  • iPlant is now offering “cloud” style hosting services.
  • Dynamically launch virtual servers hosted by iPlant.
  • Still in prototype
Arrival of “As a Service” models

  • IaaS: Infrastructure as a Service (get computer time with a credit card and a Web interface, like EC2)
  • PaaS: Platform as a Service: IaaS plus core software capabilities on which you build SaaS (e.g. Hadoop/MapReduce is a platform)
  • SaaS: Software as a Service (e.g. clustering/assembly is a service)
  • Cyberinfrastructure is “Research as a Service”

http://salsahpc.indiana.edu
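As a small illustration of the MapReduce programming model named above as an example platform, the sketch below counts k-mers in a set of sequences using separate map, shuffle, and reduce steps. It runs in a single Python process and does not use Hadoop; the function names and the k-mer example are assumptions chosen for this sketch.

```python
from collections import defaultdict


def map_kmers(sequence, k):
    """Map step: emit (kmer, 1) for every length-k window in one sequence."""
    return [(sequence[i:i + k], 1) for i in range(len(sequence) - k + 1)]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between the two phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(groups):
    """Reduce step: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def count_kmers(sequences, k):
    """Run map, shuffle, and reduce over all sequences."""
    pairs = [pair for seq in sequences for pair in map_kmers(seq, k)]
    return reduce_counts(shuffle(pairs))
```

Because each call to `map_kmers` depends on only one sequence, a platform can spread the map step across many machines and combine the results in the reduce step.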

What do working groups want?
  • Wiki
  • Shared storage
  • WebEx
  • CMS
  • Google Apps
  • Machine for prototyping/development
  • Change management software (git/svn)
  • Access to compute grid/cluster
What iPlant wants
  • Ability to integrate single sign-on (SSO) with all services we offer (API, cloud, grid, iRODS, etc.)
  • Leverage credentials from users’ home institutions
  • Lower the barrier to access while still being secure
  • Emphasis on ease of access to “research as a service”
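The core idea of single sign-on is that a user authenticates once and every service can then check the resulting credential without asking again. The sketch below illustrates this with a signed, expiring token built from the Python standard library. It assumes a secret shared by all services, which is a simplification: it is not Shibboleth or OAuth, and `issue_token` and `verify_token` are hypothetical names.

```python
import base64
import hashlib
import hmac
import json
import time


def issue_token(secret: bytes, user: str, ttl_seconds: int, now=None) -> str:
    """Sign a small claims payload so any service holding `secret` can verify it."""
    issued = time.time() if now is None else now
    payload = json.dumps({"user": user, "exp": issued + ttl_seconds}).encode()
    signature = hmac.new(secret, payload, hashlib.sha256).digest()
    return (base64.urlsafe_b64encode(payload).decode() + "." +
            base64.urlsafe_b64encode(signature).decode())


def verify_token(secret: bytes, token: str, now=None):
    """Return the user name if the token is authentic and unexpired, else None."""
    try:
        payload_b64, signature_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        signature = base64.urlsafe_b64decode(signature_b64)
    except ValueError:
        # Wrong number of parts or invalid base64 padding.
        return None
    expected = hmac.new(secret, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        return None
    claims = json.loads(payload)
    current = time.time() if now is None else now
    return claims["user"] if current < claims["exp"] else None
```

In this sketch the API, cloud, grid, and iRODS front ends would each call `verify_token` on the same token. A federated setup that accepts credentials from users' home institutions would replace the shared secret with an identity provider.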
Phases of a project
  • Enthusiasm
  • Disillusionment
  • Panic
  • Search for the guilty
  • Punishment of the innocent
  • Praise and honor for the non-participants

Karla Jennings